The leading open source system for processing big data continues to evolve, but new approaches with added features are on the rise.
I believe Carlos Guestrin is at CMU (as the GraphLab contact page here - http://graphlab.org/contact/ - also indicates), whereas you have said he is at the University of Washington.
Could you clarify this please?
Although he is still affiliated with CMU, Guestrin recently moved to the University of Washington.
Gregory, very good article!
I would like to add that there is another free and open source distributed data-intensive computing platform, which is not based on the MapReduce paradigm: the LexisNexis HPCC Systems platform (http://hpccsystems.com).
The original design of the HPCC Systems platform predates the Google researchers' MapReduce paper by at least five years. The platform's processing model is dataflow oriented, built around a high-level, declarative, open programming language called ECL, which offers modern language features including code/data encapsulation, lazy evaluation, compilation to native code, and purity. The platform underpins all the data services and analytics products from LexisNexis Risk Solutions, as well as several other information products from Reed Elsevier, its parent company, in areas spanning machine learning, massive data warehousing, social graph analytics, and recommendation systems. It has also been in use by several large and medium-sized organizations for years, even before it was released under an open source license back in 2011.
On the same topic, a few weeks ago I wrote a short blog post comparing the paradigms behind the two main open source data-intensive computing platforms, Hadoop and HPCC, which you can read here: http://hpccsystems.com/blog/hpcc-systems-hadoop-%E2%80%93-contrast-paradigms. Some of the concepts I discuss there are relevant to this article.