Hadoop, mail # user - Pangool: easier Hadoop, same performance

Pangool: easier Hadoop, same performance
Pere Ferrera 2012-03-06, 10:29
I'd like to introduce you to Pangool <http://pangool.net/>, an easier
low-level MapReduce API for Hadoop. I'm one of its developers. We
open-sourced it just yesterday.

Pangool is a low-level Java MapReduce API with the same flexibility and
performance as the plain Java Hadoop MapReduce API. The difference is
that it makes many things easier to code and understand.

A few of Pangool's features:
- Tuple-based intermediate serialization (allowing easier development).
- Built-in, easy-to-use group by and sort by (removing boilerplate code for
things like secondary sort).
- Built-in, easy-to-use reduce-side joins (which are quite hard to
implement in Hadoop).
- Augmented Hadoop API: built-in multiple inputs/outputs, and configuration
via object instances.
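To show what "group by + sort by" and reduce-side joins mean in practice, here is a hypothetical, plain-Java (16+) sketch with no Hadoop or Pangool dependency. It is not Pangool's API (that isn't shown in this message): it only simulates the shuffle such a job relies on. Intermediate tuples are tagged with their source, sorted by join key and then by source tag (the secondary sort), and grouped by join key so that the single user record reaches each "reduce call" before that user's purchases. The class, record, and method names and the user/purchase data are made up for illustration. The sketch also assumes user IDs are unique.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical, Hadoop-free simulation of a reduce-side join built on
 * "group by key, sort by key then source". NOT Pangool's actual API.
 */
public class ReduceSideJoinSketch {

    // Intermediate tuple: join key, source tag (0 = users, 1 = purchases), payload.
    record Tuple(String key, int source, String value) {}

    // Inner-joins users (id, name) with purchases (userId, item).
    // Assumes at most one user record per id.
    public static List<String> join(List<String[]> users, List<String[]> purchases) {
        List<Tuple> intermediate = new ArrayList<>();
        // "Map" phase: emit each record as a tuple tagged with its source.
        for (String[] u : users) intermediate.add(new Tuple(u[0], 0, u[1]));
        for (String[] p : purchases) intermediate.add(new Tuple(p[0], 1, p[1]));

        // "Sort by" key, then source tag, so the user record comes first in its group.
        intermediate.sort(Comparator.comparing(Tuple::key).thenComparingInt(Tuple::source));

        List<String> joined = new ArrayList<>();
        String currentKey = null;
        String currentUser = null;
        // "Group by" key: consecutive tuples sharing a key form one reduce() call.
        for (Tuple t : intermediate) {
            if (!t.key().equals(currentKey)) {
                currentKey = t.key();
                currentUser = null; // new group: no user seen yet
            }
            if (t.source() == 0) {
                currentUser = t.value();
            } else if (currentUser != null) {
                // Purchase with a matching user: emit the joined row.
                joined.add(t.key() + "," + currentUser + "," + t.value());
            }
            // Purchases with no matching user are dropped (inner join).
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> users = List.of(
                new String[] {"u2", "bob"},
                new String[] {"u1", "alice"});
        List<String[]> purchases = List.of(
                new String[] {"u1", "book"},
                new String[] {"u2", "pen"},
                new String[] {"u1", "lamp"},
                new String[] {"u3", "cup"});
        // Prints u1,alice,book / u1,alice,lamp / u2,bob,pen (u3 has no user).
        join(users, purchases).forEach(System.out::println);
    }
}
```

In plain Hadoop, the sort-then-group behaviour above requires a custom composite key, partitioner, sort comparator, and grouping comparator. That boilerplate is what built-in group by / sort by is meant to remove.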

Pangool aims to make Hadoop's steep learning curve much smoother while
retaining all of its features, power and flexibility. It differs from
high-level tools such as Pig or Hive in that it can be used as a
replacement for the low-level API. You pay no performance or flexibility
penalty for using Pangool.

We ran an initial benchmark <http://pangool.net/benchmark.html> to
support this claim.

I'd be very interested in hearing your feedback, opinions and questions on