I'd like to introduce you to Pangool <http://pangool.net/>, an easier
low-level MapReduce API for Hadoop. I'm one of the developers. We just
open-sourced it yesterday.
Pangool is a low-level Java MapReduce API with the same flexibility and
performance as the plain Java Hadoop MapReduce API. The difference is
that it makes a lot of things easier to code and understand.
A few of Pangool's features:
- Tuple-based intermediate serialization (allowing easier development).
- Built-in, easy-to-use group by and sort by (removing boilerplate code for
things like secondary sort).
- Built-in, easy-to-use reduce-side joins (which are quite hard to
implement in Hadoop).
- Augmented Hadoop API: Built-in multiple inputs / outputs, configuration
via object instance.
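
To make "group by" and "sort by" concrete, here is a minimal plain-Java
sketch of the semantics only. It is not Pangool's API and does not use
Hadoop: the Visit class, its fields and the groupAndSort method are
hypothetical names made up for this illustration. It groups tuples by one
field (user) and orders each group by a second field (timestamp), which is
the result a secondary sort delivers to a reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    // Hypothetical tuple with three named fields (not a Pangool class).
    public static final class Visit {
        public final String user;
        public final long timestamp;
        public final String url;

        public Visit(String user, long timestamp, String url) {
            this.user = user;
            this.timestamp = timestamp;
            this.url = url;
        }
    }

    // Group by "user"; within each group, sort by "timestamp" ascending.
    // In MapReduce the framework does this during the shuffle; here it is
    // done in memory purely to show the resulting order.
    public static Map<String, List<Visit>> groupAndSort(List<Visit> visits) {
        List<Visit> sorted = new ArrayList<>(visits);
        sorted.sort(Comparator
                .comparing((Visit v) -> v.user)
                .thenComparingLong(v -> v.timestamp));

        // LinkedHashMap keeps groups in the sorted key order.
        Map<String, List<Visit>> groups = new LinkedHashMap<>();
        for (Visit v : sorted) {
            groups.computeIfAbsent(v.user, k -> new ArrayList<>()).add(v);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Visit> visits = Arrays.asList(
                new Visit("bob", 30L, "/c"),
                new Visit("alice", 20L, "/b"),
                new Visit("alice", 10L, "/a"),
                new Visit("bob", 5L, "/d"));

        // Each "reduce call" sees one user with visits in time order.
        for (Map.Entry<String, List<Visit>> group : groupAndSort(visits).entrySet()) {
            StringBuilder line = new StringBuilder(group.getKey()).append(":");
            for (Visit v : group.getValue()) {
                line.append(' ').append(v.url);
            }
            System.out.println(line);
        }
    }
}
```

With the plain Hadoop API, getting this ordering typically means writing a
composite key, a partitioner and grouping / sort comparators by hand; that
is the boilerplate the feature list above refers to.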
Pangool aims to make Hadoop's steep learning curve a lot smoother while
retaining all of Hadoop's features, power and flexibility. It differs from
high-level tools like Pig or Hive in that it can be used as a
replacement for the low-level API. There is no performance / flexibility
penalty paid for using Pangool.
We did an initial benchmark <http://pangool.net/benchmark.html> to show
I'd be very interested in hearing your feedback, opinions and questions on