Re: Hive Vs Pig: Master's thesis
Edward Capriolo 2014-05-03, 17:57
These days Pig and Hive are designed to play together more; it is not a VS
thing. I barely read benchmarks any more. They are typically done by a
few types of entities:
1) vendors that clearly have something to gain by presenting one side of
2) people who are not familiar with the intricacies of either project and
usually do not figure out how to use one or both of the projects effectively
3) in-depth analyses that prove temporary results (Pig is faster than Hive
at X), while with a different data set the opposite is true, or after 3 months
both code bases have changed significantly and the analysis would need to
be redone (but others continue using the result for years as if it were
some permanent fact)
I would think an approach like this is more interesting.
The design: Hive is SQL-like, a declarative language. Pig, while still being
declarative, is more imperative. The user has to deal with flow. For example: is
the where clause/filter done before the group or after? What are the benefits
of one vs the other? If the same transformation, like a group and count with a
where clause, is an 8-line Pig script vs a 1-line Hive query, are there cases
where that is better and worse?
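To make the filter placement question concrete, here is a small sketch in plain Python (not Pig or Hive; the sample rows and function names are made up for illustration). It counts rows per key and filters either before or after the group step. When the filter only looks at the grouping key, both orders give the same counts, but filtering first means fewer rows reach the group step.

```python
from collections import Counter

# Hypothetical sample rows: (country, page)
ROWS = [
    ("us", "/home"),
    ("us", "/cart"),
    ("de", "/home"),
    ("fr", "/home"),
    ("us", "/home"),
]

def filter_then_group(rows, wanted):
    """Filter first (like a WHERE clause), then group and count."""
    kept = [r for r in rows if r[0] in wanted]
    counts = Counter(country for country, _ in kept)
    return dict(counts), len(kept)

def group_then_filter(rows, wanted):
    """Group and count everything, then drop the unwanted groups."""
    counts = Counter(country for country, _ in rows)
    return {k: v for k, v in counts.items() if k in wanted}, len(rows)

before, grouped_before = filter_then_group(ROWS, {"us", "de"})
after, grouped_after = group_then_filter(ROWS, {"us", "de"})

print(before == after)                # both orders agree on the counts
print(grouped_before, grouped_after)  # rows that reached the group step
```

This only shows the data-flow side of the question. Whether either system reorders the steps for you is something the thesis would have to check against the actual versions being compared.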
How does the system support plugins, i.e., is there support to get data from
MongoDB or, who knows, Access/Excel? What about user functions to trim data or
reshape XML, etc.? What are the pluggable points of both systems?
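As a rough picture of what "pluggable points" means here, the sketch below is plain Python and not the API of either project. The names run, load_from_memory and trim_name are hypothetical. It shows a pipeline with two plug points: where the data comes from, and a user function applied to each record.

```python
from typing import Callable, Iterable, Iterator

Record = dict  # one row of data

def run(load: Callable[[], Iterable[Record]],
        transform: Callable[[Record], Record]) -> Iterator[Record]:
    """Hypothetical pipeline with two plug points: a loader and a per-record function."""
    for record in load():
        yield transform(record)

def load_from_memory() -> Iterable[Record]:
    # Stand-in for a storage plugin; a real one might read MongoDB or a spreadsheet.
    return [{"name": "  pig "}, {"name": "hive  "}]

def trim_name(record: Record) -> Record:
    # Stand-in for a user function that trims data.
    return {**record, "name": record["name"].strip()}

result = list(run(load_from_memory, trim_name))
print(result)
```

A comparison could then ask, for each system, how much work it takes to write and register the equivalent of the loader and the user function.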
On Sat, May 3, 2014 at 1:12 PM, Sarfraz Ramay <[EMAIL PROTECTED]> wrote: