Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Hadoop >> mail # user >> Pig vs hive performance


Re: Pig vs hive performance
Hi Abhishek,

I don't know the internals of either optimizer. In my opinion, an SQL-like
language is hard to optimize, so Hive may be slower than Pig in many cases.
But ultimately, for every Hadoop job there is a best (in time or space)
sequence of map/reduce phases. You should rewrite your Pig/Hive script
following guidelines such as
http://ofps.oreilly.com/titles/9781449302641/making_pig_fly.html so that it
fits the optimizer well; then the optimizer is more likely to generate the
best sequence.

My suggestion is to leave Hive as a data warehouse and do most jobs in Pig.
As for your earlier question 1), what Hive is good at: if you already have a
complex join written in SQL, you can run it directly on Hive, whereas
translating it into a Pig script will take you a lot of time.

Thanks,
TianYi
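The Amazon FAQ quoted below says Pig optimizes across an entire script while Hive optimizes each statement on its own. The following is a toy model of why that matters, not either project's planner: the script, the input names (`logs`, `users`) and the function names are all hypothetical. It only counts how many full input scans each planning strategy needs when several statements read the same input; sharing one scan across statements is similar in spirit to Pig's multi-query execution.

```python
"""Toy model: statement-level vs. script-level planning.

Hypothetical illustration only; this is not Pig's or Hive's planner.
Each statement is a (source, operation) pair, and the cost we count
is the number of full scans of input data a plan performs.
"""
from collections import defaultdict

# Hypothetical script: three outputs computed from two input tables.
SCRIPT = [
    ("logs", "count_by_user"),
    ("logs", "count_by_url"),
    ("users", "count_by_country"),
]


def statement_level_scans(script):
    """Plan each statement in isolation: every statement re-reads its input."""
    return len(script)


def script_level_plan(script):
    """Plan the whole script at once: group statements by the input they
    read, so each distinct input is scanned once and feeds all of them."""
    plan = defaultdict(list)
    for source, op in script:
        plan[source].append(op)
    return dict(plan)


def script_level_scans(script):
    """Number of scans needed when statements share scans of the same input."""
    return len(script_level_plan(script))


if __name__ == "__main__":
    print("statement-level scans:", statement_level_scans(SCRIPT))  # 3
    print("script-level scans:", script_level_scans(SCRIPT))  # 2
```

The saving only appears when statements overlap in their inputs; a script whose statements all read different inputs costs the same under both strategies.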

On Thu, Oct 4, 2012 at 9:41 AM, Abhishek <[EMAIL PROTECTED]> wrote:

> Hi Zhu,
>
> Thanks for the reply. I am running some queries where Hive is slower than Pig.
>
> I was also thinking that the Pig optimizer is better than the Hive optimizer.
>
> Regards
> Abhi
>
> On Oct 3, 2012, at 7:15 PM, TianYi Zhu <[EMAIL PROTECTED]> wrote:
>
> > From the Amazon website:
> > http://aws.amazon.com/elasticmapreduce/faqs/#hive-8
> >
> >
> > Q: When should I use Hive vs. PIG?
> >
> > Hive and PIG both provide high-level data-processing languages with support
> > for complex data types for operating on large datasets. The Hive language
> > is a variant of SQL and so is more accessible to people already familiar
> > with SQL and relational databases. Hive has support for partitioned tables
> > which allow Amazon Elastic MapReduce job flows to pull down only the table
> > partition relevant to the query being executed rather than doing a full
> > table scan. Both PIG and Hive have query plan optimization. PIG is able to
> > optimize across an entire script while Hive queries are optimized at the
> > statement level.
> >
> > Ultimately the choice of whether to use Hive or PIG will depend on the
> > exact requirements of the application domain and the preferences of the
> > implementers and those writing queries.
> >
> >
> > On Thu, Oct 4, 2012 at 7:52 AM, Abhishek <[EMAIL PROTECTED]> wrote:
> >
> >> Hi all,
> >>
> >> Can we discuss the performance of Pig vs. Hive?
> >>
> >> 1) What is Hive good at?
> >> 2) What is Pig good at?
> >> 3) Hive optimizer vs. Pig optimizer
> >> 4) Hive limitations vs. Pig limitations
> >>
> >> Regards
> >> Abhi
> >>
>
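The Amazon FAQ quoted above credits Hive's partitioned tables with letting a query read only the partition it needs instead of scanning the whole table. In HiveQL this is a table created with `PARTITIONED BY (dt STRING)` and queried with a `WHERE dt = '...'` filter. The sketch below is a toy model of that effect, not Hive's implementation: the table contents, the `dt` key values and the function names are hypothetical, and it simply counts rows read with and without pruning.

```python
"""Toy model of partition pruning on a date-partitioned table.

Hypothetical illustration only: the table layout and function names are
made up for this sketch; this is not how Hive stores or reads data.
"""

# Hypothetical table partitioned by "dt"; each partition holds its rows.
TABLE = {
    "2012-10-02": [{"user": "a", "bytes": 10}, {"user": "b", "bytes": 5}],
    "2012-10-03": [{"user": "a", "bytes": 7}],
    "2012-10-04": [{"user": "c", "bytes": 3}, {"user": "a", "bytes": 1}],
}


def full_scan(table, dt):
    """Read every row of every partition, then keep rows whose partition matches.

    Returns (matching_rows, rows_read).
    """
    rows_read = 0
    result = []
    for part_dt, rows in table.items():
        for row in rows:
            rows_read += 1
            if part_dt == dt:
                result.append(row)
    return result, rows_read


def pruned_scan(table, dt):
    """Use the partition key to read only the matching partition.

    Returns (matching_rows, rows_read).
    """
    rows = table.get(dt, [])
    return list(rows), len(rows)


if __name__ == "__main__":
    print(full_scan(TABLE, "2012-10-03"))    # ([{'user': 'a', 'bytes': 7}], 5)
    print(pruned_scan(TABLE, "2012-10-03"))  # ([{'user': 'a', 'bytes': 7}], 1)
```

Both strategies return the same rows; only the amount of data read differs, and the gap grows with the number of partitions the filter excludes.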