-
Pig vs hive performance
Abhishek 2012-10-03, 21:52
Hi all,
Can we discuss the performance of Pig vs Hive?

1) What is Hive good at?
2) What is Pig good at?
3) Hive optimizer vs Pig optimizer
4) Hive limitations vs Pig limitations

Regards
Abhi
-
Re: Pig vs hive performance
Dan Richelson 2012-10-04, 02:50
Anecdotally I can say that Pig seems to scale down better than Hive.
We see this in tests: Hive scripts running on small amounts of data take much longer than similar Pig scripts, even with Hive's parallel settings enabled. I think this has to do with the fact that there doesn't seem to be a 'local' mode for Hive; you have to run it as MapReduce jobs (either embedded or on a cluster). Please correct me if I am wrong here.

--
Dan Richelson, Software Engineer
Tendril
-
Re: Pig vs hive performance
abhishek dodda 2012-10-04, 03:42
On Wed, Oct 3, 2012 at 7:50 PM, Dan Richelson <[EMAIL PROTECTED]> wrote:
> Anecdotally I can say that Pig seems to scale down better than Hive.
> We see this in tests- hive scripts running small amounts of data take
> much longer than similar Pig scripts. Hive parallel settings are
> enabled.

It is the same in our case: for small data, Pig seems to be much faster than Hive.

> I think this has to do with the fact that there doesn't seem
> to be a 'local' mode for hive- you have to run it as mapreduce jobs
> (either embedded or on a cluster). Please correct me if I am wrong
> here.

I am not sure whether this makes a difference.

Regards
Abhi
-
Re: Pig vs hive performance
TianYi Zhu 2012-10-03, 23:15
From the Amazon web site:
http://aws.amazon.com/elasticmapreduce/faqs/#hive-8

Q: When should I use Hive vs. PIG?

Hive and PIG both provide high level data-processing languages with support for complex data types for operating on large datasets. The Hive language is a variant of SQL and so is more accessible to people already familiar with SQL and relational databases. Hive has support for partitioned tables which allow Amazon Elastic MapReduce job flows to pull down only the table partition relevant to the query being executed rather than doing a full table scan. Both PIG and Hive have query plan optimization. PIG is able to optimize across an entire script while Hive queries are optimized at the statement level.

Ultimately the choice of whether to use Hive or PIG will depend on the exact requirements of the application domain and the preferences of the implementers and those writing queries.
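To make the partitioned-table point in the FAQ concrete, here is a minimal sketch in plain Python. It is not Hive code: the in-memory `table` dictionary, the date partition keys, and the helper names `full_scan` and `pruned_scan` are all assumptions made up for illustration. It only shows why reading one partition touches fewer rows than filtering after a full scan.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a partitioned table:
# partition key (a date string) -> rows stored in that partition.
Row = Tuple[str, int]
PartitionedTable = Dict[str, List[Row]]


def full_scan(table: PartitionedTable, day: str) -> Tuple[List[Row], int]:
    """Read every row of every partition, keeping rows whose partition matches.

    Returns the matching rows and the number of rows that were read.
    """
    rows_read = 0
    matches: List[Row] = []
    for part_day, rows in table.items():
        for row in rows:
            rows_read += 1
            if part_day == day:
                matches.append(row)
    return matches, rows_read


def pruned_scan(table: PartitionedTable, day: str) -> Tuple[List[Row], int]:
    """Read only the partition named by the filter; other partitions are skipped."""
    rows = table.get(day, [])
    return list(rows), len(rows)


# Illustrative data only.
table: PartitionedTable = {
    "2012-10-01": [("a", 1), ("b", 2)],
    "2012-10-02": [("c", 3)],
    "2012-10-03": [("d", 4), ("e", 5), ("f", 6)],
}

if __name__ == "__main__":
    print(full_scan(table, "2012-10-03"))
    print(pruned_scan(table, "2012-10-03"))
```

Both helpers return the same rows for a given day; the difference is only in how many rows had to be read, which is the saving the FAQ attributes to partitioned tables.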
-
Re: Pig vs hive performance
Abhishek 2012-10-03, 23:41
Hi Zhu,
Thanks for the reply. I am running some queries where Hive is slower than Pig.

I was also thinking that the Pig optimizer is better than the Hive optimizer.

Regards
Abhi
-
Re: Pig vs hive performance
TianYi Zhu 2012-10-04, 00:14
Hi Abhishek,
I have no idea about the optimizers. In my opinion, a SQL-like programming language is hard to optimize, so Hive may be slower than Pig in many cases. But fundamentally, for every Hadoop job there must be a best (in time or space) sequence of map/reduce phases. You should rewrite your Pig/Hive script, following something like http://ofps.oreilly.com/titles/9781449302641/making_pig_fly.html, so that it fits the optimizer well; then it might generate the best sequence.

My suggestion is to leave Hive as a data warehouse and do most jobs in Pig. As for your earlier question, "1) what hive is good at?": if you have a complex join written in SQL, you can apply it directly in Hive, whereas it would take you a lot of time to translate it into a Pig script.

Thanks,
TianYi
-
Re: Pig vs hive performance
Abhishek 2012-10-04, 03:30
Thanks, Zhu, for your reply; your points make sense to me.

Regards
Abhishek