Hadoop >> mail # user >> Pig vs hive performance


Abhishek 2012-10-03, 21:52
Dan Richelson 2012-10-04, 02:50
abhishek dodda 2012-10-04, 03:42
TianYi Zhu 2012-10-03, 23:15
Abhishek 2012-10-03, 23:41
TianYi Zhu 2012-10-04, 00:14

Re: Pig vs hive performance
Thanks for your reply, Zhu. Your points make sense to me.

Regards
Abhishek

On Oct 3, 2012, at 8:14 PM, TianYi Zhu <[EMAIL PROTECTED]> wrote:

> Hi Abhishek,
>
> I don't know much about the optimizers. In my opinion, SQL-like
> languages are hard to optimize, so Hive may be slower than Pig in many
> cases. Fundamentally, though, for every Hadoop job there must be a best
> (in time or space) sequence of map/reduce phases. You should rewrite your
> Pig/Hive script following guidance such as
> http://ofps.oreilly.com/titles/9781449302641/making_pig_fly.html so that it
> fits the optimizer well; then it might generate the best sequence.
>
> My suggestion is to leave Hive as the data warehouse and do most jobs in
> Pig. As for your earlier question about what Hive is good at: if you have a
> complex join already written in SQL, you can run it on Hive directly,
> whereas translating it into a Pig script would take you a lot of time.
>
> Thanks,
> TianYi
>
> On Thu, Oct 4, 2012 at 9:41 AM, Abhishek <[EMAIL PROTECTED]> wrote:
>
>> Hi Zhu,
>>
>> Thanks for the reply. I am running some queries where Hive is slower than
>> Pig.
>>
>> I was also thinking that the Pig optimizer is better than the Hive
>> optimizer.
>>
>> Regards
>> Abhi
>>
>> On Oct 3, 2012, at 7:15 PM, TianYi Zhu <[EMAIL PROTECTED]> wrote:
>>
>>> From the Amazon website:
>>> http://aws.amazon.com/elasticmapreduce/faqs/#hive-8
>>>
>>>
>>> Q: When should I use Hive vs. PIG?
>>>
>>> Hive and PIG both provide high level data-processing languages with
>>> support for complex data types for operating on large datasets. The Hive
>>> language is a variant of SQL and so is more accessible to people already
>>> familiar with SQL and relational databases. Hive has support for
>>> partitioned tables which allow Amazon Elastic MapReduce job flows to pull
>>> down only the table partition relevant to the query being executed rather
>>> than doing a full table scan. Both PIG and Hive have query plan
>>> optimization. PIG is able to optimize across an entire script while Hive
>>> queries are optimized at the statement level.
>>>
>>> Ultimately the choice of whether to use Hive or PIG will depend on the
>>> exact requirements of the application domain and the preferences of the
>>> implementers and those writing queries.
>>>
>>>
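
To make the partitioned-table point in the quoted FAQ concrete, here is a minimal Python sketch of partition pruning. It is a toy in-memory model, not Hive's or Elastic MapReduce's actual implementation: the column name `dt`, the sample rows, and the helper functions `build_partitions`, `full_scan`, and `pruned_scan` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_partitions(rows, key):
    """Group rows by the partition column, roughly as a partitioned table lays them out."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def full_scan(partitions, key, value):
    """Read every row of every partition and filter row by row."""
    rows_read = 0
    matches = []
    for part_rows in partitions.values():
        for row in part_rows:
            rows_read += 1
            if row[key] == value:
                matches.append(row)
    return matches, rows_read


def pruned_scan(partitions, value):
    """Read only the partition whose key equals the predicate value."""
    part_rows = partitions.get(value, [])
    return list(part_rows), len(part_rows)


# Hypothetical sample data, partitioned by the date column "dt".
rows = [
    {"dt": "2012-10-01", "user": "a", "clicks": 3},
    {"dt": "2012-10-01", "user": "b", "clicks": 5},
    {"dt": "2012-10-02", "user": "a", "clicks": 1},
    {"dt": "2012-10-03", "user": "c", "clicks": 7},
]
table = build_partitions(rows, "dt")

full_matches, full_read = full_scan(table, "dt", "2012-10-01")
pruned_matches, pruned_read = pruned_scan(table, "2012-10-01")

# Both scans return the same rows; the pruned scan touches fewer of them.
print("full scan read:", full_read, "rows; pruned scan read:", pruned_read, "rows")
```

Both functions return the same matching rows; the difference is only in how many rows have to be read, which is the saving the FAQ attributes to partitioned tables.
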
>>> On Thu, Oct 4, 2012 at 7:52 AM, Abhishek <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Can we discuss the performance of Pig vs. Hive?
>>>>
>>>> 1) What is Hive good at?
>>>> 2) What is Pig good at?
>>>> 3) Hive optimizer vs. Pig optimizer
>>>> 4) Hive limitations vs. Pig limitations
>>>>
>>>> Regards
>>>> Abhi