Re: OPTIMIZING A HIVE QUERY
> My question was whether every join in a Hive query would constitute a
> MapReduce job.
In the general case, yes. But if one side of your join is small enough (i.e.
it fits entirely in memory), a hash join (map join) can be performed, which is
much faster because no reduce phase is required.
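
To illustrate the idea behind a map join, here is a minimal Python sketch of a
hash join. It is not Hive's implementation, only the general technique: the
small side is loaded into an in-memory index and the large side is streamed
past it. The function name, table names, and sample rows are all hypothetical.

```python
from collections import defaultdict


def map_side_join(small_rows, large_rows, key):
    """Inner hash join: index the small side in memory, stream the large side."""
    # Build phase: only works if the small side fits in memory.
    index = defaultdict(list)
    for row in small_rows:
        index[row[key]].append(row)

    # Probe phase: each large-side row is matched locally,
    # so no shuffle or reduce step is needed.
    for row in large_rows:
        for match in index.get(row[key], []):
            yield {**match, **row}


# Hypothetical sample data: a small dimension table and a larger fact table.
countries = [
    {"country_id": 1, "country": "FR"},
    {"country_id": 2, "country": "US"},
]
visits = [
    {"visit_id": 10, "country_id": 2},
    {"visit_id": 11, "country_id": 1},
    {"visit_id": 12, "country_id": 3},  # no matching country, dropped by the inner join
]

joined = list(map_side_join(countries, visits, "country_id"))
```

Each mapper can run the probe phase on its own slice of the large table, which
is why the reduce phase disappears.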

Bejoy KS has just provided the right link.

> Store data in a smarter way? Can you please elaborate on this?
That's not Hive-specific. The same logic applies to an RDBMS. You want to keep a
normalized source of data, but sometimes denormalizing it can greatly
improve your performance. That's one of the advantages of a document store.
It is very dependent on your use case.
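
As a rough sketch of that trade-off, the Python below computes the same total
from a normalized layout (lookup at read time) and from a denormalized one
(customer name copied into each order at write time). All names and data are
hypothetical and only illustrate the principle.

```python
# Normalized layout: orders reference customers by id (hypothetical data).
customers = {1: {"name": "alice"}, 2: {"name": "bob"}}
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 20},
    {"order_id": 101, "customer_id": 2, "amount": 35},
    {"order_id": 102, "customer_id": 1, "amount": 5},
]


def totals_normalized(orders, customers):
    """Every read pays for the join (here, a lookup per order)."""
    totals = {}
    for order in orders:
        name = customers[order["customer_id"]]["name"]
        totals[name] = totals.get(name, 0) + order["amount"]
    return totals


def denormalize(orders, customers):
    """Pay for the join once, at write time, by copying the name into each order."""
    return [
        {**order, "customer_name": customers[order["customer_id"]]["name"]}
        for order in orders
    ]


def totals_denormalized(rows):
    """Reads need no join: everything is already in the row."""
    totals = {}
    for row in rows:
        totals[row["customer_name"]] = totals.get(row["customer_name"], 0) + row["amount"]
    return totals


orders_wide = denormalize(orders, customers)
```

The cost is duplicated data: if a customer name changes, every copied row has
to be rewritten, which is why this only pays off for some use cases.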

Bertrand

On Tue, Aug 14, 2012 at 7:30 PM, sudeep tokala <[EMAIL PROTECTED]> wrote:

> hi Bertrand,
>
> Thanks for the reply.
>
> My question was whether every join in a Hive query would constitute a
> MapReduce job.
> A MapReduce job goes through serialization and deserialization of objects.
> Isn't that an overhead?
>
> Store data in a smarter way? Can you please elaborate on this?
>
> Regards
> Sudeep
>
> On Tue, Aug 14, 2012 at 11:39 AM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>
>> You may want to be clearer. Is your question: how can I change the
>> serialization strategy of Hive? (If so, I'll let other users answer, and I am
>> also interested in the answer.)
>>
>> Otherwise the answer is simple. If you want to join data that cannot be
>> held in memory, you need to serialize it. The only alternative is to
>> store the data in a smarter way that would not require you to do the join.
>> By the way, how do you know that serialization is the bottleneck?
>>
>> Bertrand
>>
>>
>> On Tue, Aug 14, 2012 at 5:11 PM, sudeep tokala <[EMAIL PROTECTED]> wrote:
>>
>>>
>>>
>>> On Tue, Aug 14, 2012 at 11:08 AM, sudeep tokala <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> How can I avoid serialization and deserialization overhead in a Hive join
>>>> query? Will this optimize my query performance?
>>>>
>>>> Regards
>>>> sudeep
>>>>
>>>
>>>
>>
>>
>> --
>> Bertrand Dechoux
>>
>
>
--
Bertrand Dechoux