Hive, mail # user - Re: Map join optimization issue


Re: Map join optimization issue
Mayuresh Kunjir 2013-02-15, 21:57
I am on 0.9.

If I have a selective filter condition on the small table, does Hive try to
estimate the filtered data size before deciding on the join algorithm? If so,
it would make sense to use a map join even when the small table (before the
filter) is larger than the hive.mapjoin.smalltable.filesize parameter. Any ideas?
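
To make the scenario concrete, here is a minimal sketch of the kind of session I mean. The table names (big_table, small_table), the join key, the filter column, and the threshold value are all hypothetical and only for illustration; the two settings and the MAPJOIN hint are the ones already discussed in this thread.

```sql
-- Let Hive decide on its own whether to convert the join to a map join.
set hive.auto.convert.join=true;

-- Size threshold in bytes for the "small" table (500MB here, illustrative only).
set hive.mapjoin.smalltable.filesize=524288000;

-- Join with a selective filter on the small table. The open question is
-- whether the size check uses the table's full size or the filtered size.
SELECT b.id, b.val, s.attr
FROM big_table b
JOIN small_table s ON (b.id = s.id)
WHERE s.category = 'x';

-- The hinted form mentioned earlier in the thread, for comparison.
SELECT /*+ MAPJOIN(s) */ b.id, b.val, s.attr
FROM big_table b
JOIN small_table s ON (b.id = s.id)
WHERE s.category = 'x';
```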

~Mayuresh

On Fri, Feb 15, 2013 at 4:05 PM, Aniket Mokashi <[EMAIL PROTECTED]> wrote:

> I have tested that the parameter hive.mapjoin.smalltable.filesize works
> well with 0.8. What version of Hive are you on?
>
>
> On Fri, Feb 15, 2013 at 8:57 AM, <[EMAIL PROTECTED]> wrote:
>
>> Hi
>>
>> In later versions of Hive you don't actually need a map join hint in
>> your query. Just the following would suffice:
>>
>> set hive.auto.convert.join=true;
>> Regards
>> Bejoy KS
>>
>> Sent from remote device, Please excuse typos
>> ------------------------------
>> *From: * Mayuresh Kunjir <[EMAIL PROTECTED]>
>> *Date: *Fri, 15 Feb 2013 10:37:52 -0500
>> *To: *user<[EMAIL PROTECTED]>
>> *ReplyTo: * [EMAIL PROTECTED]
>> *Subject: *Re: Map join optimization issue
>>
>> Thanks Aniket. I actually had not specified the map-join hint though.
>> Sorry for providing the wrong information earlier. I had only
>> set hive.auto.convert.join=true before firing my join query.
>>
>> ~Mayuresh
>>
>>
>>
>> On Thu, Feb 14, 2013 at 10:44 PM, Aniket Mokashi <[EMAIL PROTECTED]> wrote:
>>
>>> I think the hive.mapjoin.smalltable.filesize parameter will be disregarded
>>> in that case.
>>>
>>>
>>> On Thu, Feb 14, 2013 at 7:25 AM, Mayuresh Kunjir <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> Yes, the hint was specified.
>>>> On Feb 14, 2013 3:11 AM, "Aniket Mokashi" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Have you specified a map-join hint in your query?
>>>>>
>>>>>
>>>>> On Thu, Feb 7, 2013 at 11:39 AM, Mayuresh Kunjir <
>>>>> [EMAIL PROTECTED]> wrote:
>>>>>
>>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>>
>>>>>> I am trying to join two tables, the smaller being 4GB in size. When I
>>>>>> set the hive.mapjoin.smalltable.filesize parameter above 500MB, Hive tries to
>>>>>> run a local task to read the smaller file. This of course fails since
>>>>>> the file size is greater, and the backup common join is then run. What I do
>>>>>> not understand is why Hive attempted a map join when the small file's size was
>>>>>> greater than the smalltable.filesize parameter.
>>>>>>
>>>>>>
>>>>>> ~Mayuresh
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> "...:::Aniket:::... Quetzalco@tl"
>>>>>
>>>>
>>>
>>>
>>> --
>>> "...:::Aniket:::... Quetzalco@tl"
>>>
>>
>>
>
>
> --
> "...:::Aniket:::... Quetzalco@tl"
>