Hive, mail # user - Mapjoin parameters?


Re: Mapjoin parameters?
Ted Yu 2010-08-20, 17:38
No.
A RowContainer is created with a size taken from hive.mapjoin.bucket.cache.size,
whose default value is 100.

See line 223 in MapJoinOperator.processOp():
        if (o == null) {
          int bucketSize = HiveConf.getIntVar(hconf, HiveConf.ConfVars.HIVEMAPJOINBUCKETCACHESIZE);
          res = getRowContainer(hconf, (byte) tag, order[tag], bucketSize);
          res.add(value);
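
To illustrate why a cache-size setting does not have to mean lost rows, here is a minimal sketch. It assumes the size passed to the container only limits how many rows are held in memory at once, with full blocks handed to a backing store. SpillingRowCache and all of its methods are hypothetical names made up for this example; this is not Hive's RowContainer code, and the in-memory list of blocks only stands in for whatever persistent store is really used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; NOT Hive's RowContainer implementation.
// Assumption: the configured size bounds the in-memory block, not the total row count.
class SpillingRowCache<T> {
    private final int cacheSize;                               // max rows held in the current block
    private final List<T> current = new ArrayList<>();         // rows currently "in memory"
    private final List<List<T>> spilled = new ArrayList<>();   // stand-in for a disk-backed store

    SpillingRowCache(int cacheSize) {
        if (cacheSize <= 0) {
            throw new IllegalArgumentException("cacheSize must be positive");
        }
        this.cacheSize = cacheSize;
    }

    // Adds a row; when the current block is full it is spilled, not dropped.
    void add(T row) {
        if (current.size() == cacheSize) {
            spilled.add(new ArrayList<>(current));
            current.clear();
        }
        current.add(row);
    }

    int inMemoryRows() {
        return current.size();
    }

    int totalRows() {
        int total = current.size();
        for (List<T> block : spilled) {
            total += block.size();
        }
        return total;
    }

    // Returns every row in insertion order, spilled blocks first.
    List<T> allRows() {
        List<T> all = new ArrayList<>();
        for (List<T> block : spilled) {
            all.addAll(block);
        }
        all.addAll(current);
        return all;
    }

    public static void main(String[] args) {
        SpillingRowCache<String> cache = new SpillingRowCache<>(100);
        for (int i = 0; i < 250; i++) {
            cache.add("row" + i);
        }
        System.out.println("total rows: " + cache.totalRows());
        System.out.println("rows in memory: " + cache.inMemoryRows());
    }
}
```

Under that assumption, adding more rows than the configured size costs extra spills but every row stays retrievable, which is the behaviour the "No" above refers to.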
2010/8/19 Ted Xu <[EMAIL PROTECTED]>

> Thanks John, I'll create an issue for that.
>
> PS: So in mapjoin only the first 25000 rows in the small table will be
> cached by default, am I right? If the small table has more than 25000 rows,
> will we miss a certain proportion of the data without any warning or exception?
>
> On August 20, 2010 at 4:56 AM, John Sichi <[EMAIL PROTECTED]> wrote:
>
>> For hive.mapjoin.cache.numrows, I found this in hive/conf/hive-default.xml:
>>
>> <property>
>>   <name>hive.mapjoin.cache.numrows</name>
>>   <value>25000</value>
>>   <description>How many rows should be cached by jdbm for map join.</description>
>> </property>
>>
>> hive.mapjoin.size is missing from hive-default.xml; can you create a JIRA
>> issue for that?
>>
>> JVS
>>
>> On Aug 19, 2010, at 1:07 AM, Ted Xu wrote:
>>
>> Hi all,
>>
>> I found two parameters that are related to mapjoin:
>>
>> hive.mapjoin.cache.numrows
>> hive.mapjoin.size.key
>>
>> I can't find any formal documentation on these two parameters.
>>
>> I guess "hive.mapjoin.cache.numrows" sets the maximum row count of the
>> small table in a map join, and rows beyond that limit are discarded.
>> Once, when I used a map join with a table of 50000+ rows, some records
>> could not be joined, and I fixed the problem by increasing
>> "hive.mapjoin.cache.numrows".
>>
>> However, sometimes I still get an OOM exception even when the
>> "hive.mapjoin.cache.numrows" parameter is not set (so it defaults to
>> 25000, I guess).
>>
>> Please explain the usage of these parameters if you know. Thanks.
>>
>> --
>> Best Regards,
>> Ted Xu
>>
>>
>>
>
>
> --
> Best Regards,
> Ted Xu
>