Hive user mailing list: Mapjoin parameters?

Re: Mapjoin parameters?
No.
A RowContainer is created based on hive.mapjoin.bucket.cache.size, whose
default value is 100.

See line 223 in MapJoinOperator.processOp():
        if (o == null) {
          int bucketSize = HiveConf.getIntVar(hconf, HiveConf.ConfVars.HIVEMAPJOINBUCKETCACHESIZE);
          res = getRowContainer(hconf, (byte) tag, order[tag], bucketSize);
          res.add(value);
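
To illustrate why a cache size is not the same thing as a row limit, here is a minimal sketch. It is not Hive's RowContainer: SpillingRowBuffer, its methods, and the use of java.util.Properties in place of HiveConf are all hypothetical, and the spill list only stands in for whatever storage sits behind the in-memory block. The one detail taken from this thread is the property name hive.mapjoin.bucket.cache.size and its default of 100.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in, NOT Hive's RowContainer: at most blockSize rows sit
// in the current in-memory block; full blocks are moved aside, never dropped.
class SpillingRowBuffer {
    private final int blockSize;
    private List<String> current = new ArrayList<>();
    // Stands in for spilled storage (assumption: a real container would use disk).
    private final List<List<String>> spilled = new ArrayList<>();

    SpillingRowBuffer(int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
    }

    // Read the block size from configuration, falling back to 100 when unset.
    // Properties is used here only to keep the sketch self-contained.
    static int bucketSize(Properties conf) {
        return Integer.parseInt(conf.getProperty("hive.mapjoin.bucket.cache.size", "100"));
    }

    void add(String row) {
        if (current.size() == blockSize) {
            // The in-memory block is full: set it aside and start a new one.
            spilled.add(current);
            current = new ArrayList<>();
        }
        current.add(row);
    }

    // Rows held in the current in-memory block.
    int inMemoryRows() {
        return current.size();
    }

    // All rows ever added, whether in memory or set aside.
    int totalRows() {
        int n = current.size();
        for (List<String> block : spilled) {
            n += block.size();
        }
        return n;
    }
}
```

With the default size of 100, adding 250 rows to this sketch leaves 50 rows in the current block while totalRows() still reports 250, so the size setting bounds memory use, not the number of rows kept.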

2010/8/19 Ted Xu <[EMAIL PROTECTED]>

> Thanks John, I'll create an issue for that.
>
> PS: So in a mapjoin only the first 25000 rows of the small table are
> cached by default, am I right? If the small table has more than 25000 rows,
> will we lose some proportion of the data without any warning or exception?
>
> On August 20, 2010 at 4:56 AM, John Sichi <[EMAIL PROTECTED]> wrote:
>
>> For hive.mapjoin.cache.numrows, I found this in hive/conf/hive-default.xml:
>>
>> <property>
>>   <name>hive.mapjoin.cache.numrows</name>
>>   <value>25000</value>
>>   <description>How many rows should be cached by jdbm for map join.
>> </description>
>> </property>
>>
>> hive.mapjoin.size is missing from hive-default.xml; can you create a JIRA
>> issue for that?
>>
>> JVS
>>
>> On Aug 19, 2010, at 1:07 AM, Ted Xu wrote:
>>
>> Hi all,
>>
>> I found two parameters that have something to do with mapjoin, namely:
>>
>> hive.mapjoin.cache.numrows
>> hive.mapjoin.size.key
>>
>> I can't find any formal documentation on these two parameters.
>>
>> I guess "hive.mapjoin.cache.numrows" sets the maximum row count of the
>> small table in a map join, and rows beyond that limit are discarded.
>> I once used a map join with a table of 50000+ rows and some records were not
>> joined; I fixed the problem by increasing "hive.mapjoin.cache.numrows".
>>
>> However, I sometimes still get an OOM exception even when
>> "hive.mapjoin.cache.numrows" is not set (so at the default, 25000, I
>> guess).
>>
>> Please explain how these parameters are used, if you know. Thanks.
>>
>> --
>> Best Regards,
>> Ted Xu
>>
>>
>>
>
>
> --
> Best Regards,
> Ted Xu
>