Hive >> mail # user >> OutOfMemory when doing map-side join


Min Zhou 2009-06-15, 04:23
Namit Jain 2009-06-15, 04:51
Min Zhou 2009-06-15, 05:06
Namit Jain 2009-06-15, 05:14
Min Zhou 2009-06-15, 05:20
Namit Jain 2009-06-15, 05:52
Min Zhou 2009-06-15, 05:59
Namit Jain 2009-06-15, 06:02
Min Zhou 2009-06-15, 06:02
Namit Jain 2009-06-15, 22:52
Namit Jain 2009-06-15, 22:59
Min Zhou 2009-06-16, 02:15
Ashish Thusoo 2009-06-17, 22:10
Re: OutOfMemory when doing map-side join
hmm, that is 100KB per row by my math.

20K * 100K = 2GB

-- amr

Ashish Thusoo wrote:
> That does not sound right. Each row is 100MB - that sounds too much...
>  
> Ashish
>
> ------------------------------------------------------------------------
> *From:* Min Zhou [mailto:[EMAIL PROTECTED]]
> *Sent:* Monday, June 15, 2009 7:16 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: OutOfMemory when doing map-side join
>
> 20k rows need 2GB of memory?  So terrible.  My whole small table
> is less than 4MB, what about yours?
>
> On Tue, Jun 16, 2009 at 6:59 AM, Namit Jain <[EMAIL PROTECTED]> wrote:
>
>     Set mapred.child.java.opts to increase mapper memory.
>
>     *From:* Namit Jain [mailto:[EMAIL PROTECTED]]
>     *Sent:* Monday, June 15, 2009 3:53 PM
>     *To:* [EMAIL PROTECTED]
>     *Subject:* RE: OutOfMemory when doing map-side join
>
>     There are multiple things going on.
>
>     Column pruning is not working with map-joins. It is being tracked at:
>
>     https://issues.apache.org/jira/browse/HIVE-560
>
>     Also, since it is a Cartesian product, jdbm does not help -
>     because a single key can be very large.
>
>     For now, you can do the column pruning yourself -- create a new
>     table with only the columns needed and then join with the bigger table.
>
>     You may still need to increase the mapper memory - I was able to
>     load about 20k rows with about 2G mapper.
>
>     *From:* Min Zhou [mailto:[EMAIL PROTECTED]]
>     *Sent:* Sunday, June 14, 2009 11:02 PM
>     *To:* [EMAIL PROTECTED]
>     *Subject:* Re: OutOfMemory when doing map-side join
>
>     btw, that small table 'application' has only one partition right
>     now,  20k rows.
>
>     On Mon, Jun 15, 2009 at 1:59 PM, Min Zhou <[EMAIL PROTECTED]> wrote:
>
>     failed with null pointer exception.
>     hive>select /*+ MAPJOIN(a) */ a.url_pattern, w.url from  (select
>     x.url_pattern from application x where x.dt = '20090609') a join
>     web_log w where w.logdate='20090611' and w.url rlike a.url_pattern;
>     FAILED: Unknown exception : null
>
>
>     $cat /tmp/hive/hive.log | tail...
>
>     2009-06-15 13:57:02,933 ERROR ql.Driver
>     (SessionState.java:printError(279)) - FAILED: Unknown exception : null
>     java.lang.NullPointerException
>             at org.apache.hadoop.hive.ql.parse.QBMetaData.getTableForAlias(QBMetaData.java:76)
>             at org.apache.hadoop.hive.ql.parse.PartitionPruner.getTableColumnDesc(PartitionPruner.java:284)
>             at org.apache.hadoop.hive.ql.parse.PartitionPruner.genExprNodeDesc(PartitionPruner.java:217)
>             at org.apache.hadoop.hive.ql.parse.PartitionPruner.genExprNodeDesc(PartitionPruner.java:231)
>             at org.apache.hadoop.hive.ql.parse.PartitionPruner.genExprNodeDesc(PartitionPruner.java:231)
>             at org.apache.hadoop.hive.ql.parse.PartitionPruner.genExprNodeDesc(PartitionPruner.java:231)
>             at org.apache.hadoop.hive.ql.parse.PartitionPruner.addExpression(PartitionPruner.java:377)
>             at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPartitionPruners(SemanticAnalyzer.java:608)
>             at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:3785)
>             at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:76)
>             at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:177)
>             at org.apache.hadoop.hive.ql.Driver.run(Driver.java:209)
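[Editor's note] The two suggestions in the thread (raise the mapper heap via mapred.child.java.opts, and prune columns manually before the map-side join) could be combined roughly as follows. This is a sketch, not from the thread itself: the table name application_pruned and the 2048m heap size are illustrative assumptions, and CREATE TABLE ... AS SELECT must be supported by your Hive version; the other table and column names come from Min Zhou's query above.

```sql
-- Raise per-mapper heap before running the join (value is an assumption, not from the thread)
SET mapred.child.java.opts=-Xmx2048m;

-- Manual column pruning: keep only the column the join actually needs
-- (application_pruned is a hypothetical name)
CREATE TABLE application_pruned AS
SELECT url_pattern FROM application WHERE dt = '20090609';

-- Map-side join against the small, pruned table, as in the original query
SELECT /*+ MAPJOIN(a) */ a.url_pattern, w.url
FROM application_pruned a
JOIN web_log w
WHERE w.logdate = '20090611' AND w.url RLIKE a.url_pattern;
```

Since the join has no equality condition (it is a Cartesian product filtered by RLIKE), pruning only shrinks each row; the small table must still fit in the mapper's heap.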
Min Zhou 2009-06-18, 01:25
Ashish Thusoo 2009-06-18, 01:32
Min Zhou 2009-06-18, 01:52