Hive >> mail # user >> OutOfMemory when doing map-side join


Re: OutOfMemory when doing map-side join
hmm, that is 100KB per row, by my math.

20K rows * 100KB/row = 2GB
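The back-of-the-envelope arithmetic above, as a quick sketch (row count and per-row size are the figures quoted in this thread):

```python
rows = 20_000            # rows loaded into the mapper, per the thread
bytes_per_row = 100_000  # ~100KB per row, per the estimate above
total_bytes = rows * bytes_per_row
print(total_bytes)       # 2_000_000_000 bytes, i.e. about 2GB
```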

-- amr

Ashish Thusoo wrote:
> That does not sound right. Each row is 100MB - that sounds like too much...
>  
> Ashish
>
> ------------------------------------------------------------------------
> *From:* Min Zhou [mailto:[EMAIL PROTECTED]]
> *Sent:* Monday, June 15, 2009 7:16 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: OutOfMemory when doing map-side join
>
> 20k rows need 2GB of memory? That seems terrible. The whole small table
> of mine is less than 4MB; what about yours?
>
> On Tue, Jun 16, 2009 at 6:59 AM, Namit Jain <[EMAIL PROTECTED]
> <mailto:[EMAIL PROTECTED]>> wrote:
>
>     Set mapred.child.java.opts to increase mapper memory.
>
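A hedged sketch of applying that setting from the Hive CLI; the -Xmx value below is illustrative only (it echoes the 2G mapper mentioned later in the thread) and should be tuned to your cluster:

```sql
-- Illustrative only: give each map task a 2GB heap before running the join.
set mapred.child.java.opts=-Xmx2048m;
```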
>     *From:* Namit Jain [mailto:[EMAIL PROTECTED]
>     <mailto:[EMAIL PROTECTED]>]
>     *Sent:* Monday, June 15, 2009 3:53 PM
>
>     *To:* [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
>     *Subject:* RE: OutOfMemory when doing map-side join
>
>     There are multiple things going on.
>
>     Column pruning is not working with map-joins. It is being tracked at:
>
>     https://issues.apache.org/jira/browse/HIVE-560
>
>     Also, since it is a Cartesian product, jdbm does not help,
>     because a single key can be very large.
>
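A minimal sketch in plain Python (hypothetical data, not the thread's actual tables) of why an rlike predicate behaves as a Cartesian product: there is no equality key to hash or bucket on, so every pattern must be tested against every url.

```python
import re

# Hypothetical stand-ins for the small and big tables:
patterns = [r"^/app/\d+", r"\.html$"]              # application.url_pattern
urls = ["/app/42/home", "/index.html", "/about"]   # web_log.url

# With no join key, every (pattern, url) pair must be examined:
comparisons = 0
matches = []
for p in patterns:
    for u in urls:
        comparisons += 1
        if re.search(p, u):  # approximates Hive's rlike (find semantics)
            matches.append((p, u))

print(comparisons)   # = len(patterns) * len(urls), the full cross product
print(len(matches))
```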
>     For now, you can do the column pruning yourself -- create a new
>     table with only the columns needed, and then join it with the
>     bigger table.
>
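One hedged way to phrase that workaround in HiveQL, reusing the table and column names from the query later in this thread; `application_pruned` is a hypothetical name, and this assumes your Hive build supports CREATE TABLE AS SELECT:

```sql
-- Keep only the column the join actually needs:
CREATE TABLE application_pruned AS
SELECT url_pattern FROM application WHERE dt = '20090609';

-- Then map-join the much smaller pruned table against the big table:
SELECT /*+ MAPJOIN(a) */ a.url_pattern, w.url
FROM application_pruned a
JOIN web_log w
WHERE w.logdate = '20090611' AND w.url RLIKE a.url_pattern;
```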
>     You may still need to increase the mapper memory - I was able to
>     load about 20k rows with about a 2GB mapper.
>
>     *From:* Min Zhou [mailto:[EMAIL PROTECTED]
>     <mailto:[EMAIL PROTECTED]>]
>     *Sent:* Sunday, June 14, 2009 11:02 PM
>     *To:* [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
>     *Subject:* Re: OutOfMemory when doing map-side join
>
>     btw, that small table 'application' has only one partition right
>     now, 20k rows.
>
>     On Mon, Jun 15, 2009 at 1:59 PM, Min Zhou <[EMAIL PROTECTED]
>     <mailto:[EMAIL PROTECTED]>> wrote:
>
>     Failed with a null pointer exception.
>     hive>select /*+ MAPJOIN(a) */ a.url_pattern, w.url from  (select
>     x.url_pattern from application x where x.dt = '20090609') a join
>     web_log w where w.logdate='20090611' and w.url rlike a.url_pattern;
>     FAILED: Unknown exception : null
>
>
>     $cat /tmp/hive/hive.log | tail...
>
>     2009-06-15 13:57:02,933 ERROR ql.Driver
>     (SessionState.java:printError(279)) - FAILED: Unknown exception : null
>     java.lang.NullPointerException
>             at
>     org.apache.hadoop.hive.ql.parse.QBMetaData.getTableForAlias(QBMetaData.java:76)
>             at
>     org.apache.hadoop.hive.ql.parse.PartitionPruner.getTableColumnDesc(PartitionPruner.java:284)
>             at
>     org.apache.hadoop.hive.ql.parse.PartitionPruner.genExprNodeDesc(PartitionPruner.java:217)
>             at
>     org.apache.hadoop.hive.ql.parse.PartitionPruner.genExprNodeDesc(PartitionPruner.java:231)
>             at
>     org.apache.hadoop.hive.ql.parse.PartitionPruner.genExprNodeDesc(PartitionPruner.java:231)
>             at
>     org.apache.hadoop.hive.ql.parse.PartitionPruner.genExprNodeDesc(PartitionPruner.java:231)
>             at
>     org.apache.hadoop.hive.ql.parse.PartitionPruner.addExpression(PartitionPruner.java:377)
>             at
>     org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPartitionPruners(SemanticAnalyzer.java:608)
>             at
>     org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:3785)
>             at
>     org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:76)
>             at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:177)
>             at org.apache.hadoop.hive.ql.Driver.run(Driver.java:209)