Hive, mail # user - help on failed MR jobs (big hive files)


Elaine Gan 2012-12-12, 09:43
Re: help on failed MR jobs (big hive files)
Nitin Pawar 2012-12-13, 07:36
6 GB is nothing; we have done it with a few TB of data in Hive.
The error you are seeing is on the Hadoop side.

You can always optimize your query based on the Hadoop compute capacity
you have, and you will need to design your schema based on the patterns
in your data.

The problem here may be that you have a function to execute in the WHERE
clause. Can you try hard-coding it to a date range and see if you get any
improvement?
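
For example, a rough sketch (the table and column names are taken from
your query below; the literal dates are placeholders, and this assumes
request_date is stored as a yyyy-MM-dd string):

insert overwrite table B
select a, b, c
from A
where request_date >= '2012-11-12'
  and request_date <= '2012-12-12';

That replaces the per-row datediff(...) expression with a simple string
comparison.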

Alternatively, if you can partition your data by date, you will have a
smaller dataset to read.
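
Roughly like this (a sketch; A_by_day is a hypothetical partitioned
version of your table A, and the column types are guesses):

create table A_by_day (a string, b string, c string)
partitioned by (request_date string);

insert overwrite table B
select a, b, c
from A_by_day
where request_date >= '2012-11-12';

Since request_date is now a partition column, Hive prunes partitions and
reads only the last month of data instead of the whole table.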

If you have a good-sized Hadoop cluster, lower the split size to launch
more maps; that way the job will execute more quickly.
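
For example (treat this as a sketch; whether the setting takes effect
depends on your Hadoop version and input format):

set mapred.max.split.size=33554432;

At ~32 MB per split, a 6 GB input gives on the order of 190 map tasks
instead of ~96 with 64 MB splits.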

By the heapsize increase, did you mean the Hive heapsize or the Hadoop
mapred heapsize? You will need to increase the heapsize on the mapred
side by setting these properties:
set mapred.job.map.memory.mb=6000;
set mapred.job.reduce.memory.mb=4000;
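
One caveat worth double-checking for your Hadoop version: those two
properties set the per-task memory limit used for scheduling; the task
JVM heap itself is usually controlled by mapred.child.java.opts, e.g.:

set mapred.child.java.opts=-Xmx4096m;

Without that, tasks keep the default heap (around 200 MB on Hadoop 1.x)
and can still die with "Error: Java heap space".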

On Wed, Dec 12, 2012 at 3:13 PM, Elaine Gan <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm trying to run a program on Hadoop.
>
> [Input] tsv file
>
> My program does the following.
> (1) Load tsv into hive
>       load data local inpath 'tsvfile' overwrite into table A partitioned
> by xx
> (2) insert overwrite table B select a, b, c from A where
> datediff(to_date(from_unixtime(unix_timestamp('${logdate}'))),
> request_date) <= 30
> (3) Running Mahout
>
> In step 2, I am trying to retrieve data from Hive for the past month.
> My Hadoop job always stops here.
> When I check through the browser utility, it says:
>
> Diagnostic Info:
> # of failed Map Tasks exceeded allowed limit. FailedCount: 1.
> LastFailedTask: task_201211291541_0262_m_001800
>
> Task attempt_201211291541_0262_m_001800_0 failed to report status for 1802
> seconds. Killing!
> Error: Java heap space
> Task attempt_201211291541_0262_m_001800_2 failed to report status for 1800
> seconds. Killing!
> Task attempt_201211291541_0262_m_001800_3 failed to report status for 1801
> seconds. Killing!
>
>
>
> Each Hive table is big, around 6 GB.
>
> (1) Is around 6 GB too big for each Hive table?
> (2) I've increased my HEAPSIZE to 50G, which I think is far more than
> enough. Anywhere else I can do the tuning?
>
>
> Thank you.
>
>
>
> rei
>
>
>
--
Nitin Pawar
Mark Grover 2012-12-13, 07:46
Elaine Gan 2012-12-14, 06:54