HDFS, mail # user - Hadoop cluster hangs on big hive job


Re: Hadoop cluster hangs on big hive job
Håvard Wahl Kongsgård 2013-03-08, 16:31
Dude, I'm not going to read all your log files,

but try running this as a normal MapReduce job. It could be memory
related, a problem with some of the zip files, a wrong config,
etc.

-Håvard

On Thu, Mar 7, 2013 at 8:53 PM, Daning Wang <[EMAIL PROTECTED]> wrote:
> We have a Hive query processing zipped CSV files. The query was scanning
> 10 days of data (partitioned by date), with each day around 130 GB. The
> problem is not consistent: if you run the query again, it might go through.
> But the problem has never happened on smaller jobs (such as processing only
> one day's data).
>
> We don't have a disk space issue.
>
> I have attached the log file from when the problem was happening. It is
> stuck like the following (just search for "19706 of 49964"):
>
> 2013-03-05 15:13:51,587 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
> 2013-03-05 15:13:51,811 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
> 2013-03-05 15:13:52,551 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000032_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
> 2013-03-05 15:13:52,760 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000000_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
> 2013-03-05 15:13:52,946 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000024_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
> 2013-03-05 15:13:54,742 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000008_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
>
> Thanks,
>
> Daning
>
>
> On Thu, Mar 7, 2013 at 12:21 AM, Håvard Wahl Kongsgård
> <[EMAIL PROTECTED]> wrote:
>>
>> hadoop logs?
>>
>> On 6 March 2013 at 21:04, "Daning Wang" <[EMAIL PROTECTED]> wrote:
>>>
>>> We have a 5-node cluster (Hadoop 1.0.4). It hung a couple of times while
>>> running big jobs. Basically all the nodes are dead; from the TaskTracker's
>>> log, it looks like it went into some kind of endless loop.
>>>
>>> All the log entries look like this when the problem happened.
>>>
>>> Any idea how to debug the issue?
>>>
>>> Thanks in advance.
>>>
>>>
>>> 2013-03-05 15:13:19,526 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000012_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:19,552 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000028_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:20,858 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000036_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:21,141 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000016_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:21,692 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:22,448 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000032_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:22,643 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000000_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:22,840 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000024_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:24,628 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000008_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
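
The quoted TaskTracker lines show the same reduce attempts reporting
"19706 of 49964 at 0.00 MB/s" about 30 seconds apart, which suggests the
copy (shuffle) phase has stopped making progress. As a rough way to check
this across a whole log file, here is a minimal Python sketch. It assumes
each log entry sits on one line (the entries above are wrapped by the mail
client), and the function name, regex, and min_repeats threshold are
illustrative choices, not part of Hadoop.

```python
import re
from collections import defaultdict

# Matches TaskTracker reduce-copy progress lines in the shape quoted in this
# thread. Assumption: one log entry per line, unlike the mail-wrapped quotes.
PROGRESS = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+ INFO "
    r"org\.apache\.hadoop\.mapred\.TaskTracker: "
    r"(?P<attempt>attempt_\d+_\d+_r_\d+_\d+) "
    r"[\d.]+% reduce > copy "
    r"\((?P<done>\d+) of (?P<total>\d+) at (?P<rate>[\d.]+) MB/s\)"
)


def find_stalled_attempts(lines, min_repeats=2):
    """Return {attempt_id: (done, total, samples)} for reduce attempts whose
    copied-map count never changed and whose reported rate was always 0."""
    seen = defaultdict(list)
    for line in lines:
        m = PROGRESS.match(line.strip())
        if m:
            seen[m.group("attempt")].append(
                (int(m.group("done")), int(m.group("total")),
                 float(m.group("rate")))
            )

    stalled = {}
    for attempt, samples in seen.items():
        distinct_counts = {done for done, _, _ in samples}
        all_zero_rate = all(rate == 0.0 for _, _, rate in samples)
        if (len(samples) >= min_repeats
                and len(distinct_counts) == 1 and all_zero_rate):
            done, total, _ = samples[-1]
            stalled[attempt] = (done, total, len(samples))
    return stalled


# Entries taken from the two log excerpts in this thread, joined onto one line.
SAMPLE = [
    "2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy "
    "(19706 of 49964 at 0.00 MB/s) >",
    "2013-03-05 15:13:21,692 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy "
    "(19706 of 49964 at 0.00 MB/s) >",
    "2013-03-05 15:13:51,587 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy "
    "(19706 of 49964 at 0.00 MB/s) >",
    "2013-03-05 15:13:51,811 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy "
    "(19706 of 49964 at 0.00 MB/s) >",
]

if __name__ == "__main__":
    for attempt, (done, total, n) in sorted(find_stalled_attempts(SAMPLE).items()):
        print(f"{attempt}: stuck at {done} of {total} over {n} samples")
```

To run it against a real log, pass an open file object instead of SAMPLE,
for example find_stalled_attempts(open(path)) with path pointing at the
TaskTracker log. If every reducer stalls at the same map-output count, as
in the excerpts here, the cause is more likely on the serving side of the
shuffle than in any single reduce task, though the log alone does not
prove that.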

Håvard Wahl Kongsgård
Data Scientist
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/