

Re: Hadoop cluster hangs on big hive job
Dude, I'm not going to read all your log files,

but try running this as a normal MapReduce job. It could be memory
related, something wrong with some of the zip files, a wrong config,
etc.

-Håvard

On Thu, Mar 7, 2013 at 8:53 PM, Daning Wang <[EMAIL PROTECTED]> wrote:
> We have a Hive query processing zipped CSV files. The query was scanning
> 10 days of data (partitioned by date), with around 130G per day. The problem is
> not consistent: if you run it again, it might go through. But the
> problem has never happened on smaller jobs (like processing only one day's
> data).
>
> We don't have a space issue.
>
> I have attached the log file from when the problem was happening. It is stuck like
> the following (just search for "19706 of 49964"):
>
> 2013-03-05 15:13:51,587 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
> 2013-03-05 15:13:51,811 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
> 2013-03-05 15:13:52,551 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000032_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
> 2013-03-05 15:13:52,760 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000000_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
> 2013-03-05 15:13:52,946 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000024_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
> 2013-03-05 15:13:54,742 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000008_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
>
> Thanks,
>
> Daning
>
>
> On Thu, Mar 7, 2013 at 12:21 AM, Håvard Wahl Kongsgård
> <[EMAIL PROTECTED]> wrote:
>>
>> hadoop logs?
>>
>> On 6 March 2013 21:04, "Daning Wang" <[EMAIL PROTECTED]> wrote:
>>>
>>> We have a 5-node cluster (Hadoop 1.0.4). It hung a couple of times while
>>> running big jobs. Basically all the nodes are dead; from the TaskTracker's
>>> log, it looks like it went into some kind of endless loop.
>>>
>>> All the log entries looked like this when the problem happened.
>>>
>>> Any idea how to debug the issue?
>>>
>>> Thanks in advance.
>>>
>>>
>>> 2013-03-05 15:13:19,526 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000012_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:19,552 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000028_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:20,858 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000036_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:21,141 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000016_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:21,692 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:22,448 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000032_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:22,643 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000000_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:22,840 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000024_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:24,628 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000008_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
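
The quoted log lines show each reduce attempt reporting the same copy count (19706 of 49964) at 0.00 MB/s some 30 seconds apart. A rough way to confirm that pattern across a whole TaskTracker log, rather than searching for one string by hand, might look like the sketch below. It assumes each progress entry sits on a single line in the real log file (the line breaks above come from mail wrapping), and the function name and the `min_reports` threshold are made up for illustration, not part of Hadoop.

```python
import re
from collections import defaultdict

# Matches one TaskTracker reduce-copy progress line (unwrapped, as in the log file).
PROGRESS = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO "
    r"org\.apache\.hadoop\.mapred\.TaskTracker: "
    r"(?P<attempt>attempt_\S+) (?P<pct>[\d.]+)% reduce > copy "
    r"\((?P<done>\d+) of (?P<total>\d+) at (?P<rate>[\d.]+) MB/s\)"
)


def find_stalled_attempts(lines, min_reports=2):
    """Return {attempt_id: (done, total, reports)} for attempts whose copy
    count never changes and whose reported rate is always 0.00 MB/s.

    min_reports is a hypothetical threshold: attempts seen fewer times
    than this are ignored, since one report cannot show a stall.
    """
    history = defaultdict(list)
    for line in lines:
        m = PROGRESS.match(line.strip())
        if m:
            history[m["attempt"]].append(
                (int(m["done"]), int(m["total"]), float(m["rate"]))
            )

    stalled = {}
    for attempt, reports in history.items():
        if len(reports) < min_reports:
            continue
        done_values = {done for done, _, _ in reports}
        all_idle = all(rate == 0.0 for _, _, rate in reports)
        if len(done_values) == 1 and all_idle:
            done, total, _ = reports[-1]
            stalled[attempt] = (done, total, len(reports))
    return stalled


# Four entries taken from the thread, joined back onto single lines.
SAMPLE = [
    "2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >",
    "2013-03-05 15:13:21,692 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >",
    "2013-03-05 15:13:51,587 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >",
    "2013-03-05 15:13:51,811 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >",
]

if __name__ == "__main__":
    for attempt, (done, total, reports) in sorted(find_stalled_attempts(SAMPLE).items()):
        print(f"{attempt}: stuck at {done} of {total} over {reports} reports")
```

To run it against a real log, pass an open file object instead of SAMPLE. If every reduce attempt on every node turns up as stalled at the same count, that points at the shuffle as a whole rather than at one bad input file, which is worth knowing before retrying it as a plain MapReduce job.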

Håvard Wahl Kongsgård
Data Scientist
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/