Re: Hadoop cluster hangs on big hive job
Håvard Wahl Kongsgård 2013-03-08, 16:31
Dude, I'm not going to read all your log files,
but try to run this as a normal MapReduce job. It could be memory related, something wrong with some of the zip files, a wrong config, etc.

-Håvard

On Thu, Mar 7, 2013 at 8:53 PM, Daning Wang <[EMAIL PROTECTED]> wrote:
> We have a Hive query processing zipped CSV files. The query was scanning
> 10 days of data (partitioned by date); the data for each day is around 130G.
> The problem is not consistent: if you run it again, it might go through. But
> the problem has never happened on smaller jobs (like processing only one
> day's data).
>
> We don't have a space issue.
>
> I have attached the log file from when the problem was happening. It is stuck
> like the following (just search for "19706 of 49964"):
>
> 2013-03-05 15:13:51,587 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:51,811 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:52,551 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000032_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:52,760 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000000_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:52,946 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000024_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:54,742 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000008_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
>
> Thanks,
>
> Daning
>
> On Thu, Mar 7, 2013 at 12:21 AM, Håvard Wahl Kongsgård
> <[EMAIL PROTECTED]> wrote:
>>
>> hadoop logs?
>>
>> On 6 March 2013 21:04, "Daning Wang" <[EMAIL PROTECTED]> wrote:
>>>
>>> We have a 5-node cluster (Hadoop 1.0.4). It hung a couple of times while
>>> running big jobs. Basically all the nodes are dead; from the tasktracker's
>>> log, it looks like it went into some kind of loop forever.
>>>
>>> All the log entries look like this when the problem happened.
>>>
>>> Any idea how to debug the issue?
>>>
>>> Thanks in advance.
>>>
>>> 2013-03-05 15:13:19,526 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000012_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:19,552 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000028_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:20,858 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000036_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:21,141 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000016_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:21,692 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:22,448 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000032_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:22,643 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000000_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:22,840 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000024_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:24,628 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000008_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >

Håvard Wahl Kongsgård
Data Scientist
Faculty of Medicine & Department of Mathematical Sciences
NTNU
http://havard.security-review.net/
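
For anyone who would rather not read whole TaskTracker logs by eye, here is a rough Python sketch that flags reduce attempts whose shuffle copy count never moves. It assumes each log entry sits on a single line in the format quoted in this thread; the function name `find_stalled_reducers` and the `min_reports` threshold are made up for illustration and are not part of Hadoop. It only summarizes the symptom and says nothing about the cause.

```python
import re
from collections import defaultdict

# Matches the TaskTracker progress lines quoted in this thread, e.g.
# "... TaskTracker: attempt_..._r_000019_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >"
# Assumption: one log entry per line, as in the raw log file (not email-wrapped).
LINE_RE = re.compile(
    r"org\.apache\.hadoop\.mapred\.TaskTracker: "
    r"(?P<attempt>attempt_\d+_\d+_r_\d+_\d+) "
    r"[\d.]+% reduce > copy "
    r"\((?P<done>\d+) of (?P<total>\d+) at (?P<rate>[\d.]+) MB/s\)"
)


def find_stalled_reducers(lines, min_reports=2):
    """Return {attempt_id: (done, total, report_count)} for reduce attempts
    that reported the same copy count at 0.00 MB/s in every matching line.

    `min_reports` is a hypothetical threshold: an attempt seen fewer times
    than this is ignored, since one report cannot show a lack of progress.
    """
    history = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            history[match.group("attempt")].append(
                (int(match.group("done")),
                 int(match.group("total")),
                 float(match.group("rate")))
            )

    stalled = {}
    for attempt, reports in history.items():
        if len(reports) < min_reports:
            continue
        copy_counts = {done for done, _, _ in reports}
        always_idle = all(rate == 0.0 for _, _, rate in reports)
        if len(copy_counts) == 1 and always_idle:
            done, total, _ = reports[-1]
            stalled[attempt] = (done, total, len(reports))
    return stalled


if __name__ == "__main__":
    # Two entries for the same attempt, taken from the logs quoted above.
    sample = [
        "2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker: "
        "attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy "
        "(19706 of 49964 at 0.00 MB/s) >",
        "2013-03-05 15:13:51,587 INFO org.apache.hadoop.mapred.TaskTracker: "
        "attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy "
        "(19706 of 49964 at 0.00 MB/s) >",
    ]
    for attempt, (done, total, count) in find_stalled_reducers(sample).items():
        print(f"{attempt} stuck at {done} of {total} ({count} reports)")
```

To run it against a real log, pass an open file object as `lines` (for example `find_stalled_reducers(open(path))`, where `path` points at your TaskTracker log). If every reducer on every node shows up as stalled at the same count, as in the logs above, that at least confirms the job is stuck in the shuffle copy phase rather than in the reduce itself.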