Re: join operation fails on big data set
Hi, Mua:
Your log has
2013-04-12 14:00:00,777 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-199210310173610155_28360 bad datanode[0] 10.6.25.33:49197
2013-04-12 14:00:00,866 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-199210310173610155_28360 in pipeline 10.6.25.33:49197, 10.6.25.141:39369, 10.6.25.31:54563: bad datanode 10.6.25.33:49197

Can you check your datanode 10.6.25.33:49197? You can either
log in to that node and check whether the DataNode daemon is running,
or go to your namenode URL <namenode_IP>:50070/dfshealth.jsp (it
shows how many DNs are live and how many are dead),
or to <namenode_IP>:50070/dfsclusterhealth.jsp (which shows the same
live/dead DN counts).
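For example, from the command line, something like this should work (a rough sketch; it assumes the hadoop client is on your PATH and a typical log layout, so adjust paths for your install):

    # summary of live/dead datanodes as reported by the namenode
    hadoop dfsadmin -report | grep "Datanodes available"

    # on 10.6.25.33 itself, check whether the DataNode daemon is running
    jps | grep DataNode

    # and look at the tail of its log for errors (log path may differ)
    tail -n 200 /var/log/hadoop/*datanode*.log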
You can bump your log level by opening /etc/pig/conf/pig.properties and
changing it to debug=DEBUG (sorry for the confusion, I didn't mean the
log4j debug level here....)
This way you will see lines like 2013-04-15 10:46:04,069 [main] DEBUG
xxxxxxxxx in your console output...
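Concretely, that would look something like this (yourscript.pig is just a placeholder):

    # in /etc/pig/conf/pig.properties, uncomment or add:
    debug=DEBUG

    # or equivalently, set the debug level for a single run:
    pig -d DEBUG yourscript.pig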

Johnny
On Sun, Apr 14, 2013 at 6:13 AM, Mua Ban <[EMAIL PROTECTED]> wrote:

> Hi Johnny,
>
> Thank you very much for your email. I am very new here. Please tell me
> where to check the health of the data node (which log file should I look
> at?), and how to set the logging level of log4j to DEBUG.
>
> Thanks,
> -Mua
>
>
> On Fri, Apr 12, 2013 at 5:01 PM, Johnny Zhang <[EMAIL PROTECTED]>
> wrote:
>
> > Seems like an HDFS issue; as you said, certain blocks cannot be retrieved
> > from certain DNs. Can you check the health of all the DNs? And probably
> > also bump the log4j level to DEBUG.
> >
> > Johnny
> >
> >
> > On Fri, Apr 12, 2013 at 12:06 PM, Mua Ban <[EMAIL PROTECTED]> wrote:
> >
> > > Thank you very much Cheolsoo,
> > >
> > > I am running the script once more right now and I see 7 failed reducers
> > > at the moment on the job tracker GUI. I browsed these failed reducers
> > > and found their task logs. Of these 7 failed reducers, some have the
> > > type 1 task log and the rest have the type 2 task log, as shown below.
> > >
> > > They seem related to some connection issue among the nodes in the
> > > cluster. Do you know of any parameters I should configure to figure out
> > > the actual problem?
> > >
> > > Thank you,
> > > -Mua
> > >
> > > ---------------------------------------
> > > *Type 1 task log*
> > >
> > > 2013-04-12 13:42:24,960 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 5 outputs (0 slow hosts and 0 dup hosts)
> > > 2013-04-12 13:42:25,259 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
> > > 2013-04-12 13:42:25,271 INFO org.apache.hadoop.mapred.ReduceTask: Initiating in-memory merge with 610 segments...
> > > 2013-04-12 13:42:25,273 INFO org.apache.hadoop.mapred.Merger: Merging 610 sorted segments
> > > 2013-04-12 13:42:25,275 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 610 segments left of total size: 96922927 bytes
> > > 2013-04-12 13:42:27,348 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Merge of the 610 files in-memory complete. Local file is /hdfs/sp/filesystem/mapred/local/taskTracker/vul/jobcache/job_201304081613_0049/attempt_201304081613_0049_r_000009_0/output/map_6.out of size 96921713
> > > 2013-04-12 13:42:27,349 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Thread waiting: Thread for merging on-disk files
> > > 2013-04-12 13:42:30,263 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
> > > 2013-04-12 13:42:35,267 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 2 outputs (0 slow hosts and 0 dup hosts)
> > > 2013-04-12 13:42:38,145 INFO org.apache.hadoop.mapred.ReduceTask: