Pig >> mail # user >> join operation fails on big data set


Re: join operation fails on big data set
Hi, Mua:
Your log has
2013-04-12 14:00:00,777 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-199210310173610155_28360 bad datanode[0] 10.6.25.33:49197
2013-04-12 14:00:00,866 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-199210310173610155_28360 in pipeline 10.6.25.33:49197, 10.6.25.141:39369, 10.6.25.31:54563: bad datanode 10.6.25.33:49197
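
If the task log is long, it may help to tally which datanodes the DFSClient reports as bad. The following is only a minimal sketch: the function name count_bad_datanodes and the regular expression are illustrative assumptions, derived from the two message forms quoted in this thread, and the code expects each log entry on a single unwrapped line.

```python
import re
from collections import Counter

# Matches both forms seen in this thread:
#   "bad datanode[0] 10.6.25.33:49197" and "bad datanode 10.6.25.33:49197"
# (assumption: other Hadoop versions may word the message differently)
BAD_DN = re.compile(r"bad datanode(?:\[\d+\])?\s+(\d{1,3}(?:\.\d{1,3}){3}:\d+)")


def count_bad_datanodes(lines):
    """Return a Counter mapping 'ip:port' to how often it was reported bad."""
    counts = Counter()
    for line in lines:
        counts.update(BAD_DN.findall(line))
    return counts


# The two WARN lines quoted in this message, unwrapped.
sample = [
    "2013-04-12 14:00:00,777 WARN org.apache.hadoop.hdfs.DFSClient: Error "
    "Recovery for block blk_-199210310173610155_28360 bad datanode[0] "
    "10.6.25.33:49197",
    "2013-04-12 14:00:00,866 WARN org.apache.hadoop.hdfs.DFSClient: Error "
    "Recovery for block blk_-199210310173610155_28360 in pipeline "
    "10.6.25.33:49197, 10.6.25.141:39369, 10.6.25.31:54563: bad datanode "
    "10.6.25.33:49197",
]

for address, hits in count_bad_datanodes(sample).items():
    print(address, hits)
```

The addresses listed after "in pipeline" are deliberately not counted; only the address that follows "bad datanode" is.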

Can you check your datanode 10.6.25.33:49197? You can either
log in to that node and check whether the datanode daemon is running,
or go to your namenode URL <namenode_IP>:50070/dfshealth.jsp (it shows
how many DNs are live and how many are dead),
or go to <namenode_IP>:50070/dfsclusterhealth.jsp (it also shows how
many DNs are live and how many are dead).
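
As a quick first probe before logging in to the node, you could test whether the datanode's port accepts TCP connections at all. This is a rough sketch, not a health check: the helper name port_is_open and the 3-second timeout are assumptions, and an open port only shows that something is listening, not that the datanode is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call, using the address from the log excerpt in this thread
# (run it from a machine inside the cluster network):
#   port_is_open("10.6.25.33", 49197)
```

A False result from inside the cluster would point to a stopped daemon or a network problem between the nodes; the namenode pages mentioned above remain the more reliable source for live and dead DN counts.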
You can raise the log level by opening /etc/pig/conf/pig.properties and
setting debug=DEBUG (sorry for the confusion, I didn't mean the log4j debug
level here).
This way, you will see lines like 2013-04-15 10:46:04,069 [main] DEBUG
xxxxxxxxx in your console output.
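
If you would rather script the pig.properties change than edit the file by hand, something along these lines could work. It is a sketch under assumptions: set_property is a hypothetical helper, and it assumes the file may already contain a debug= line, possibly commented out with '#'. Back up the file first.

```python
import re


def set_property(text, key, value):
    """Return properties text with `key` set to `value`.

    Any existing assignment of the key (commented out with '#' or not) is
    replaced in place; if there is none, the assignment is appended.
    """
    pattern = re.compile(r"^\s*#?\s*" + re.escape(key) + r"\s*=.*$")
    new_line = f"{key}={value}"
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Example use with the path from this message (not executed here):
#   path = "/etc/pig/conf/pig.properties"
#   with open(path) as f:
#       updated = set_property(f.read(), "debug", "DEBUG")
#   with open(path, "w") as f:
#       f.write(updated)
```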

Johnny
On Sun, Apr 14, 2013 at 6:13 AM, Mua Ban <[EMAIL PROTECTED]> wrote:

> Hi Johnny,
>
> Thank you very much for your email. I am very new here. Please tell me
> where to check the health of the data node (which log file should I look
> at?), and how to set the logging level of log4j to DEBUG.
>
> Thanks,
> -Mua
>
>
> On Fri, Apr 12, 2013 at 5:01 PM, Johnny Zhang <[EMAIL PROTECTED]>
> wrote:
>
> > This seems to be an HDFS issue: as you said, it cannot retrieve a certain
> > block from a certain DN. Can you check the health of all DNs? And probably
> > also bump the log4j level to DEBUG.
> >
> > Johnny
> >
> >
> > On Fri, Apr 12, 2013 at 12:06 PM, Mua Ban <[EMAIL PROTECTED]> wrote:
> >
> > > Thank you very much Cheolsoo,
> > >
> > > I am running the script once more right now and I see 7 failed reducers
> > > at the moment on the job tracker GUI. I browsed these failed reducers and
> > > found the task logs. Of these 7 failed reducers, some have a type 1 task
> > > log and the rest have a type 2 task log, as I show below.
> > >
> > > They seem related to some connection issue among nodes in the cluster.
> > > Do you know any parameters I should configure to figure out the actual
> > > problem?
> > >
> > > Thank you,
> > > -Mua
> > >
> > > ---------------------------------------
> > > *Type 1 task log*
> > >
> > > 3-04-12 13:42:24,960 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 5 outputs (0 slow hosts and0 dup hosts)
> > > 2013-04-12 13:42:25,259 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
> > > 2013-04-12 13:42:25,271 INFO org.apache.hadoop.mapred.ReduceTask: Initiating in-memory merge with 610 segments...
> > > 2013-04-12 13:42:25,273 INFO org.apache.hadoop.mapred.Merger: Merging 610 sorted segments
> > > 2013-04-12 13:42:25,275 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 610 segments left of total size: 96922927 bytes
> > > 2013-04-12 13:42:27,348 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Merge of the 610 files in-memory complete. Local file is /hdfs/sp/filesystem/mapred/local/taskTracker/vul/jobcache/job_201304081613_0049/attempt_201304081613_0049_r_000009_0/output/map_6.out of size 96921713
> > > 2013-04-12 13:42:27,349 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Thread waiting: Thread for merging on-disk files
> > > 2013-04-12 13:42:30,263 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
> > > 2013-04-12 13:42:35,267 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 2 outputs (0 slow hosts and0 dup hosts)
> > > 2013-04-12 13:42:38,145 INFO org.apache.hadoop.mapred.ReduceTask: