Praveen Bysani 2013-05-14, 03:03
Cheolsoo Park 2013-05-14, 22:29
Re: Task Attempt failed to report status..Killing !!
Hi,

I tried different things; in the end, changing io.sort.mb to a smaller
value resolved this issue.
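
A minimal sketch of applying that override per script via Pig's SET command
(the value below is illustrative, not necessarily the one actually used; the
same property can instead be lowered cluster-wide in mapred-site.xml):

SET io.sort.mb 256;  -- shrink the map-side sort buffer (in MB); the cluster default here was 1 GB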

On 15 May 2013 06:29, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Sounds like your mappers are overloaded. Can you try the following?
>
> 1. You can set mapred.max.split.size to a smaller value, so more mappers
> can be launched.
>
> or
>
> 2. You can set mapred.task.timeout to a larger value. The default value is
> 600 seconds.
>
> Thanks,
> Cheolsoo
>
>
>
> On Mon, May 13, 2013 at 8:03 PM, Praveen Bysani <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I have a very weird issue with my Pig script. The following is the content
> > of my script:
> >
> > REGISTER /home/hadoopuser/Workspace/lib/piggybank.jar;
> > REGISTER /home/hadoopuser/Workspace/lib/datafu.jar;
> > REGISTER /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/hbase/hbase-0.94.2-cdh4.2.1-security.jar;
> > REGISTER /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/zookeeper/zookeeper-3.4.5-cdh4.2.1.jar;
> > SET default_parallel 15;
> >
> > records = LOAD 'hbase://dm-re' USING
> >     org.apache.pig.backend.hadoop.hbase.HBaseStorage('v:ctm v:src',
> >     '-caching 5000 -gt 1366098805& -lt 1366102543&')
> >     as (time:chararray, company:chararray);
> >
> > records_iso = FOREACH records GENERATE
> >     org.apache.pig.piggybank.evaluation.datetime.convert.CustomFormatToISO(
> >     time, 'yyyy-MM-dd HH:mm:ss Z') as iso_time;
> > records_group = GROUP records_iso ALL;
> > result = FOREACH records_group GENERATE MAX(records_iso.iso_time) as maxtime;
> > DUMP result;
> >
> > When I try to run this script on a cluster of 5 nodes with 20 map slots,
> > most of the map tasks fail with the following error about 10 minutes after
> > initializing:
> > Task attempt <id> failed to report status for 600 seconds. Killing!
> >
> > I tried decreasing the caching size to less than 100 or so (on the
> > intuition that fetching and processing a larger cache might be taking more
> > time), but I still hit the same issue. However, if I load the rows (using
> > -lt and -gt) such that the number of map tasks is <= 2, the job finishes
> > successfully. When the number of tasks is > 2, it is always the case that
> > 2-4 tasks complete and the rest all fail with the above-mentioned error. I
> > have attached the TaskTracker log for this attempt. I don't see any error
> > in it except for some ZooKeeper connection warnings. I manually checked
> > from that node, and 'hbase zkcli' connects without any issue, so I assume
> > ZooKeeper is configured properly.
> >
> > I don't really know where to start debugging this problem. It would be
> > great if someone could provide assistance. Some configurations of the
> > cluster which I think may be relevant here:
> >
> > dfs.block.size = 1 GB
> > io.sort.mb = 1 GB
> > HRegion size = 1 GB
> >
> > The size of the HBase table is close to 250 GB. I have observed 100% CPU
> > usage by the mapred user on the node while the task is executing. I am not
> > really sure what to optimize in this case for the job to complete. It
> > would be good if someone could throw some light in this direction.
> >
> > PS: All the nodes in the cluster run on EBS-backed Amazon EC2 instances.
> >
> >
> > --
> > Regards,
> > Praveen Bysani
> > http://www.praveenbysani.com
> >
>
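
For completeness, the two alternatives suggested above can also be set per
script with Pig's SET command; a sketch with illustrative values, not taken
from the original job:

SET mapred.max.split.size 134217728;  -- 128 MB splits, so more (and smaller) map tasks are launched
SET mapred.task.timeout 1800000;      -- raise the report timeout to 30 minutes (value is in milliseconds)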

--
Regards,
Praveen Bysani
http://www.praveenbysani.com