Re: Task attempt failed to report status. Killing!
Praveen Bysani 2013-05-16, 17:41
I tried different things; finally, changing io.sort.mb to a smaller value
resolved this issue.
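For reference, io.sort.mb can be overridden per job from within the Pig script
itself. The value below is only illustrative (the thread does not record the
exact value that worked); it is the 100 MB Hadoop 1.x default:

```pig
-- Illustrative only: shrink the map-side sort buffer for this job.
-- 100 MB is the Hadoop 1.x default; 1 GB (as configured on this cluster)
-- leaves little heap for the actual map work.
SET io.sort.mb 100;
```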
On 15 May 2013 06:29, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
> Sounds like your mappers are overloaded. Can you try the following?
> 1. You can set mapred.max.split.size to a smaller value, so more mappers
> can be launched.
> 2. You can set mapred.task.timeout to a larger value. The default value is
> 600 seconds.
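> Both properties can be set at the top of your Pig script, for example
> (the values are illustrative; note that mapred.max.split.size is in bytes
> and mapred.task.timeout is in milliseconds):
>
> ```pig
> SET mapred.max.split.size 134217728; -- 128 MB splits, so more, smaller mappers
> SET mapred.task.timeout 1800000;     -- 30 minutes (the default is 600000 ms)
> ```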
> On Mon, May 13, 2013 at 8:03 PM, Praveen Bysani <[EMAIL PROTECTED]
> > Hi,
> > I have a very weird issue with my Pig script. The following is the
> > content of my script:
> > REGISTER /home/hadoopuser/Workspace/lib/piggybank.jar;
> > REGISTER /home/hadoopuser/Workspace/lib/datafu.jar;
> > REGISTER
> > REGISTER
> > SET default_parallel 15;
> > records = LOAD 'hbase://dm-re' USING
> > org.apache.pig.backend.hadoop.hbase.HBaseStorage('v:ctm v:src','-caching
> > 5000 -gt 1366098805& -lt 1366102543&') as
> > (time:chararray,company:chararray);
> > records_iso = FOREACH records GENERATE
> > HH:mm:ss Z') as iso_time;
> > records_group = GROUP records_iso ALL;
> > result = FOREACH records_group GENERATE MAX(records_iso.iso_time) as
> > maxtime;
> > DUMP result;
> > When I try to run this script on a cluster of 5 nodes with 20 map slots,
> > most of the map tasks fail with the following error about 10 minutes
> > after initializing:
> > Task attempt <id> failed to report status for 600 seconds. Killing!
> > I tried decreasing the caching size to less than 100 or so (on the
> > intuition that fetching and processing a larger cache takes more time),
> > but the issue remained. However, if I load the rows (by adjusting
> > lt and gt) such that the number of map tasks is <= 2, the job finishes
> > successfully. When the number of tasks is > 2, it is always the case
> > that 2-4 tasks complete and the rest all fail with the above-mentioned
> > error. I attach the task tracker log for this attempt; I don't see any
> > error in it except for some ZooKeeper connection warnings. I manually
> > checked from that node, and running 'hbase zkcli' connects without any
> > issue. Hence, I assume that ZooKeeper is configured properly.
> > I don't really understand how to debug this problem; it would be great
> > if someone could provide assistance. Some configurations of the cluster
> > which I think may be relevant here:
> > dfs.block.size = 1 GB
> > io.sort.mb = 1 GB
> > HRegion size = 1 GB
> > and the size of the HBase table is close to 250 GB. I have observed 100%
> > CPU usage by the mapred user on the node while the task is executing. I
> > am not really sure what to optimize in this case for the job to
> > complete; it would be good if someone could throw some light in this
> > direction.
> > PS: All the nodes in my cluster are configured as EBS-backed Amazon EC2
> > instances.
> > --
> > Regards,
> > Praveen Bysani
> > http://www.praveenbysani.com