Task Attempt failed to report status..Killing !!

I have a very weird issue with my Pig script. Following is its content:

REGISTER /home/hadoopuser/Workspace/lib/piggybank.jar;
REGISTER /home/hadoopuser/Workspace/lib/datafu.jar;
SET default_parallel 15;

records = LOAD 'hbase://dm-re' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('v:ctm v:src',
    '-caching 5000 -gt 1366098805& -lt 1366102543&') as

records_iso = FOREACH records GENERATE
    HH:mm:ss Z') as iso_time;
records_group = GROUP records_iso ALL;
result = FOREACH records_group GENERATE MAX(records_iso.iso_time) as
DUMP result;

When I try to run this script on a cluster of 5 nodes with 20 map slots,
most of the map tasks fail after 10 minutes of execution with the following
error:

Task attempt <id> failed to report status for 600 seconds. Killing!
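
For reference, the 600-second limit in that message corresponds to Hadoop's
mapred.task.timeout property (600000 ms by default), which a Pig script can
override per job with a SET statement. A minimal sketch, assuming raising
the timeout is acceptable as a diagnostic step (the 1800000 ms value is
purely illustrative):

-- Hypothetical diagnostic only: give tasks 30 minutes (1800000 ms)
-- before the TaskTracker kills them for not reporting status.
SET mapred.task.timeout 1800000;

This only masks the symptom, of course; a task that never reports progress
will still hit the higher limit eventually.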

I tried to decrease the caching size to less than 100 or so (on the
intuition that fetching and processing a larger cache might take more
time), but the issue persists. However, if I restrict the rows loaded
(using lt and gt) such that the number of map tasks is <= 2, the job
finishes successfully. When the number of tasks is > 2, it is always the
case that 2-4 tasks complete and all the rest fail with the above-mentioned
error. I attach the task tracker log for this attempt. I don't see any
error except for some ZooKeeper connection warnings. I manually checked
from that node, and running 'hbase zkcli' connects without any issue, so I
assume ZooKeeper is configured properly.
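
For reference, lowering the scanner caching only means changing the
HBaseStorage option string; a sketch of the reduced-caching variant
described above (the AS schema is an assumption reconstructed from the two
loaded columns, since the schema line was truncated in the script above and
HBaseStorage yields chararray fields unless cast):

-- Same load as in the script, with scanner caching lowered to 100 rows.
records = LOAD 'hbase://dm-re' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('v:ctm v:src',
    '-caching 100 -gt 1366098805& -lt 1366102543&')
    AS (ctm:chararray, src:chararray); -- assumed schema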

I don't really know where to start debugging this problem. It would be
great if someone could provide assistance. Some cluster configuration
values that I think may be relevant here:
dfs.block.size = 1 GB
io.sort.mb = 1 GB
HRegion size = 1 GB
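
In case it matters, such values can also be overridden per job from within
the Pig script itself; a minimal sketch, assuming a smaller map-side sort
buffer is worth trying (the 256 MB value is purely an illustrative
assumption, not a recommendation):

-- Hypothetical per-job override; io.sort.mb is given in megabytes.
SET io.sort.mb 256;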

Also, the size of the HBase table is close to 250 GB. I have observed 100%
CPU usage by the mapred user on a node while a task is executing. I am not
really sure what to optimize in this case for the job to complete. It would
be good if someone could throw some light on this.
PS: All the nodes in the cluster are EBS-backed Amazon EC2 instances.

Praveen Bysani