Hadoop >> mail # user >> Re: Hadoop cluster hangs on big hive job


Re: Hadoop cluster hangs on big hive job
I have seen one such problem related to big hive jobs that open a lot of
files. See HDFS-4496 for more details. Snippet from the description:
The following issue was observed in a cluster that was running a Hive job
and was writing to 100,000 temporary files (each task is writing to 1000s
of files). When this job is killed, a large number of files are left open
for write. Eventually when the lease for open files expires, lease recovery
is started for all these files in a very short duration of time. This
causes a large number of commitBlockSynchronization where logSync is
performed with the FSNamesystem lock held. This overloads the namenode
resulting in slowdown.

Could this be the cause? Can you check the namenode log to see whether there
is lease recovery activity? If not, can you send some information about what
is happening in the namenode logs at the time of this slowdown?
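
To make the check for lease recovery activity concrete, here is a minimal sketch that counts matching namenode log lines per minute. The substrings in LEASE_PATTERNS are assumptions about what the namenode prints during lease recovery and may differ between versions, so adjust them to what your log actually contains. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed substrings that indicate lease recovery activity; these are
# illustrative and should be adjusted to match your namenode's real output.
LEASE_PATTERNS = (
    "commitBlockSynchronization",
    "Recovering lease",
    "has expired hard limit",
)

# Leading "YYYY-MM-DD HH:MM" of a log4j-style timestamp (minute resolution).
MINUTE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2})")


def lease_activity_per_minute(lines, patterns=LEASE_PATTERNS):
    """Count log lines per minute that contain any of the given substrings."""
    counts = Counter()
    for line in lines:
        match = MINUTE_RE.match(line)
        if match and any(p in line for p in patterns):
            counts[match.group(1)] += 1
    return counts


def busiest_minutes(counts, top=5):
    """Return the (minute, count) pairs with the most matching lines."""
    return counts.most_common(top)
```

Passing an open namenode log file to lease_activity_per_minute and printing busiest_minutes of the result shows whether a burst of such lines coincides with the time of the hang.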

On Mon, Mar 11, 2013 at 1:32 PM, Daning Wang <[EMAIL PROTECTED]> wrote:

> [hive@mr3-033 ~]$ hadoop version
> Hadoop 1.0.4
> Subversion
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
> 1393290
> Compiled by hortonfo on Wed Oct  3 05:13:58 UTC 2012
>
>
> On Sun, Mar 10, 2013 at 8:16 AM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
>
>> What is the version of hadoop?
>>
>> Sent from phone
>>
>> On Mar 7, 2013, at 11:53 AM, Daning Wang <[EMAIL PROTECTED]> wrote:
>>
>> We have a Hive query processing zipped CSV files. The query was scanning 10
>> days of data (partitioned by date), with around 130 GB of data per day. The
>> problem is not consistent: if you run it again, it might go through. But
>> the problem has never happened on smaller jobs (like processing only one
>> day's data).
>>
>> We don't have a space issue.
>>
>> I have attached the log file from when the problem was happening. It is
>> stuck like the following (just search for "19706 of 49964"):
>>
>> 2013-03-05 15:13:51,587 INFO org.apache.hadoop.mapred.TaskTracker:
>> attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of
>> 49964 at 0.00 MB/s) >
>> 2013-03-05 15:13:51,811 INFO org.apache.hadoop.mapred.TaskTracker:
>> attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy (19706 of
>> 49964 at 0.00 MB/s) >
>> 2013-03-05 15:13:52,551 INFO org.apache.hadoop.mapred.TaskTracker:
>> attempt_201302270947_0010_r_000032_0 0.131468% reduce > copy (19706 of
>> 49964 at 0.00 MB/s) >
>> 2013-03-05 15:13:52,760 INFO org.apache.hadoop.mapred.TaskTracker:
>> attempt_201302270947_0010_r_000000_0 0.131468% reduce > copy (19706 of
>> 49964 at 0.00 MB/s) >
>> 2013-03-05 15:13:52,946 INFO org.apache.hadoop.mapred.TaskTracker:
>> attempt_201302270947_0010_r_000024_0 0.131468% reduce > copy (19706 of
>> 49964 at 0.00 MB/s) >
>> 2013-03-05 15:13:54,742 INFO org.apache.hadoop.mapred.TaskTracker:
>> attempt_201302270947_0010_r_000008_0 0.131468% reduce > copy (19706 of
>> 49964 at 0.00 MB/s) >
>>
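
The progress lines above can also be checked mechanically. The following is a rough sketch that reports reduce attempts whose copy count has not moved for a given time. It assumes each log entry is on a single line (the entries quoted here were wrapped by the mail client) and follows the format shown above. The function name and the 300-second default are hypothetical choices.

```python
import re
from datetime import datetime

# Matches a TaskTracker reduce-copy progress entry as quoted in this thread,
# assuming the entry has been unwrapped onto one line.
PROGRESS_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO "
    r"org\.apache\.hadoop\.mapred\.TaskTracker: "
    r"(?P<attempt>attempt_\S+) \S+% reduce > copy "
    r"\((?P<done>\d+) of (?P<total>\d+) at (?P<rate>[\d.]+) MB/s\)"
)


def stalled_attempts(lines, min_seconds=300):
    """Return {attempt: (done, total, seconds_without_progress)} for reduce
    attempts whose copied-map-output count stayed unchanged for at least
    min_seconds."""
    # attempt -> (done, total, first time this count was seen, last time seen)
    state = {}
    for line in lines:
        match = PROGRESS_RE.match(line)
        if not match:
            continue
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        attempt = match.group("attempt")
        done, total = int(match.group("done")), int(match.group("total"))
        prev = state.get(attempt)
        if prev is not None and prev[0] == done:
            # Same count as before: keep the original start of the stall.
            state[attempt] = (done, total, prev[2], ts)
        else:
            # First sighting, or the count changed: restart the clock.
            state[attempt] = (done, total, ts, ts)

    result = {}
    for attempt, (done, total, first, last) in state.items():
        idle = (last - first).total_seconds()
        if done < total and idle >= min_seconds:
            result[attempt] = (done, total, idle)
    return result
```

If every reduce attempt on every node shows up in the result with the same count, the stall is more likely on the serving side or in a shared service than in any single reducer.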
>> Thanks,
>>
>> Daning
>>
>>
>> On Thu, Mar 7, 2013 at 12:21 AM, Håvard Wahl Kongsgård <
>> [EMAIL PROTECTED]> wrote:
>>
>>> hadoop logs?
>>> On 6. mars 2013 21:04, "Daning Wang" <[EMAIL PROTECTED]> wrote:
>>>
>>>> We have a 5-node cluster (Hadoop 1.0.4). It hung a couple of times while
>>>> running big jobs. Basically all the nodes are dead; from the tasktracker's
>>>> log, it looks like it went into some kind of loop forever.
>>>>
>>>> All the log entries looked like this when the problem happened.
>>>>
>>>> Any idea how to debug the issue?
>>>>
>>>> Thanks in advance.
>>>>
>>>>
>>>> 2013-03-05 15:13:19,526 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> attempt_201302270947_0010_r_000012_0 0.131468% reduce > copy (19706 of
>>>> 49964 at 0.00 MB/s) >
>>>> 2013-03-05 15:13:19,552 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> attempt_201302270947_0010_r_000028_0 0.131468% reduce > copy (19706 of
>>>> 49964 at 0.00 MB/s) >
>>>> 2013-03-05 15:13:20,858 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> attempt_201302270947_0010_r_000036_0 0.131468% reduce > copy (19706 of
>>>> 49964 at 0.00 MB/s) >
>>>> 2013-03-05 15:13:21,141 INFO org.apache.hadoop.mapred.TaskTracker: