Re: I am running MapReduce on a 30G data on 1master/2 slave, but failed.
yaotian 2013-01-15, 08:34
I changed mapred.reduce.tasks from -1 to "AutoReduce".
Hadoop created 450 map tasks, but only 1 reduce task. It seems that this reduce task ran on only 1 slave (I have two slaves). When it reached 66%, the error was reported again:

"Task attempt_201301150318_0001_r_000000_0 failed to report status for 601 seconds. Killing!"

2013/1/14 yaotian <[EMAIL PROTECTED]>

> How to judge which counter would work?
>
> 2013/1/11 <[EMAIL PROTECTED]>
>
>> Hi
>>
>> To add on to Harsh's comments.
>>
>> You need not change the task timeout.
>>
>> In your map/reduce code, you can increment a counter or report status
>> at intervals so that there is communication from the task, and
>> hence it won't hit the task timeout.
>>
>> Every map and reduce task runs in its own JVM, limited by a JVM size. If
>> you try to hold too much data in memory, it can go beyond the JVM size
>> and cause OOM errors.
>>
>> Regards
>> Bejoy KS
>>
>> Sent from remote device, Please excuse typos
>> ------------------------------
>> *From: * yaotian <[EMAIL PROTECTED]>
>> *Date: *Fri, 11 Jan 2013 14:35:07 +0800
>> *To: *<[EMAIL PROTECTED]>
>> *ReplyTo: * [EMAIL PROTECTED]
>> *Subject: *Re: I am running MapReduce on a 30G data on 1master/2 slave,
>> but failed.
>>
>> See inline.
>>
>> 2013/1/11 Harsh J <[EMAIL PROTECTED]>
>>
>>> If the per-record processing time is very high, you will need to
>>> periodically report a status. Without a status change report from the task
>>> to the tracker, it will be killed as a dead task after a default
>>> timeout of 10 minutes (600s).
>>>
>> =====================> Do you mean to increase the report time,
>> "*mapred.task.timeout*"?
>>
>>> Also, beware of holding too much memory in a reduce JVM - you're still
>>> limited there. Best to let the framework do the sort or secondary sort.
>>>
>> =======================> You mean use the default value? This is my
>> value.
>> *mapred.job.reduce.memory.mb* -1
>>
>>> On Fri, Jan 11, 2013 at 10:58 AM, yaotian <[EMAIL PROTECTED]> wrote:
>>>
>>>> Yes, you are right. The data is GPS traces, each related to a corresponding uid.
>>>> The reduce does this: sort by user to get this kind of result: uid, gps1,
>>>> gps2, gps3........
>>>> Yes, the gps data is big because this is 30G data.
>>>>
>>>> How to solve this?
>>>>
>>>> 2013/1/11 Mahesh Balija <[EMAIL PROTECTED]>
>>>>
>>>>> Hi,
>>>>>
>>>>> 2 reducers completed successfully and 1498 have been
>>>>> killed. I assume that you have data issues (either the data is huge or
>>>>> there is some issue with the data you are trying to process).
>>>>> One possibility is that you have many values associated with
>>>>> a single key, which can cause this kind of issue depending on the operation
>>>>> you do in your reducer.
>>>>> Can you put some logs in your reducer and try to trace
>>>>> what is happening?
>>>>>
>>>>> Best,
>>>>> Mahesh Balija,
>>>>> Calsoft Labs.
>>>>>
>>>>> On Fri, Jan 11, 2013 at 8:53 AM, yaotian <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> I have 1 hadoop master where the name node is located and 2 slaves
>>>>>> where the datanodes are located.
>>>>>>
>>>>>> If I choose a small data set like 200M, it can be done.
>>>>>>
>>>>>> But if I run 30G of data, Map is done, but the reduce reports an error. Any
>>>>>> suggestion?
>>>>>>
>>>>>> This is the information.
>>>>>>
>>>>>> *Black-listed TaskTrackers:* 1<http://23.20.27.135:9003/jobblacklistedtrackers.jsp?jobid=job_201301090834_0041>
>>>>>> ------------------------------
>>>>>> Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts<http://23.20.27.135:9003/jobfailures.jsp?jobid=job_201301090834_0041>
>>>>>> map<http://23.20.27.135:9003/jobtasks.jsp?jobid=job_201301090834_0041&type=map&pagenum=1> | 100.00% | 450 | 0 | 0 | 450<http://23.20.27.135:9003/jobtasks.jsp?jobid=job_201301090834_0041&type=map&pagenum=1&state=completed> | 0 | 0 / 1<http://23.20.27.135:9003/jobfailures.jsp?jobid=job_201301090834_0041&kind=map&cause=killed>
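
On the question of "which counter would work": as Bejoy and Harsh describe it, what matters is that the task reports something to the tracker at intervals, so any counter update or status report should do. Below is a minimal sketch of that pattern in plain Java. It is not the poster's reducer and it does not use the Hadoop classes: `Heartbeat`, `Output`, `REPORT_EVERY` and `ReduceHeartbeatSketch` are hypothetical stand-ins so the example runs on its own. In a real job the two `Heartbeat` methods would presumably map onto `context.progress()` and `context.getCounter(...).increment(...)` in the `org.apache.hadoop.mapreduce` API (or `Reporter.progress()` and `Reporter.incrCounter(...)` in the older `mapred` API). The sketch also streams each value straight to the output rather than collecting all GPS points for a uid in memory, in line with Harsh's warning about the reduce JVM.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ReduceHeartbeatSketch {

    /** Hypothetical stand-in for the progress/counter calls on the Hadoop task context. */
    interface Heartbeat {
        void progress();                    // signals that the task is still alive
        void incrementCounter(long delta);  // a counter update is also a status report
    }

    /** Hypothetical stand-in for the reducer's output collector. */
    interface Output {
        void write(String key, String value);
    }

    /** Assumed reporting interval; pick a value well below the task timeout. */
    static final int REPORT_EVERY = 10_000;

    /**
     * Streams all values of one uid to the output without buffering them,
     * and reports to the tracker after every REPORT_EVERY records.
     * Returns the number of records processed.
     */
    static long reduce(String uid, Iterator<String> gpsPoints, Output out, Heartbeat hb) {
        long seen = 0;
        while (gpsPoints.hasNext()) {
            out.write(uid, gpsPoints.next());
            seen++;
            if (seen % REPORT_EVERY == 0) {
                hb.incrementCounter(REPORT_EVERY);
                hb.progress();
            }
        }
        // Account for the records after the last full interval.
        hb.incrementCounter(seen % REPORT_EVERY);
        return seen;
    }

    public static void main(String[] args) {
        List<String> points = Arrays.asList("gps1", "gps2", "gps3");
        Heartbeat hb = new Heartbeat() {
            @Override public void progress() { System.out.println("progress reported"); }
            @Override public void incrementCounter(long delta) { System.out.println("counter += " + delta); }
        };
        long n = reduce("uid42", points.iterator(),
                (k, v) -> System.out.println(k + "\t" + v), hb);
        System.out.println("records streamed: " + n);
    }
}
```

Getting the values for each uid in the wanted order without sorting them inside the reducer would rely on the framework's sort or a secondary sort, as Harsh suggests; that part is not shown here.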