Arko Provo Mukherjee 2011-10-27, 08:22
Hi,
I have a situation where I have to read a large file into every mapper.
Since it's a large HDFS file that is needed to process each input to the mapper, reading it into memory from HDFS takes a long time.
As a result, the system is killing all my mappers with the following message:
11/10/26 22:54:52 INFO mapred.JobClient: Task Id : attempt_201106271322_12504_m_000000_0, Status : FAILED
Task attempt_201106271322_12504_m_000000_0 failed to report status for 601 seconds. Killing!
The cluster is not entirely owned by me, so I cannot raise mapred.task.timeout to allow enough time to read the entire file.
Any suggestions?
Also, is there a way for a Mapper instance to read the file once for all the inputs that it receives? Currently, since the file-reading code is in the map method, I guess it's reading the entire file for each and every input, leading to a lot of overhead.
Please help!
Many thanks in advance!!
Warm regards Arko
-
Re: Mappers getting killed
Lucian Iordache 2011-10-27, 09:31
Hi,
Probably your map method takes too long to process the data. You could call context.progress() or context.setStatus("status") in your map method from time to time (at least once every 600 seconds, to avoid the timeout).
Regards, Lucian
-- All the best, Lucian
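The suggestion to call context.progress() "from time to time" could be wrapped in a small throttling helper, so a long read loop reports liveness without calling into the framework on every record. This is only a sketch: the Heartbeat class, its constructor arguments, and the injectable clock are hypothetical names, and the callback is a plain Runnable standing in for context::progress.

```java
import java.util.function.LongSupplier;

/**
 * Invokes a "report progress" callback at most once per interval.
 * Assumption: in a Hadoop job the callback would be context::progress
 * (or a lambda calling context.setStatus(...)); here it is any Runnable.
 */
final class Heartbeat {
    private final Runnable report;
    private final long intervalMillis;
    private final LongSupplier clock;   // injectable so the class can be tested
    private long lastReport;

    Heartbeat(Runnable report, long intervalMillis) {
        this(report, intervalMillis, System::currentTimeMillis);
    }

    Heartbeat(Runnable report, long intervalMillis, LongSupplier clock) {
        this.report = report;
        this.intervalMillis = intervalMillis;
        this.clock = clock;
        this.lastReport = clock.getAsLong();
    }

    /** Cheap to call inside a tight loop; reports only when the interval has elapsed. */
    void tick() {
        long now = clock.getAsLong();
        if (now - lastReport >= intervalMillis) {
            report.run();
            lastReport = now;
        }
    }
}
```

Inside a mapper this might be created as new Heartbeat(context::progress, 60_000) and tick() called after each line or chunk read from the large file. The one-minute interval is an arbitrary choice, picked only to stay well under the 600-second limit mentioned above.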
-
Re: Mappers getting killed
Brock Noland 2011-10-27, 13:19
Hi,
On Thu, Oct 27, 2011 at 3:22 AM, Arko Provo Mukherjee <[EMAIL PROTECTED]> wrote:
> Also, is there a way such that a Mapper instance reads the file once for all
> the inputs that it receives.
> Currently, since the file reading code is in the map method, I guess its
> reading the entire file for each and every input leading to a lot of
> overhead.

The file should be read in the configure() method (old API) or the setup() method (new API), which run once per task rather than once per input record.
Brock
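The load-once pattern described above can be illustrated without any Hadoop dependency. In the sketch below, SideDataMapperSketch, its setup(Reader) and map(String) methods, and the tab-separated file layout are all hypothetical stand-ins. In a real new-API job the class would extend org.apache.hadoop.mapreduce.Mapper, and setup(Context) would open the HDFS file itself (for example through FileSystem.open) instead of receiving a Reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Plain-Java stand-in for a mapper that loads side data once per task. */
final class SideDataMapperSketch {
    private final Map<String, String> sideData = new HashMap<>();
    int loadCount = 0;  // only here to show that the load happens once

    /** Stand-in for setup(): called once, before any map() call. */
    void setup(Reader source) {
        loadCount++;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue;  // skip lines without a key<TAB>value layout
                }
                sideData.put(line.substring(0, tab), line.substring(tab + 1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stand-in for map(): consults only the in-memory copy, never the file. */
    String map(String key) {
        return sideData.get(key);
    }
}
```

Because map() never touches the file, the read cost is paid once per task instead of once per input record. If the load itself can take longer than the task timeout, progress would still need to be reported from inside the read loop.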
-
Re: Mappers getting killed
Arko Provo Mukherjee 2011-10-28, 08:26
Thanks!
I will try it and let you know.
Warm regards Arko
-
Re: Mappers getting killed
Arko Provo Mukherjee 2011-10-31, 23:41
Hi,
I used the setStatus method and now my mappers are not getting killed anymore.
Thanks a lot!
Warm regards Arko