Hadoop >> mail # user >> Problems with timeout when a Hadoop job generates a large number of key-value pairs

Re: Problems with timeout when a Hadoop job generates a large number of key-value pairs
Good catch on the Configured - in my tests it extends my subclass of
Configured, but I took out any dependencies on my environment.

Interesting - I strongly suspect a disk IO or network problem, since my code
is very simple and very fast.
If you add lines to generateSubStrings to limit the String length to 100
characters (I think it always is, but this makes sure), then nothing in
my code should be slow - substring generation is very fast at that size.

I also suspect that the disk I/O problem may occur in a manner dependent
on the size and storage available on the nodes. What I find looking at the
Hadoop job tracker UI is that the mappers complete the first 25% of the
work relatively rapidly, and the timeout occurs in my hands in the last half
of the map. We are having issues with a 10 GB file; if it is not too
much trouble, you might try a larger value for DESIRED_LENGTH - say 20 GB.
Also, how big is your cluster and how long do the mappers take?

public static String[] generateSubStrings(String inp, int minLength, int maxLength) {
    // guarantee no more than 100 characters
    if (inp.length() > 100)
        inp = inp.substring(0, 100);
    List<String> holder = new ArrayList<String>();
    for (int start = 0; start < inp.length() - minLength; start++) {
        for (int end = start + minLength; end < Math.min(inp.length(), start + maxLength); end++) {
            try {
                holder.add(inp.substring(start, end));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }
    return holder.toArray(new String[holder.size()]);
}


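For scale, here is a standalone count of how many substrings a single 100-character line fans out into (a sketch; the minLength/maxLength values of 5 and 100 are illustrative assumptions, not the job's actual settings):

```java
import java.util.ArrayList;
import java.util.List;

// Standalone fan-out count for the generateSubStrings logic above.
class SubstringFanout {
    static String[] generateSubStrings(String inp, int minLength, int maxLength) {
        if (inp.length() > 100)
            inp = inp.substring(0, 100);
        List<String> holder = new ArrayList<String>();
        for (int start = 0; start < inp.length() - minLength; start++) {
            for (int end = start + minLength; end < Math.min(inp.length(), start + maxLength); end++) {
                holder.add(inp.substring(start, end));
            }
        }
        return holder.toArray(new String[0]);
    }

    public static void main(String[] args) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < 100; i++) line.append('A');
        // A single 100-character line already fans out into thousands of
        // key-value pairs, so output volume dwarfs input volume.
        System.out.println(generateSubStrings(line.toString(), 5, 100).length); // prints 4560
    }
}
```

That is thousands of writes per input record, which is why the output counters matter so much for the timeout question below.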
On Fri, Jan 20, 2012 at 12:41 PM, Alex Kozlov <[EMAIL PROTECTED]> wrote:

> Hi Steve, I ran your job on our cluster and it does not time out.  I noticed
> that each mapper runs for a long time: one way to avoid a timeout is to
> update a user counter.  As long as this counter is updated within 10
> minutes, the task should not time out (as MR knows that something is being
> done).  Normally an output bytes counter would be updated, but if the job
> is stuck somewhere doing something, it will time out.  I agree that there
> might be a disk IO or network problem that causes a long wait, but without
> detailed logs it's hard to tell.
> On a side note, the SubstringCount class should extend Configured.
> --
> Alex K
> <http://www.cloudera.com/company/press-center/hadoop-world-nyc/>
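The periodic counter update Alex describes could be sketched like this (a standalone illustration; in a real job the reporter would be the Mapper.Context via getCounter() or progress(), and all names here are made up so the snippet compiles without Hadoop on the classpath):

```java
// Minimal stand-in for the Hadoop counter API, for illustration only.
interface Reporter {
    void incrementCounter(String group, String name, long amount);
}

class ProgressReporting {
    static final int REPORT_INTERVAL = 10000;

    // Emit one counter update every REPORT_INTERVAL output records; in a
    // real task each update resets the 600-second timeout clock.
    static long emitWithProgress(String[] records, Reporter reporter) {
        long updates = 0;
        for (int i = 0; i < records.length; i++) {
            // context.write(key, value) would go here
            if (i % REPORT_INTERVAL == 0) {
                reporter.incrementCounter("SubstringCount", "recordsWritten", REPORT_INTERVAL);
                updates++;
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        long updates = emitWithProgress(new String[25000], (g, n, a) -> {});
        System.out.println(updates); // prints 3 (updates at i = 0, 10000, 20000)
    }
}
```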
> On Fri, Jan 20, 2012 at 12:18 PM, Michel Segel <[EMAIL PROTECTED]> wrote:
> > Steve,
> > If you want me to debug your code, I'll be glad to set up a billable
> > contract... ;-)
> >
> > What I am willing to do is to help you to debug your code...
> >
> > Did you time how long it takes in the Mapper.map() method?
> > The reason I asked this is to first confirm that you are failing within a
> > map() method.
> > It could be that you're just not updating your status...
> >
> > You said that you are writing many output records for a single input.
> >
> > So let's take a look at your code.
> > Are all writes of the same length? Meaning that in each iteration of
> > Mapper.map() you will always write K number of rows?
> >
> > If so, ask yourself why some iterations take longer and longer.
> >
> > Note: I'm assuming that each iteration takes longer than
> > the previous...
> >
> > Or am I missing something?
> >
> > -Mike
> >
> > Sent from a remote device. Please excuse any typos...
> >
> > Mike Segel
> >
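Michel's timing suggestion could be sketched as follows (a standalone illustration; doWork() is a made-up stand-in for the real per-record logic, i.e. substring generation plus context.write()):

```java
// Time each map() call to see whether later iterations really take longer.
class MapIterationTimer {
    static long maxElapsedNanos = 0;
    static long calls = 0;

    static void timedMap(String line) {
        long start = System.nanoTime();
        doWork(line);
        long elapsed = System.nanoTime() - start;
        calls++;
        if (elapsed > maxElapsedNanos) {
            maxElapsedNanos = elapsed;
            // In a real mapper this would go to the task log via stderr,
            // visible in the JobTracker UI for the attempt.
            System.err.println("call " + calls + ": new slowest map(), " + elapsed + " ns");
        }
    }

    static void doWork(String line) {
        line.toLowerCase(); // placeholder for the real per-record work
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) timedMap("line-" + i);
        System.out.println("calls=" + calls + ", slowest=" + maxElapsedNanos + " ns");
    }
}
```

If the slowest call keeps climbing toward the end of the split, the delay is in map() itself; if not, the stall is outside the mapper (spill, shuffle, or I/O), which matches the disk/network suspicion above.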
> > On Jan 20, 2012, at 11:16 AM, Steve Lewis <[EMAIL PROTECTED]> wrote:
> >
> > > We have been having problems with mappers timing out after 600 sec when
> > > the mapper writes many more records - say thousands - for every input
> > > record, even when the code in the mapper is small and fast. I have no
> > > idea what could cause the system to be so slow and am reluctant to raise
> > > the 600 sec limit without understanding why there should be a timeout.

Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com