MapReduce >> mail # user >> Re: [Cosmos-dev] Out of memory in identity mapper?


SEBASTIAN ORTEGA TORRES 2012-09-06, 15:31
Harsh J 2012-09-06, 16:12
Re: [Cosmos-dev] Out of memory in identity mapper?
Harsh,

Could IsolationRunner be used here? I'd put up a patch for HADOOP-8765,
after applying which IsolationRunner works for me. Maybe we could use it
to re-run the failing map task and debug.

Thanks
hemanth

On Thu, Sep 6, 2012 at 9:42 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Protobuf involvement makes me more suspicious that this is possibly a
> corruption or an issue with serialization as well. Perhaps if you can
> share some stack traces, people can help better. If it is reliably
> reproducible, I'd also check the count of records processed before the
> failure occurs, and see whether the stack traces are always the same.
>
> Serialization formats such as protobufs allocate objects based on read
> sizes (for example, a string's size may be read before the string's
> bytes, and a buffer of that length is pre-allocated for the bytes to be
> read into), and in cases of corrupt data or bugs in the deserialization
> code, it is quite easy for a badly read value to trigger a huge
> allocation request. It's one possibility.
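The failure mode Harsh describes can be sketched with a simple length-prefixed record whose size field gets corrupted. This is a minimal illustration of the idea, not protobuf's actual wire format; the class and method names are made up for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LengthPrefixDemo {
    // Read only the length prefix; a deserializer would next allocate
    // a buffer of exactly this size before reading the payload.
    public static int readDeclaredLength(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        return in.readInt();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);   // 4-byte length prefix
        out.write(payload);
        byte[] record = buf.toByteArray();

        System.out.println(readDeclaredLength(record)); // prints 5: sane

        record[0] = (byte) 0x7F;        // flip the high byte: simulated corruption
        // prints 2130706437 (~2 GB): an OOM-sized allocation request
        System.out.println(readDeclaredLength(record));
    }
}
```

A single flipped high-order byte turns a 5-byte allocation into a ~2 GB one, which would surface exactly as an "Out of memory" or "GC overhead limit exceeded" error in the task that hits the corrupt record.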
>
> Is the input compressed too, btw? Can you seek out the input file the
> specific map fails on, and try to read it in an isolated manner to
> validate it? Or do all maps seem to fail?
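A standalone validation pass along the lines Harsh suggests might look like the sketch below. It assumes the input is a flat stream of length-prefixed records; the file layout, the class name, and the 64 MB sanity bound are all assumptions for illustration, not part of the job's real format:

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RecordValidator {
    // Assumed sanity bound on a single record; tune to your schema.
    static final int MAX_RECORD = 64 * 1024 * 1024;

    // Count records until EOF or the first implausible length prefix.
    public static long countValidRecords(String path) throws IOException {
        long count = 0;
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            while (true) {
                int len;
                try {
                    len = in.readInt();
                } catch (EOFException eof) {
                    break; // clean end of file
                }
                if (len < 0 || len > MAX_RECORD) {
                    System.err.println("Suspect length " + len + " after " + count + " records");
                    break;
                }
                if (in.skipBytes(len) != len) {
                    System.err.println("Truncated record after " + count + " records");
                    break;
                }
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Demo: two well-formed records followed by a corrupt length prefix.
        Path tmp = Files.createTempFile("records", ".bin");
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(tmp.toFile()))) {
            for (int i = 0; i < 2; i++) {
                out.writeInt(3);
                out.write(new byte[]{1, 2, 3});
            }
            out.writeInt(Integer.MAX_VALUE); // corrupt length prefix
        }
        System.out.println(countValidRecords(tmp.toString())); // prints 2
    }
}
```

Running such a check over the split the failing map task was assigned would show whether the corruption is in the data itself or only appears inside the job.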
>
> On Thu, Sep 6, 2012 at 9:01 PM, SEBASTIAN ORTEGA TORRES <[EMAIL PROTECTED]>
> wrote:
> > Input files are small fixed-size protobuf records and yes, it is
> > reproducible (but it takes some time).
> > In this case I cannot use combiners, since I need to process all the
> > elements with the same key together.
> >
> > Thanks for the prompt response
> >
> > --
> > Sebastián Ortega Torres
> > Product Development & Innovation / Telefónica Digital
> > C/ Don Ramón de la Cruz 82-84
> > Madrid 28006
> >
> > On 06/09/2012, at 17:13, Harsh J wrote:
> >
> > I can imagine a huge record size possibly causing this. Is this
> > reliably reproducible? Do you also have combiners enabled, which may
> > run the reducer-logic on the map-side itself?
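The "reducer-logic on the map-side" that Harsh refers to can be sketched in plain Java without Hadoop on the classpath. In Hadoop itself a combiner is wired up via `setCombinerClass` and is only valid when the reduce function is associative and commutative; the class and method names below are hypothetical, for illustration only:

```java
import java.util.HashMap;
import java.util.Map;

public class CombinerSketch {
    // Map-side partial aggregation: the same summing logic the reducer
    // applies later, run locally over a single mapper's output.
    public static void combine(Map<String, Integer> partial, String key, int value) {
        partial.merge(key, value, Integer::sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> mapOutput = new HashMap<>();
        combine(mapOutput, "a", 1);
        combine(mapOutput, "a", 1);
        combine(mapOutput, "b", 1);
        // Instead of shuffling three (key, 1) pairs to the reducers, only
        // two partial sums cross the network: a=2 and b=1.
        System.out.println(mapOutput);
    }
}
```

This is also why Sebastián's later point holds: if the reduce step needs to see every element of a key's group at once, rather than fold them pairwise, a combiner cannot be used.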
> >
> > On Thu, Sep 6, 2012 at 8:20 PM, JOAQUIN GUANTER GONZALBEZ <[EMAIL PROTECTED]>
> > wrote:
> >
> > Hello hadoopers!
> >
> > In a reduce-only Hadoop job, input files are handled by the identity
> > mapper and sent to the reducers without modification. In one of my jobs
> > I was surprised to see the job failing in the map phase with "Out of
> > memory error" and "GC overhead limit exceeded".
> >
> > In my understanding, a memory leak in the identity mapper is out of the
> > question. What can be the cause of such an error?
> >
> > Thanks,
> > Ximo.
> >
> > P.S. The logs show no stack trace other than the messages I mentioned
> > before.
> >
> > ________________________________
> > This message is intended exclusively for its addressee. We only send
> > and receive email on the basis of the terms set out at:
> > http://www.tid.es/ES/PAGINAS/disclaimer.aspx
> >
> >
> >
> >
> > --
> > Harsh J
> >
> > _______________________________________________
> > Cosmos-dev mailing list
> > [EMAIL PROTECTED]
> > https://listas.tid.es/mailman/listinfo/cosmos-dev
> >
> >
> >
>
>
>
> --
> Harsh J
>
SEBASTIAN ORTEGA TORRES 2012-09-06, 16:22