Avro, mail # user - How do I determine the memory footprint of an Avro object?


W.P. McNeill 2012-02-06, 23:09
Re: How do I determine the memory footprint of an Avro object?
Scott Carey 2012-02-07, 02:48


On 2/6/12 3:09 PM, "W.P. McNeill" <[EMAIL PROTECTED]> wrote:

> I'm debugging out of memory errors in my application. I suspect that some of
> my Avro objects are really big. Is there a way to tell how many bytes a given
> Avro object occupies in memory? My current solution is to count the number of
> characters in its stringification, but this is a bit of a hack.

Which Avro language implementation?

For Java, there is nothing built in to do this.  The size will differ
depending on the object representation in use (Specific or Generic?) and
whether the JVM is running in 32-bit mode, 64-bit mode, or 64-bit mode with
compressed oops.

For quick debugging, a 'jmap -histo:live' print-out may help you identify
what is taking up the memory.
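If you want a programmatic number rather than a heap histogram, one option
(a minimal sketch, not anything Avro provides) is the java.lang.instrument
API.  Note that getObjectSize() is shallow, so you would have to walk the
record's fields yourself to get a deep size.  The class name and jar name
here are hypothetical:

    import java.lang.instrument.Instrumentation;

    // Package this in a jar whose manifest has "Premain-Class: SizeAgent"
    // and start the JVM with -javaagent:sizeagent.jar.
    public class SizeAgent {
        private static volatile Instrumentation inst;

        public static void premain(String agentArgs, Instrumentation i) {
            inst = i;
        }

        // Shallow size only: for a GenericRecord this counts the record
        // object itself, not the Object[] behind it or the boxed fields.
        public static long shallowSizeOf(Object o) {
            return inst.getObjectSize(o);
        }
    }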

To understand the size of a record, here are a few points:

- Primitives are boxed in most cases, so the size per
boolean/int/float/long/double is going to be 16 to 24 bytes, depending on
the type and the JVM configuration.
- A GenericRecord is an Object[] under the covers.  This adds about 16
bytes of array overhead plus 4 bytes per field for the reference, or 8
bytes per field on a 64-bit JVM without compressed pointers.
SpecificRecords are a little smaller, especially for primitive fields,
since the generated classes can store primitives unboxed (see the sketch
after this list).
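To make those points concrete, here is a minimal sketch using a
hypothetical two-field schema.  The byte counts in the comments are the
rough estimates from above and will vary with JVM configuration:

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;

    public class RecordFootprint {
        public static void main(String[] args) {
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Pair\",\"fields\":["
              + "{\"name\":\"id\",\"type\":\"long\"},"
              + "{\"name\":\"score\",\"type\":\"double\"}]}");

            GenericRecord r = new GenericData.Record(schema);
            r.put("id", 42L);     // boxed java.lang.Long: ~16-24 bytes
            r.put("score", 0.5);  // boxed java.lang.Double: ~16-24 bytes
            // The record wraps an Object[2]: ~16 bytes of array overhead
            // plus 4 bytes per reference (8 without compressed oops).
        }
    }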

In general, I would expect all language implementations to use somewhere
between 2x and 16x the serialized binary size in RAM for a record.  There
are corner cases that will match more closely (a large byte array) or
differ further (maps and arrays of records).
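If you want a baseline to apply that multiplier to, the serialized binary
size itself is easy to measure with the Generic API.  A sketch, assuming a
GenericRecord like the one above:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;

    public class SerializedSize {
        // Avro binary size of one record; multiply by the 2x-16x range
        // above for a rough in-memory estimate.
        public static int serializedSize(GenericRecord r) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            GenericDatumWriter<GenericRecord> writer =
                new GenericDatumWriter<GenericRecord>(r.getSchema());
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            writer.write(r, encoder);
            encoder.flush();
            return out.size();
        }
    }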