Avro >> mail # user >> avro object reuse


RE: avro object reuse

We create a lot of GenericData.Record objects in our code via new GenericData.Record(schema).  Will this generate Jackson calls?  Thanks.
Ey-Chih Chow

> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Date: Wed, 1 Jun 2011 18:48:15 -0700
> Subject: Re: avro object reuse
>
> One thing we do right now that might be related is the following:
>
> We keep Avro default Schema values as JsonNode objects. While traversing
> the JSON Avro schema representation using ObjectMapper.readTree() we
> remember JsonNodes that are "default" properties on fields and keep them
> on the Schema object.
> If these keep references to the parent (and the whole JSON tree, or worse,
> the ObjectMapper and input stream) it would be poor use of Jackson by us;
> although we'd need a way to keep a detached JsonNode or equivalent.
>
> However, even if that is the case (which it does not seem to be -- the
> jmap output has no JsonNode instances), it doesn't explain why we would be
> calling ObjectMapper frequently.  We only call
> ObjectMapper.readTree(JsonParser) when creating a Schema from JSON.  We
> call JsonNode methods from extracted fragments for everything else.
>
>
> This brings me to the following suspicion based on the data:
> Somewhere, Schema objects are being created frequently via one of the
> Schema.parse() or Protocol.parse() static methods.
>
> On 6/1/11 5:48 PM, "Tatu Saloranta" <[EMAIL PROTECTED]> wrote:
>
> >On Wed, Jun 1, 2011 at 5:45 PM, Scott Carey <[EMAIL PROTECTED]> wrote:
> >> It would be useful to get a 'jmap -histo:live' report as well, which
> >> will only have items that remain after a full GC.
> >>
> >> However, a high churn of short lived Jackson objects is not expected
> >> here unless the user is reading Json serialized files and not Avro
> >> binary. Avro Data Files only contain binary encoded Avro content.
> >>
> >> It would be surprising to see many Jackson objects here if reading Avro
> >> Data Files, because we expect to use Jackson to parse an Avro schema
> >> from json only once or twice per file.  After the schema is parsed,
> >> Jackson shouldn't be used.  A hundred thousand DeserializationConfig
> >> instances means that isn't the case.
> >
> >Right -- it indicates that something (else) is using Jackson; and
> >there will typically be one instance of DeserializationConfig for each
> >data-binding call (ObjectMapper.readValue()), as a read-only copy is
> >made for operation.
> >... or if something is reading schema that many times, that sounds
> >like a problem in itself.
> >
> >-+ Tatu +-
>
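
The suspicion raised in the thread is that schemas are being parsed from JSON over and over instead of once. A minimal sketch of the usual remedy, parsing each distinct schema text a single time and reusing the result, might look like the following. It uses only the Java standard library; ParsedSchemaCache, its methods, and the parse counter are hypothetical names introduced for illustration and are not part of Avro or Jackson.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Avro): remembers the result of an
 * expensive parse step so each distinct schema text is parsed only once.
 */
final class ParsedSchemaCache<S> {
    private final ConcurrentMap<String, S> cache = new ConcurrentHashMap<>();
    private final Function<String, S> parser;
    // Counts real parser invocations, to make repeated parsing visible.
    private final AtomicInteger parseCalls = new AtomicInteger();

    ParsedSchemaCache(Function<String, S> parser) {
        this.parser = parser;
    }

    /** Returns the cached result for this schema text, parsing on first use only. */
    S get(String schemaJson) {
        return cache.computeIfAbsent(schemaJson, json -> {
            parseCalls.incrementAndGet();
            return parser.apply(json);
        });
    }

    /** Number of times the underlying parser has actually run. */
    int parseCalls() {
        return parseCalls.get();
    }
}
```

In Avro code the parser passed to the constructor would presumably wrap the Schema.parse() call mentioned above, and the returned Schema would then be handed to new GenericData.Record(schema) as often as needed. If the counter keeps climbing under load, that would point to the kind of repeated schema parsing the thread suspects.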
     
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB