Avro, mail # user - avro object reuse


RE: avro object reuse
ey-chih chow 2011-06-02, 17:23

We create GenericData.Record instances frequently in our code via new GenericData.Record(schema). Will this generate Jackson calls? Thanks.
Ey-Chih Chow

> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Date: Wed, 1 Jun 2011 18:48:15 -0700
> Subject: Re: avro object reuse
>
> One thing we do right now that might be related is the following:
>
> We keep Avro default Schema values as JsonNode objects. While traversing
> the JSON Avro schema representation using ObjectMapper.readTree() we
> remember JsonNodes that are "default" properties on fields and keep them
> on the Schema object.
> If these keep references to the parent (and the whole JSON tree or, worse,
> the ObjectMapper and input stream), that would be poor use of Jackson on
> our part, although we'd need a way to keep a detached JsonNode or
> equivalent.
>
> However, even if that is the case (which it does not seem to be -- the
> jmap output has no JsonNode instances), it doesn't explain why we would be
> calling ObjectMapper frequently.  We only call
> ObjectMapper.readTree(JsonParser) when creating a Schema from JSON.  We
> call JsonNode methods from extracted fragments for everything else.
>
>
> This brings me to the following suspicion based on the data:
> Somewhere, Schema objects are being created frequently via one of the
> Schema.parse() or Protocol.parse() static methods.
>
> On 6/1/11 5:48 PM, "Tatu Saloranta" <[EMAIL PROTECTED]> wrote:
>
> >On Wed, Jun 1, 2011 at 5:45 PM, Scott Carey <[EMAIL PROTECTED]> wrote:
> >> It would be useful to get a 'jmap -histo:live' report as well, which will
> >> only have items that remain after a full GC.
> >>
> >> However, a high churn of short-lived Jackson objects is not expected here
> >> unless the user is reading JSON-serialized files rather than Avro binary.
> >> Avro Data Files only contain binary-encoded Avro content.
> >>
> >> It would be surprising to see many Jackson objects here if reading Avro
> >> Data Files, because we expect to use Jackson to parse an Avro schema from
> >> JSON only once or twice per file. After the schema is parsed, Jackson
> >> shouldn't be used. A hundred thousand DeserializationConfig instances
> >> means that isn't the case.
> >
> >Right -- it indicates that something (else) is using Jackson, and
> >there will typically be one instance of DeserializationConfig for each
> >data-binding call (ObjectMapper.readValue()), since a read-only copy is
> >made for each operation.
> >... or if something is reading the schema that many times, that sounds
> >like a problem in itself.
> >
> >-+ Tatu +-
>
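The suspicion raised in the thread, that schemas are being re-parsed from JSON over and over instead of being parsed once and reused, can be illustrated with a small toy model. This is a sketch under stated assumptions, not Avro's API: `ParsedSchema`, `expensiveParse`, and `SchemaCache` are hypothetical stand-ins, with `expensiveParse` playing the role of `Schema.parse()` (the step that goes through Jackson's `ObjectMapper.readTree()`). Real code would cache `org.apache.avro.Schema` objects instead. The model counts how often the parse step runs when the schema is re-parsed per record versus parsed once and shared.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: each call to expensiveParse stands in for one Schema.parse(json),
// i.e. one trip through Jackson. Caching keeps that count at one per distinct schema.
class SchemaCacheDemo {

    // Hypothetical stand-in for a parsed Avro Schema (NOT org.apache.avro.Schema).
    static final class ParsedSchema {
        final String json;
        ParsedSchema(String json) { this.json = json; }
    }

    // Counts how many times the (assumed Jackson-backed) parse step runs.
    static final AtomicInteger PARSE_CALLS = new AtomicInteger();

    // Hypothetical stand-in for Schema.parse(String); the real method would invoke Jackson here.
    static ParsedSchema expensiveParse(String json) {
        PARSE_CALLS.incrementAndGet();
        return new ParsedSchema(json);
    }

    // Parses each distinct schema JSON string at most once and shares the result.
    static final class SchemaCache {
        private final Map<String, ParsedSchema> cache = new ConcurrentHashMap<>();

        ParsedSchema get(String json) {
            // computeIfAbsent only calls expensiveParse when the key is not yet cached.
            return cache.computeIfAbsent(json, SchemaCacheDemo::expensiveParse);
        }
    }

    public static void main(String[] args) {
        String json = "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[]}";

        // Anti-pattern: re-parsing the schema for every record.
        for (int i = 0; i < 100_000; i++) {
            expensiveParse(json);
        }
        System.out.println("uncached parse calls: " + PARSE_CALLS.get());

        // Parse once (through the cache) and reuse the same object for every record.
        PARSE_CALLS.set(0);
        SchemaCache cache = new SchemaCache();
        ParsedSchema first = cache.get(json);
        for (int i = 0; i < 100_000; i++) {
            if (cache.get(json) != first) {
                throw new IllegalStateException("expected the cached instance");
            }
        }
        System.out.println("cached parse calls: " + PARSE_CALLS.get());
    }
}
```

Running `main` prints 100000 parse calls for the uncached loop and 1 for the cached one. Keying the cache on the schema's JSON text assumes the same schema string shows up repeatedly; a `ConcurrentHashMap` is used so one cache could be shared across threads without extra locking.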