
Subject: Re: Catch 22 when obtaining Fields and Objects
Hi Lewis,

Are you trying to avoid transferring unnecessary fields over the network? In that case you'd have to break the schema up into its individual fields, and serialize each individually. However, it's not clear to me whether this would be much of a performance advantage (it would probably depend on the data store's API).
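A minimal sketch of that per-field approach (the "person" record, field names, and class name here are illustrative, not anything Gora mandates): each field value is encoded on its own, using only that field's schema, so the resulting bytes could be stored in a separate column or attribute.

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.util.Utf8;

public class PerFieldSerde {
    public static void main(String[] args) throws Exception {
        // Hypothetical record schema for illustration.
        Schema recordSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"person\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}");

        // Serialize the "name" field on its own, using just that field's schema.
        // The resulting bytes can live in their own column/attribute in the store.
        Schema nameSchema = recordSchema.getField("name").schema();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<Object>(nameSchema).write(new Utf8("Lewis"), enc);
        enc.flush();
        // Avro binary string = varint length + UTF-8 bytes: 1 + 5 = 6 bytes here.
        byte[] nameBytes = out.toByteArray();
        System.out.println(nameBytes.length);
    }
}
```

Whether this wins anything depends on the store: it saves network transfer for partial gets, at the cost of one encode/decode call per field.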

Or are you ok with transferring the entire record over the network, but just want to avoid parsing fields that you don't need? In that case you can use a reader's schema that includes only the fields that you need, and the Avro parser will skip over all fields that are not mentioned in the reader's schema.
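To illustrate the second option, here is a self-contained sketch (the "person" schema and values are made up for the example): the record is written with the full writer schema, but decoded with a reader schema that omits "address", so the resolving decoder skips that field rather than parsing it.

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class ProjectionDemo {
    public static void main(String[] args) throws Exception {
        // Writer schema: the full record as persisted.
        Schema writerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"person\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"},"
          + "{\"name\":\"address\",\"type\":\"string\"}]}");

        // Reader schema: only the fields we actually need.
        Schema readerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"person\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}");

        // Serialize a full record with the writer schema.
        GenericRecord full = new GenericData.Record(writerSchema);
        full.put("name", "Lewis");
        full.put("age", 30);
        full.put("address", "somewhere");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(full, enc);
        enc.flush();

        // Deserialize with (writerSchema, readerSchema): "address" is skipped,
        // not parsed, by Avro's schema-resolution rules.
        GenericDatumReader<GenericRecord> reader =
            new GenericDatumReader<>(writerSchema, readerSchema);
        GenericRecord projected = reader.read(
            null, DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
        System.out.println(projected.get("name") + " " + projected.get("age"));
    }
}
```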


On 30 Mar 2014, at 19:57, Lewis John Mcgibbney <[EMAIL PROTECTED]> wrote:
Hi Folks,
Right now over in Gora [0] we write data down into byte[] before persisting an object into a backend data store.
We use Avro for our serialization.
The question I would like to pose is as follows:

In Gora we can do a get on objects as follows

public T get(K key, String[] fields)

If no field arguments are provided then we query ALL fields.

If, however, we query for say two string fields "name" and "age", we still need to obtain the Fields for the entire object (as records are stored as byte[]) and then sort things out on our end.
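One possible shape for mapping the String[] fields argument onto Avro is a helper that builds a projected reader schema, so decoding only parses the requested fields. This is a sketch, not Gora code: `project` is a hypothetical helper, and `Schema.Field#defaultVal()` assumes Avro 1.8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;

public class Projection {
    // Hypothetical helper: given the full persistent schema and the field names
    // passed to get(K key, String[] fields), build a reader schema containing
    // only those fields. Avro's resolving decoder then skips everything else.
    static Schema project(Schema full, String... fieldNames) {
        List<Schema.Field> projected = new ArrayList<>();
        for (String name : fieldNames) {
            Schema.Field f = full.getField(name);
            // Field objects cannot be reused across schemas, so copy each one.
            projected.add(new Schema.Field(f.name(), f.schema(), f.doc(), f.defaultVal()));
        }
        Schema reader = Schema.createRecord(
            full.getName(), full.getDoc(), full.getNamespace(), false);
        reader.setFields(projected);
        return reader;
    }

    public static void main(String[] args) {
        // Hypothetical full schema for illustration.
        Schema full = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"person\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"},"
          + "{\"name\":\"address\",\"type\":\"string\"}]}");
        Schema reader = project(full, "name", "age");
        System.out.println(reader.getFields().size());
    }
}
```

Note this only avoids parsing cost, not transfer cost: the full byte[] still comes back from the store before Avro skips the unrequested fields.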

Is there a better way we could/should be doing this?

For example, in our gora-dynamodb store, we simply put objects in their native types and let DynamoDB deal with the best way to serde the data. We are looking to simulate this across all supported data stores, so some discussion on this list would be excellent in enabling us to make a more informed decision.
Thanks in advance.
