Avro >> mail # user >> benefits to store the data as key/value in avro


benefits to store the data as key/value in avro
Hi,
I have a question about how to store my data in Avro. Right now I have two options. The first is to serialize the whole object as one Avro record, like the following:
foo {id1 long,id2 long,id3 long,data record}
I know that most of my data will be queried by id1, id2, or id3, in MR jobs, Hive, or Pig.
So I am thinking that I could instead store my data as key/value pairs in Avro:
composite_key {id1 long,id2 long,id3 long}
value {data record}
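To make the two layouts concrete, here is a rough sketch of both as Avro JSON schemas, built with plain Python dictionaries. The `Data` record and its `payload` field are placeholders I made up, since the real contents of the data record are not shown here. I am also assuming that the value is the data record itself, and that the pair would follow the `org.apache.avro.mapred.Pair` convention of a record with `key` and `value` fields.

```python
import json

# Hypothetical stand-in for the "data record"; its real fields are not given.
DATA_SCHEMA = {
    "type": "record",
    "name": "Data",
    "fields": [{"name": "payload", "type": "bytes"}],
}

# The three long IDs shared by both layouts.
ID_FIELDS = [{"name": name, "type": "long"} for name in ("id1", "id2", "id3")]

# Option 1: one record holding the IDs and the data together.
FOO_SCHEMA = {
    "type": "record",
    "name": "foo",
    "fields": ID_FIELDS + [{"name": "data", "type": DATA_SCHEMA}],
}

# Option 2: a composite key record paired with the data record as the value.
COMPOSITE_KEY_SCHEMA = {
    "type": "record",
    "name": "composite_key",
    "fields": ID_FIELDS,
}
PAIR_SCHEMA = {
    "type": "record",
    "name": "Pair",
    "namespace": "org.apache.avro.mapred",
    "fields": [
        {"name": "key", "type": COMPOSITE_KEY_SCHEMA},
        {"name": "value", "type": DATA_SCHEMA},
    ],
}

print(json.dumps(FOO_SCHEMA, indent=2))
print(json.dumps(PAIR_SCHEMA, indent=2))
```

In both layouts the IDs and the data end up in the same serialized row; the pair only changes how the fields are nested.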
My question is: what benefit does the second format bring? If the data is stored as Pair(composite_key, value) in Avro files on HDFS, and most queries are on id1 to id3, will I save IO during the scan? In other words, will Avro for the most part deserialize only the ID fields in an MR job?
If I don't get that benefit, then I don't see any reason to store the data in key/value format, since the first format will be good enough for most cases, right?
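For reference, the "only deserialize the ID fields" idea can also be sketched against the first format, using a reader schema that names only the IDs. Avro's schema resolution ignores writer fields that are missing from the reader schema. The helper `project` and the `Data`/`payload` placeholder below are my own illustrative names, not part of any Avro API. As far as I understand, Avro data files are row-oriented, so a projection like this would avoid building the `data` objects but would probably not reduce the bytes read from HDFS.

```python
import json

def project(writer_schema, keep):
    """Return a copy of a record schema that keeps only the named fields."""
    reader_schema = {k: v for k, v in writer_schema.items() if k != "fields"}
    reader_schema["fields"] = [
        field for field in writer_schema["fields"] if field["name"] in keep
    ]
    return reader_schema

# Writer schema for the single-record layout; "Data" is a hypothetical placeholder.
WRITER_SCHEMA = {
    "type": "record",
    "name": "foo",
    "fields": [
        {"name": "id1", "type": "long"},
        {"name": "id2", "type": "long"},
        {"name": "id3", "type": "long"},
        {
            "name": "data",
            "type": {
                "type": "record",
                "name": "Data",
                "fields": [{"name": "payload", "type": "bytes"}],
            },
        },
    ],
}

# Reader schema with the same record name but only the ID fields.
READER_SCHEMA = project(WRITER_SCHEMA, {"id1", "id2", "id3"})
print(json.dumps(READER_SCHEMA, indent=2))
```

The reader schema keeps the record name `foo`, since Avro matches records by name during resolution.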
Thanks
Yong    