
Benefits of storing data as key/value in Avro
I have a question about how to store my data in Avro. Right now I have two options. The first is to serialize the whole object as one Avro record, like the following:
foo {id1 long,id2 long,id3 long,data record}
The catch is that I know most of my data will be queried by id1, id2, or id3, whether in MR jobs, Hive, or Pig.
So I am thinking that I could instead store my data as key/value pairs in Avro:
composite_key {id1 long,id2 long,id3 long}
value {data record}
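To make the two layouts concrete, here is a rough sketch of how they might look as Avro schemas, built as plain Python dicts and printed as JSON. The contents of the data record are not specified above, so the `Data` record with a single `payload` field is a hypothetical stand-in, and the `Pair` record with `key` and `value` fields is only an assumed shape for the key/value wrapper.

```python
import json

# Hypothetical stand-in for the real "data record"; its actual fields are unknown.
DATA_SCHEMA = {
    "type": "record",
    "name": "Data",
    "fields": [{"name": "payload", "type": "bytes"}],
}

# The three ID fields shared by both layouts.
ID_FIELDS = [{"name": name, "type": "long"} for name in ("id1", "id2", "id3")]

# Option 1: one flat record holding the IDs and the data together.
FOO_SCHEMA = {
    "type": "record",
    "name": "foo",
    "fields": ID_FIELDS + [{"name": "data", "type": DATA_SCHEMA}],
}

# Option 2: the IDs form a composite key, and the data record is the value.
COMPOSITE_KEY_SCHEMA = {
    "type": "record",
    "name": "composite_key",
    "fields": ID_FIELDS,
}

# Assumed wrapper shape for the key/value pair (field names are an assumption).
PAIR_SCHEMA = {
    "type": "record",
    "name": "Pair",
    "fields": [
        {"name": "key", "type": COMPOSITE_KEY_SCHEMA},
        {"name": "value", "type": DATA_SCHEMA},
    ],
}

if __name__ == "__main__":
    print(json.dumps(FOO_SCHEMA, indent=2))
    print(json.dumps(PAIR_SCHEMA, indent=2))
```

Both schemas carry exactly the same fields; the only difference is whether the three IDs sit at the top level or are nested inside a key record.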
My question is: what benefit does the second format bring? If the data is stored as Pair(composite_key, value) in Avro files on HDFS, and most queries filter on id1 to id3, will I save I/O during the scan? In other words, will Avro deserialize only the ID fields for most records in an MR job?
If I don't get that benefit, then I don't see any reason to store the data in key/value format, since the first format will be good enough for most cases, right?