Avro, mail # user - Random access in an avro file

Re: Random access in an avro file
kulkarni.swarnim@...) 2013-07-01, 17:26
Thanks for the reply Doug.

Out of curiosity, is maintaining sync markers while writing the file and
then passing them to the readers not a good way to achieve random access
in Avro? At least that was my understanding from reading the javadoc[1],
which could be flawed.
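
To make the question concrete, here is a minimal sketch of the general idea: record byte offsets at write time, then seek to them at read time. It is modeled with plain java.io.RandomAccessFile and a made-up length-prefixed record format, not with Avro itself; the class and method names (OffsetSeekSketch, writeRecords, readAt) are hypothetical. In the Avro Java API the corresponding calls are DataFileWriter.sync(), which returns a position, and DataFileReader.seek(long), which accepts one.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OffsetSeekSketch {

    // Writes each record as a length-prefixed UTF-8 string and returns the
    // byte offset at which each record starts. (Hypothetical format, not Avro.)
    static List<Long> writeRecords(File file, List<String> records) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (RandomAccessFile out = new RandomAccessFile(file, "rw")) {
            out.setLength(0); // start from an empty file
            for (String record : records) {
                offsets.add(out.getFilePointer());
                byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
                out.writeInt(bytes.length);
                out.write(bytes);
            }
        }
        return offsets;
    }

    // Jumps straight to a previously recorded offset and reads one record.
    static String readAt(File file, long offset) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            in.seek(offset);
            byte[] bytes = new byte[in.readInt()];
            in.readFully(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("offset-seek", ".bin");
        file.deleteOnExit();
        List<Long> offsets = writeRecords(file, Arrays.asList("id=99", "id=100", "id=101"));
        System.out.println(readAt(file, offsets.get(1))); // prints "id=100"
    }
}
```

Note that this gives access by position only. Jumping to "employee id 100" still needs something that maps the key to a position, and the recorded offsets have to be stored somewhere outside the data file.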

On Mon, Jul 1, 2013 at 12:05 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:

> Avro data files do not generally support random access.
> SortedKeyValueFile supports random access by key.
> http://avro.apache.org/docs/current/api/java/org/apache/avro/hadoop/file/SortedKeyValueFile.Reader.html
> From the documentation:
> "The SortedKeyValueFile is a directory with two files, named 'data'
> and 'index'. The 'data' file is an ordinary Avro container file with
> records. Each record has exactly two fields, 'key' and 'value'. The
> keys are sorted lexicographically. The 'index' file is a small Avro
> container file mapping keys in the 'data' file to their byte
> positions. The index file is intended to fit in memory, so it should
> remain small. There is one entry in the index file for each data block
> in the Avro container file."
> Doug
> On Mon, Jul 1, 2013 at 8:37 AM, [EMAIL PROTECTED]
> <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > Is it possible to have random access to a record in an Avro file? For
> > instance, say I have an Avro file with a schema containing four fields:
> > employee id, name, address and phone. While reading the file, is there
> > any way at all to jump directly to the record with employee id 100
> > instead of having to scan the whole file every single time and filter
> > out records?
> >
> > Thanks for the help.
> >
> > --
> > Swarnim
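
The index described in the SortedKeyValueFile documentation quoted above can be pictured with a small in-memory sketch. This is not Avro's implementation: it assumes each index entry holds the first key of a data block and that block's byte position, and the class name, method names, keys and positions (BlockIndexSketch, addBlock, blockPositionFor, "emp-100", 4096) are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class BlockIndexSketch {

    // In-memory index: first key of each data block -> byte position of the block.
    // One entry per block keeps the index small, as the quoted documentation says.
    private final TreeMap<String, Long> index = new TreeMap<>();

    void addBlock(String firstKey, long position) {
        index.put(firstKey, position);
    }

    // Returns the position of the only block that could contain the key,
    // or -1 if the key sorts before the first indexed block.
    long blockPositionFor(String key) {
        Map.Entry<String, Long> entry = index.floorEntry(key);
        return entry == null ? -1L : entry.getValue();
    }

    public static void main(String[] args) {
        BlockIndexSketch idx = new BlockIndexSketch();
        idx.addBlock("emp-000", 0L);
        idx.addBlock("emp-100", 4096L);
        idx.addBlock("emp-200", 8192L);
        // "emp-150" sorts between "emp-100" and "emp-200", so it can only
        // live in the block that starts at position 4096.
        System.out.println(idx.blockPositionFor("emp-150")); // prints 4096
    }
}
```

A reader would then seek to the returned position and scan only that one block. Because the keys are compared lexicographically, numeric ids such as the employee id would need zero-padding (or a similar encoding) to sort in numeric order.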