

kulkarni.swarnim@...) 2013-07-01, 15:37
Re: Random access in an avro file
Avro data files do not generally support random access.

SortedKeyValueFile supports random access by key.

http://avro.apache.org/docs/current/api/java/org/apache/avro/hadoop/file/SortedKeyValueFile.Reader.html

From the documentation:

"The SortedKeyValueFile is a directory with two files, named 'data'
and 'index'. The 'data' file is an ordinary Avro container file with
records. Each record has exactly two fields, 'key' and 'value'. The
keys are sorted lexicographically. The 'index' file is a small Avro
container file mapping keys in the 'data' file to their byte
positions. The index file is intended to fit in memory, so it should
remain small. There is one entry in the index file for each data block
in the Avro container file."
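
To make the layout described above concrete, here is a minimal in-memory sketch of the same idea: sorted records grouped into blocks, plus a small index holding the first key of each block. It is not the Avro API. The class name SparseIndexSketch, the Entry type, the demo() helper, the block size, and the zero-padded employee ids are all made up for illustration, and keys are assumed to be unique. A lookup finds the last index entry at or before the key, then scans only that one block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of a sorted 'data' file plus a sparse 'index'. */
public class SparseIndexSketch {

    /** One key/value record, standing in for a record in the 'data' file. */
    static final class Entry {
        final String key;
        final String value;

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Stand-in for the data file: each inner list models one data block.
    private final List<List<Entry>> blocks = new ArrayList<>();

    // Stand-in for the index file: first key of each block -> block number.
    private final TreeMap<String, Integer> index = new TreeMap<>();

    /** Entries must already be sorted lexicographically by key, with unique keys. */
    SparseIndexSketch(List<Entry> sortedEntries, int blockSize) {
        for (int i = 0; i < sortedEntries.size(); i += blockSize) {
            int end = Math.min(i + blockSize, sortedEntries.size());
            List<Entry> block = new ArrayList<>(sortedEntries.subList(i, end));
            index.put(block.get(0).key, blocks.size()); // one index entry per block
            blocks.add(block);
        }
    }

    /** Returns the value stored under key, or null if the key is absent. */
    String get(String key) {
        // Last block whose first key is <= the requested key.
        Map.Entry<String, Integer> slot = index.floorEntry(key);
        if (slot == null) {
            return null; // key sorts before everything in the file
        }
        for (Entry e : blocks.get(slot.getValue())) {
            int cmp = e.key.compareTo(key);
            if (cmp == 0) {
                return e.value;
            }
            if (cmp > 0) {
                break; // keys are sorted, so the key cannot appear later
            }
        }
        return null;
    }

    /** Builds sample data: employee ids 97..105, zero-padded so they sort as strings. */
    static SparseIndexSketch demo() {
        List<Entry> entries = new ArrayList<>();
        for (int id = 97; id <= 105; id++) {
            entries.add(new Entry(String.format("%04d", id), "employee-" + id));
        }
        return new SparseIndexSketch(entries, 4);
    }

    public static void main(String[] args) {
        SparseIndexSketch file = demo();
        System.out.println(file.get("0100"));
        System.out.println(file.get("0999"));
    }
}
```

The ids are zero-padded because the keys are compared lexicographically; as plain strings, "99" would sort after "100". In the real file the index maps keys to byte positions in the 'data' file rather than to list indices.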

Doug

On Mon, Jul 1, 2013 at 8:37 AM, [EMAIL PROTECTED]
<[EMAIL PROTECTED]> wrote:
> Hello,
>
> Is it possible to have random access to a record in an Avro file? For
> instance, suppose I have an Avro file whose schema contains four fields:
> employee id, name, address and phone. While reading the file, is there any
> way to jump directly to the record with employee id 100 instead of
> scanning the whole file every time and filtering out records?
>
> Thanks for the help.
>
> --
> Swarnim