Avro >> mail # user >> Random access in an avro file


+
kulkarni.swarnim@...) 2013-07-01, 15:37
+
Doug Cutting 2013-07-01, 17:05
+
kulkarni.swarnim@...) 2013-07-01, 17:26
+
Doug Cutting 2013-07-01, 17:51
Re: Random access in an avro file
Thanks again Doug. SortedKeyValueFile looks really promising and seems to
fit our use case well.

One last thing I was concerned about was the performance of maintaining the
sorted order in the file, especially because in our case the file might get
pretty large (hundreds of thousands to a million). If there is a limit on the
file size for achieving maximum performance, we could close the file and
start writing to another one once we hit that limit.
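
The rollover idea above can be sketched roughly as follows. This is a plain-Python illustration, not Avro code: `RollingWriter`, the `part-NNNNN` naming, and the record-count limit are all hypothetical, and the thread does not establish that any such limit exists. It assumes records arrive already sorted by key, so each part file ends up covering its own contiguous key range.

```python
import os
import tempfile


class RollingWriter:
    """Hypothetical helper: append lines to part-00000, part-00001, ...

    A new part file is started once the current one holds max_records lines.
    """

    def __init__(self, directory, max_records):
        self.directory = directory
        self.max_records = max_records
        self.part = -1      # index of the part file currently open
        self.count = 0      # records written to the current part
        self.out = None

    def _roll(self):
        # Close the current part (if any) and open the next one.
        if self.out is not None:
            self.out.close()
        self.part += 1
        self.count = 0
        path = os.path.join(self.directory, f"part-{self.part:05d}")
        self.out = open(path, "w", encoding="utf-8")

    def append(self, line):
        if self.out is None or self.count >= self.max_records:
            self._roll()
        self.out.write(line + "\n")
        self.count += 1

    def close(self):
        if self.out is not None:
            self.out.close()
            self.out = None


def demo_parts(n_records, max_records):
    """Write n_records sorted keys and return the part file names created."""
    with tempfile.TemporaryDirectory() as directory:
        writer = RollingWriter(directory, max_records)
        for i in range(n_records):
            writer.append(f"key{i:06d}")  # zero-padded, so already in sorted order
        writer.close()
        return sorted(os.listdir(directory))
```

With `demo_parts(7, 3)`, the seven records are split 3/3/1 across three part files. A reader would then need to know which part covers a given key, for example by remembering the first key written to each part.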
On Mon, Jul 1, 2013 at 12:51 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:

> On Mon, Jul 1, 2013 at 10:26 AM, [EMAIL PROTECTED]
> <[EMAIL PROTECTED]> wrote:
> > Out of curiosity, is maintaining sync markers while writing the file and
> > then passing these markers to the readers while reading not a good way to
> > achieve random access in avro?
>
> Yes, seeking to the position of a sync marker is possible.  This is
> what SortedKeyValueFile does.  You need to store the list of positions
> of sync markers, and if seek is to a column value rather than a row
> number, then you need to store these values (keys) with the positions.
>  Those are what's in SortedKeyValueFile's "index" file.
>
> Doug
>
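
The index described in the quoted reply (keys stored together with the positions of sync markers) can be illustrated with a small stand-alone sketch. It is a toy block format in plain Python, not the Avro container format and not SortedKeyValueFile's actual implementation: `SYNC`, `write_blocks`, and `lookup` are hypothetical names, and records are stored as JSON lines purely for readability. In Avro's Java API the analogous calls would be `DataFileWriter.sync()` to obtain a position and `DataFileReader.seek(long)` to return to it.

```python
import bisect
import io
import json

# Toy marker for this sketch only; a real Avro file uses a random
# 16-byte sync marker chosen per file.
SYNC = b"\x00SYNC-MARKER\x00\n"


def write_blocks(sorted_items, per_block):
    """Write (key, value) pairs, already sorted by key, in fixed-size blocks.

    Returns (file_bytes, index) where index is a list of
    (first_key_in_block, byte_offset_just_after_the_sync_marker).
    """
    out = io.BytesIO()
    index = []
    for start in range(0, len(sorted_items), per_block):
        out.write(SYNC)
        index.append((sorted_items[start][0], out.tell()))
        for key, value in sorted_items[start:start + per_block]:
            out.write(json.dumps([key, value]).encode("utf-8") + b"\n")
    return out.getvalue(), index


def lookup(file_bytes, index, key):
    """Seek to the one block that could hold key, then scan only that block."""
    first_keys = [k for k, _ in index]
    slot = bisect.bisect_right(first_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first block
    f = io.BytesIO(file_bytes)
    f.seek(index[slot][1])
    for line in iter(f.readline, b""):
        if line == SYNC:
            break  # reached the next block without finding the key
        k, v = json.loads(line)
        if k == key:
            return v
        if k > key:
            break  # keys are sorted, so the key is absent
    return None


items = [("a", 1), ("c", 2), ("e", 3), ("g", 4), ("i", 5)]
data, idx = write_blocks(items, per_block=2)
```

Here `lookup(data, idx, "g")` binary-searches the three index entries, seeks to the block starting at `"e"`, and reads at most two records. The index is sparse (one entry per block), which is why it stays small relative to the data file.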

--
Swarnim
+
kulkarni.swarnim@...) 2013-07-01, 22:22
+
Doug Cutting 2013-07-01, 22:52
+
Scott Carey 2013-07-02, 18:59