Avro >> mail # dev >> Seeks with DataFileReader in C++

Re: Seeks with DataFileReader in C++

I think it is a good use case. One way to achieve what you want is to:

1. Expose the existing members objectCount_ and byteCount_ of DataFileReaderBase as size_t objectsRemainingInBlock() and size_t bytesRemainingInBlock() in the DataFileReader class.
2. Add a new method, void skip(size_t n), to the DataFileReader class; it skips n objects.
3. If you prefer, you can add skipBlock(), which is shorthand for skip(objectsRemainingInBlock()).
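
To make the three steps concrete, here is a rough, self-contained sketch of how the proposed methods could behave. It does not use the Avro C++ library: ToyBlockReader and its in-memory blocks are hypothetical stand-ins, and only the method names objectsRemainingInBlock(), skip() and skipBlock() come from the proposal above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy stand-in for a block-structured data file: each inner vector is one
// block of already-decoded objects. This is NOT the Avro C++ API.
class ToyBlockReader {
public:
    explicit ToyBlockReader(std::vector<std::vector<int> > blocks)
        : blocks_(std::move(blocks)), block_(0), pos_(0) {}

    // Objects left unread in the current block (step 1 of the proposal).
    std::size_t objectsRemainingInBlock() const {
        if (block_ >= blocks_.size()) return 0;
        return blocks_[block_].size() - pos_;
    }

    // Reads the next object; returns false at end of file.
    bool read(int& out) {
        if (!advanceToData()) return false;
        out = blocks_[block_][pos_++];
        return true;
    }

    // Skips n objects, crossing block boundaries as needed (step 2).
    void skip(std::size_t n) {
        while (n > 0) {
            if (!advanceToData()) throw std::out_of_range("skip past end of file");
            std::size_t step = std::min(n, objectsRemainingInBlock());
            assert(step > 0);  // advanceToData() guarantees an unread object
            pos_ += step;
            n -= step;
        }
    }

    // Shorthand for skipping the rest of the current block (step 3).
    void skipBlock() { skip(objectsRemainingInBlock()); }

private:
    // Moves to the next block that still has unread objects, if any.
    bool advanceToData() {
        while (block_ < blocks_.size() && pos_ == blocks_[block_].size()) {
            ++block_;
            pos_ = 0;
        }
        return block_ < blocks_.size();
    }

    std::vector<std::vector<int> > blocks_;
    std::size_t block_;  // index of the current block
    std::size_t pos_;    // next unread object within the current block
};
```

In a real reader, skipping part of a block would presumably still mean decoding the skipped objects, whereas skipping a whole block could use the block's byte count to jump ahead without decoding; the toy above does not model that difference.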

Does it work for you?


From: Daniel Russel <[EMAIL PROTECTED]>
Sent: Wednesday, 23 January 2013 10:33 PM
Subject: Re: Seeks with DataFileReader in C++
In our case, we have files created from large numbers of frames stored sequentially as records in a data file. Currently, finding the i-th frame requires going to the beginning and reading all records until the appropriate one is found. Binary search or some sort of index-based search would significantly decrease load times for many operations. It would also make implementing map-reduce-style operations on the data files easier, since currently there is no reliable way to shard the files.
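
As a rough illustration of the index-based search described above (a sketch only, not the patch): build a per-block index once, binary-search it for the block holding frame i, then skip within that block. BlockIndexEntry, FrameLocation, buildIndex and locateFrame are hypothetical names, and the byte offsets are assumed to be known from an earlier pass over the file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One entry per block; all names here are hypothetical.
struct BlockIndexEntry {
    std::size_t firstRecord;   // index of the first record in the block
    std::int64_t byteOffset;   // assumed file offset where the block starts
    std::size_t objectCount;   // number of records in the block
};

// Where to seek, and how many records to skip after seeking.
struct FrameLocation {
    std::int64_t byteOffset;
    std::size_t skipWithinBlock;
};

// Builds the index from (byteOffset, objectCount) pairs given in file order.
std::vector<BlockIndexEntry> buildIndex(
    const std::vector<std::pair<std::int64_t, std::size_t> >& blocks) {
    std::vector<BlockIndexEntry> index;
    index.reserve(blocks.size());
    std::size_t next = 0;
    for (const auto& b : blocks) {
        index.push_back(BlockIndexEntry{next, b.first, b.second});
        next += b.second;
    }
    return index;
}

// Binary search for the block containing record i.
// Returns false if i is past the last record.
bool locateFrame(const std::vector<BlockIndexEntry>& index, std::size_t i,
                 FrameLocation& out) {
    if (index.empty()) return false;
    // First entry whose firstRecord is greater than i ...
    auto it = std::upper_bound(
        index.begin(), index.end(), i,
        [](std::size_t value, const BlockIndexEntry& e) {
            return value < e.firstRecord;
        });
    assert(it != index.begin());  // the first entry always starts at record 0
    --it;                         // ... so the block before it holds record i
    std::size_t within = i - it->firstRecord;
    if (within >= it->objectCount) return false;
    out = FrameLocation{it->byteOffset, within};
    return true;
}
```

With such an index, finding the i-th frame costs one binary search, one seek, and at most one block's worth of skipped records instead of a scan from the start of the file.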

I'll work on the patch; nothing is written yet :-)

On Jan 23, 2013, at 4:56 AM, Thiruvalluvan MG <[EMAIL PROTECTED]> wrote:

> Hi Daniel,
> I think it would be nice if you could describe your use case. Yes, we'd be interested in seeing your implementation. Since this will be an added feature, it harms no one who doesn't use it. Please go ahead and create a ticket and submit a patch.
> Thanks
> Thiru
> ________________________________
> From: Daniel Russel <[EMAIL PROTECTED]>
> Sent: Wednesday, 23 January 2013 11:20 AM
> Subject: Seeks with DataFileReader in C++
> From what I can tell, there is no way to do any sort of random access with the C++ DataFileReader API. Is this correct? Is someone working on that? If not, and people think this would be a generally interesting capability, I'd consider implementing it as I'd kind of like to have it. Thanks.
>              --Daniel