It's worse than just merging the tablet's sources and iterating to find the
offset, because the underlying sources may contain deleted records, old
versions that are filtered out by an iterator, and duplicates. It is further
complicated if you are using combiners in the iterator stack.
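
To make that concrete, here is a rough sketch in plain Python. It is a toy model and does not use Accumulo code or its API. The entry layout (key, timestamp, delete flag), the two sample "files", and the helper names are all made up for illustration. It assumes only the newest version of a key is kept and that a newest-version delete marker hides the key entirely, and it ignores combiners altogether.

```python
import heapq

# Hypothetical model: each entry is (key, timestamp, is_delete).
# Each "file" is sorted by key ascending, then timestamp descending.
file_a = [("apple", 5, False), ("cherry", 2, False), ("fig", 1, False)]
file_b = [("apple", 9, True), ("banana", 3, False),
          ("cherry", 7, False), ("date", 4, False)]


def sort_key(entry):
    """Order entries by key, newest timestamp first."""
    key, timestamp, _ = entry
    return (key, -timestamp)


def merged_visible_keys(files):
    """Merge the sorted files and return the keys a reader would see.

    Simplifying assumptions: keep only the newest version of each key,
    and drop the key if that newest version is a delete marker.
    """
    visible = []
    last_key = None
    for key, _, is_delete in heapq.merge(*files, key=sort_key):
        if key == last_key:
            continue  # older version of a key we already handled
        last_key = key
        if not is_delete:
            visible.append(key)
    return visible


def local_offset(entries, key):
    """Position of the first entry for `key` within a single file."""
    return next(i for i, entry in enumerate(entries) if entry[0] == key)


visible = merged_visible_keys([file_a, file_b])

# Per-file offsets versus the offset in the merged, filtered view.
fig_in_file_a = local_offset(file_a, "fig")
fig_in_tablet = visible.index("fig")
cherry_in_file_b = local_offset(file_b, "cherry")
cherry_in_tablet = visible.index("cherry")
```

In this toy data "fig" sits at offset 2 in its file but at offset 3 in the merged view, and "cherry" sits at offset 2 in the second file but at offset 1 once the deleted "apple" and the older "cherry" are dropped. Neither per-file number tells you the tablet-level position without doing the full merged, filtered read.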
Your best bet is probably to perform this sort of indexing within an ingest
framework that understands a bigger picture of how you will use the data
you are ingesting.
Christopher L Tubbs II
On Tue, Dec 4, 2012 at 12:45 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
> I was thinking a little more on the subject, and convinced myself that I
> was wrong.
> Since many files on disk correspond to a tablet, the best you can get is
> the index of a key-value pair in a given file for a tablet. To get a sorted
> stream of key-value pairs for this tablet (to compute index offset for a
> key in a tablet), a merged read is performed over all of those files. Local
> key offset for a file is meaningless as it does not imply the correct
> offset for a tablet.
> On 12/3/12 9:30 PM, Josh Elser wrote:
>> Accumulo doesn't expose any internal offsets of Key-Value pairs through
>> the API. While you might be able to extrapolate some of this knowledge from
>> the underlying structure of Accumulo, that isn't the intent of what
>> Accumulo is trying to provide.