Re: BatchScanner sort question
The batch scanner works by fetching batches of key-value pairs from all
tablets in the scan. The consecutive batches you receive are typically not in
sorted order relative to each other. Because batch boundaries are based solely
on individual key-value pairs, a batch can end mid-row, and the next batch can
start with a key from a completely different row, possibly also mid-row. If
you need each row returned in its entirety, use the WholeRowIterator.

tl;dr: Option 2 is accurate, but you can force Option 1 behavior with the
WholeRowIterator.
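
To make the interleaving concrete, here is a minimal, self-contained sketch in
plain Java. It does not use the Accumulo API: the class BatchInterleaveSketch,
its methods, the two-tablet layout, and the round-robin batch order are all
assumptions made for illustration (a real batch scanner returns batches in
whatever order the tablet servers respond). It only shows how batches that end
mid-row produce the ordering in Option 2, and how regrouping by row ID yields
the ordering in Option 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchInterleaveSketch {

    // Hypothetical stand-in for a batch scan (NOT Accumulo code): take up to
    // batchSize entries from each tablet in turn until every tablet is drained.
    public static List<String> interleave(List<List<String>> tablets, int batchSize) {
        List<String> out = new ArrayList<>();
        int[] pos = new int[tablets.size()];
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (int t = 0; t < tablets.size(); t++) {
                List<String> tablet = tablets.get(t);
                int end = Math.min(pos[t] + batchSize, tablet.size());
                if (pos[t] < end) {
                    // A batch boundary may fall in the middle of a row.
                    out.addAll(tablet.subList(pos[t], end));
                    pos[t] = end;
                    progressed = true;
                }
            }
        }
        return out;
    }

    // Client-side regrouping: bucket entries by row ID (the text before the
    // first ':'), keeping rows in the order they were first seen.
    public static Map<String, List<String>> groupByRow(List<String> entries) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        for (String entry : entries) {
            String row = entry.substring(0, entry.indexOf(':'));
            rows.computeIfAbsent(row, k -> new ArrayList<>()).add(entry);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumed layout: rows A and B live in one tablet, row C in another.
        List<List<String>> tablets = Arrays.asList(
            Arrays.asList("A:CF1:CQ1:1", "A:CF2:CQ2:2", "B:CF1:CQ1:1"),
            Arrays.asList("C:CF1:CQ1:1"));

        List<String> scanned = interleave(tablets, 1);
        System.out.println(scanned);             // row A is split by row C
        System.out.println(groupByRow(scanned)); // rows whole, still unsorted
    }
}
```

With a batch size of 1, the interleaved list comes out as A, C, A, B (the
Option 2 example), and the regrouped map lists rows A, C, B with each row's
entries together (the Option 1 example). Regrouping on the client means
buffering the whole result; the WholeRowIterator avoids that by packing each
row into a single key-value pair on the server side.
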
On Fri, Oct 25, 2013 at 12:59 PM, Peter Rainer <[EMAIL PROTECTED]> wrote:

> Hi,
>
> in the BatchScanner JavaDoc it says "Also only use this *when you do not
> care about the returned data being in sorted order*. If you want to
> lookup a few ranges and expect those ranges to contain a lot of data, then
> use the Scanner instead. Also, the Scanner will return data in sorted
> order, this will not."
>
> I'm not 100% sure how to interpret this, so I was wondering if any of
> you could help me clarify it:
>
> *Option 1)*
> Rows are not sorted, but all Key/Value Pairs with the same Row Key are in
> sequence
>
> Example:
> Format: Key:CF:CQ:Value
> A:CF1:CQ1:1
> A:CF2:CQ2:2
> C:CF1:CQ1:1
> B:CF1:CQ1:1
>
> *Option 2)*
> Rows are not sorted and not even Key/Value Pairs with the same Row Key are
> in sequence
>
> Example:
> Format: Key:CF:CQ:Value
> A:CF1:CQ1:1
> C:CF1:CQ1:1
> A:CF2:CQ2:2
> B:CF1:CQ1:1
>
>
> Thanks,
> Peter
>
>