HBase, mail # dev - HBase Developer's Pow-wow.


Re: HBase Developer's Pow-wow.
Andrew Purtell 2012-09-10, 17:58
Hi Jacques,

> Does family level indexing make sense or is the real need for qualifier
> level indexing?

The use cases considered, at least over here at TM, all come down to
range scanning over values (e.g. WHERE INTEGER($value) < 50). So we
need a mapping such that a scan over the index returns either lists of
pointers to row:family:qualifier, or the value itself embedded in the
index, following the natural order of values in the primary table as
given by a comparator. We would need a number of projections like
this. A set of default comparators for interpreting values as
integers, longs, floating point numbers, and complex JSON or Avro
records would be useful.
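
To illustrate the ordering requirement for the integer case, here is a
minimal sketch of an order-preserving key encoding. It assumes the
index stores keys in unsigned lexicographic byte order, as HBase row
keys are. The class and method names (OrderedIntCodec, indexKey, and
so on) are hypothetical and are not part of HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an HBase class.
final class OrderedIntCodec {
    // Encode a signed int as 4 big-endian bytes with the sign bit flipped,
    // so that unsigned byte-wise order matches numeric order.
    static byte[] encode(int v) {
        int u = v ^ Integer.MIN_VALUE;
        return new byte[] {
            (byte) (u >>> 24), (byte) (u >>> 16), (byte) (u >>> 8), (byte) u
        };
    }

    // Inverse of encode; reads only the first 4 bytes of the key.
    static int decode(byte[] b) {
        int u = ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16)
              | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
        return u ^ Integer.MIN_VALUE;
    }

    // Index key layout assumed here: encoded value, then the primary row key.
    static byte[] indexKey(int value, String row) {
        byte[] v = encode(value);
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[v.length + r.length];
        System.arraycopy(v, 0, key, 0, v.length);
        System.arraycopy(r, 0, key, v.length, r.length);
        return key;
    }

    // Unsigned lexicographic comparison, the way a byte-ordered store sorts keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        int[] values = {75, -3, 50, 49, 0};
        List<byte[]> keys = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            keys.add(indexKey(values[i], "row" + i));
        }
        keys.sort(OrderedIntCodec::compareBytes);
        // Sorted index keys come back in numeric order of the indexed value.
        for (byte[] k : keys) {
            System.out.println(decode(k));
        }
    }
}
```

With keys laid out this way, a predicate such as value < 50 becomes a
scan over the index that stops at encode(50). Longs, floating point
numbers, and structured records would each need their own encoding or
comparator.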

> What are ideas for a client interface and how transparent is index usage?
>  (E.g. if you set a filter on a qualifier... )

It would be nice if the existing client API can handle it somehow.
Get, Put, Increment, Scan, all of these API objects can transmit
arbitrary attributes from the client to the server. It would be low
friction for a user to modify their use of these existing API objects,
rather than using a completely different interface like coprocessor
Endpoint invocations. (Or, if Endpoints are used, a client library
should at least hide them.)
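
As a rough sketch of the attribute idea: the nested AttributedOp class
below is only a stand-in that mimics the shape of the
setAttribute/getAttribute methods on the HBase client operations, so
that the example runs without HBase on the classpath. The attribute
name and the IndexHint helper are hypothetical; nothing in HBase
defines them.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper showing how an index hint could ride along on an
// existing operation object as an opaque attribute.
final class IndexHint {
    // Stand-in for a client operation such as Scan; NOT the real HBase class.
    static final class AttributedOp {
        private final Map<String, byte[]> attributes = new HashMap<>();

        AttributedOp setAttribute(String name, byte[] value) {
            attributes.put(name, value);
            return this;
        }

        byte[] getAttribute(String name) {
            return attributes.get(name);
        }
    }

    // Assumed attribute name, chosen for this example only.
    static final String USE_INDEX = "example.index.name";

    // Client side: tag the operation with the index it would like to use.
    static AttributedOp withIndex(AttributedOp op, String indexName) {
        return op.setAttribute(USE_INDEX, indexName.getBytes(StandardCharsets.UTF_8));
    }

    // Server side (for example in a coprocessor hook): read the hint back.
    // Returns null when the client did not ask for an index.
    static String requestedIndex(AttributedOp op) {
        byte[] raw = op.getAttribute(USE_INDEX);
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        AttributedOp scan = withIndex(new AttributedOp(), "by_value");
        System.out.println("index requested: " + requestedIndex(scan));
        System.out.println("plain scan: " + requestedIndex(new AttributedOp()));
    }
}
```

The appeal of this shape is that existing client code keeps building
the same operation objects and only adds one call, while servers that
do not understand the attribute can simply ignore it.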

> What were the challenges and issues with the proof of concept TrendMicro
> approach that ultimately made it untenable? (was an eventually consistent
> approach)

This was simply a matter of prototype implementation quality; there is
nothing wrong with an eventually consistent approach per se.

> Is it important to colocate/duplicate indexed values and/or additional
> portions of data in secondary indices to minimize disk seeks (almost making
> HBase optionally more columnar in nature)?

I do think we want to offer the Megastore-like option of storing
value data in indexes, and also the option not to. Then we can manage
the tradeoff between minimizing seeks and round trips and increasing
storage utilization on a per-index basis, according to the needs of
the use case.
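
The tradeoff can be sketched with an in-memory model. This is a toy
under stated assumptions, not a design: the IndexSketch class, its
"covering" flag, and the lookup counter are all hypothetical, and a
sorted TreeMap stands in for an index table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of one secondary index over integer values.
final class IndexSketch {
    // Index key: (value, primary row key), ordered by value, then row.
    static final class Key implements Comparable<Key> {
        final int value;
        final String row;

        Key(int value, String row) {
            this.value = value;
            this.row = row;
        }

        @Override
        public int compareTo(Key o) {
            int c = Integer.compare(value, o.value);
            return c != 0 ? c : row.compareTo(o.row);
        }
    }

    final Map<String, Integer> primary = new HashMap<>();
    // Entry value is a copy of the data when covering, otherwise null.
    final TreeMap<Key, Integer> index = new TreeMap<>();
    final boolean covering;
    int primaryLookups = 0; // extra reads against the primary table

    IndexSketch(boolean covering) {
        this.covering = covering;
    }

    void put(String row, int value) {
        Integer old = primary.put(row, value);
        if (old != null) {
            index.remove(new Key(old, row)); // drop the stale index entry
        }
        index.put(new Key(value, row), covering ? Integer.valueOf(value) : null);
    }

    // Range scan for "value < bound", answered from the index.
    List<Integer> valuesLessThan(int bound) {
        List<Integer> out = new ArrayList<>();
        // "" sorts before every row key, so the head map holds exactly
        // the entries whose value is strictly below the bound.
        for (Map.Entry<Key, Integer> e : index.headMap(new Key(bound, "")).entrySet()) {
            if (covering) {
                out.add(e.getValue()); // served from the index alone
            } else {
                primaryLookups++;      // follow the pointer to the primary table
                out.add(primary.get(e.getKey().row));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (boolean covering : new boolean[] {true, false}) {
            IndexSketch t = new IndexSketch(covering);
            t.put("a", 10);
            t.put("b", 70);
            t.put("c", 42);
            System.out.println("covering=" + covering
                + " result=" + t.valuesLessThan(50)
                + " primaryLookups=" + t.primaryLookups);
        }
    }
}
```

Both variants return the same rows. The covering index pays for it
with a second copy of every value, while the pointer-only index pays
with one extra read per matching row.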

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)