Accumulo >> mail # user >> Get list of unique values for given CF and CQ

Re: Get list of unique values for given CF and CQ
There's no built-in iterator to do this. It's difficult to reliably
aggregate or combine data across rows, so this is probably better suited
to a MapReduce job than to an iterator. Even if you do something clever
to aggregate within a tablet (for example, transform all matching keys
to a fixed R/CF/CQ and then use a combiner to group them within that
virtual row) and deal with the problems that raises (reliably
transforming row IDs is especially tricky), you still need to
aggregate across tablets with some sort of client code, either
with MapReduce or with a single-node client.
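
To make the two stages concrete, here is a minimal sketch of the
single-node client approach in plain Java. It does not call the Accumulo
API: the Entry class and the lists standing in for tablets are
hypothetical, and in a real client the entries would come from scanning
the table restricted to the column of interest (for example with
Scanner.fetchColumn). It only illustrates deduplicating within each
tablet and then merging the partial results on the client.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class UniqueValues {
    /** Hypothetical stand-in for one Accumulo key/value pair. */
    static final class Entry {
        final String row, cf, cq, value;
        Entry(String row, String cf, String cq, String value) {
            this.row = row; this.cf = cf; this.cq = cq; this.value = value;
        }
    }

    /** Stage 1: distinct values for cf:cq among one tablet's entries. */
    static Set<String> distinctInTablet(List<Entry> tablet, String cf, String cq) {
        Set<String> seen = new TreeSet<>();
        for (Entry e : tablet) {
            if (e.cf.equals(cf) && e.cq.equals(cq)) {
                seen.add(e.value);
            }
        }
        return seen;
    }

    /** Stage 2: client-side merge of the per-tablet partial results. */
    static Set<String> uniqueValues(List<List<Entry>> tablets, String cf, String cq) {
        Set<String> merged = new TreeSet<>();
        for (List<Entry> tablet : tablets) {
            merged.addAll(distinctInTablet(tablet, cf, cq));
        }
        return merged;
    }

    /** The rows from the question, split arbitrarily into two "tablets". */
    static List<List<Entry>> exampleTablets() {
        return Arrays.asList(
            Arrays.asList(
                new Entry("row0", "myCF", "myCQ", "a"),
                new Entry("row1", "myCF", "myCQ", "a"),
                new Entry("row2", "myCF", "myCQ", "a")),
            Arrays.asList(
                new Entry("row3", "myCF", "myCQ", "b"),
                new Entry("row4", "myCF", "myCQ", "c"),
                new Entry("row4", "otherCF", "otherCQ", "z"), // irrelevant column, ignored
                new Entry("row5", "myCF", "myCQ", "c")));
    }

    public static void main(String[] args) {
        System.out.println(uniqueValues(exampleTablets(), "myCF", "myCQ")); // prints [a, b, c]
    }
}
```

The merge in stage 2 is needed because the same value can occur in more
than one tablet, so per-tablet deduplication alone can still return
duplicates to the client.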

Christopher L Tubbs II
On Mon, Oct 14, 2013 at 12:06 PM, Korb, Michael [USA]
> Given a specific CF and CQ, is there an iterator I can use to get all unique
> values across all rows?
> Example:
> row0 myCF:myCQ a
> row1 myCF:myCQ a
> row2 myCF:myCQ a
> row3 myCF:myCQ b
> row4 myCF:myCQ c
> row5 myCF:myCQ c
> I am interested in unique values associated with myCF:myCQ (irrelevant
> columns omitted from example).
> Result: a, b, c
> Thanks,
> Mike