Accumulo >> mail # user >> List of unique qualifiers [SEC=UNOFFICIAL]


RE: List of unique qualifiers [SEC=UNOFFICIAL]
UNOFFICIAL

Thanks Keith.  I've run a simple MR job based on the UniqueColumns example, but because of the size of the table it is taking a very long time.  Is it possible to pre-filter the data that goes to the MR job based on column family, e.g. run the job only on columns with the column family 'cityofbirth'?  I am currently going through every column in the table and checking the column family in the mapper, which is slow.
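
The difference between checking the family inside the mapper and restricting the input before it reaches the mapper can be sketched in plain Java. This is only an illustration under stated assumptions: the in-memory list stands in for the table, `runJob` and `restrictToFamily` are hypothetical helpers, and the 'name' family is made up for the example. In a real job the restriction would be configured on the input format (AccumuloInputFormat has a fetchColumns setting for this, though its exact signature differs between releases and should be checked against the version in use).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class PrefilterSketch {

    // Hypothetical stand-in for the map phase: every entry handed to the job
    // costs one mapper call, whether or not its family matches.
    // Each entry is {rowId, columnFamily, columnQualifier}.
    public static int runJob(List<String[]> input, String family, Set<String> out) {
        int mapperCalls = 0;
        for (String[] entry : input) {
            mapperCalls++;
            if (family.equals(entry[1])) {
                out.add(entry[2]);
            }
        }
        return mapperCalls;
    }

    // Hypothetical stand-in for restricting the scan to one column family
    // before any entry is handed to the mapper.
    public static List<String[]> restrictToFamily(List<String[]> all, String family) {
        List<String[]> kept = new ArrayList<>();
        for (String[] entry : all) {
            if (family.equals(entry[1])) {
                kept.add(entry);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> table = Arrays.asList(
            new String[] {"123", "cityofbirth", "LosAngeles"},
            new String[] {"123", "name", "Alice"},
            new String[] {"133", "cityofbirth", "Brisbane"},
            new String[] {"133", "name", "Bob"});

        Set<String> unfiltered = new TreeSet<>();
        int callsWithoutPrefilter = runJob(table, "cityofbirth", unfiltered);

        Set<String> prefiltered = new TreeSet<>();
        int callsWithPrefilter =
            runJob(restrictToFamily(table, "cityofbirth"), "cityofbirth", prefiltered);

        // Both runs produce the same qualifiers; only the mapper call count differs.
        System.out.println(callsWithoutPrefilter + " vs " + callsWithPrefilter);
        System.out.println(unfiltered.equals(prefiltered));
    }
}
```

The result is the same either way; the saving comes from the mapper never seeing entries from other families.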

________________________________
From: Keith Turner [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 15 January 2014 12:06
To: [EMAIL PROTECTED]
Subject: Re: List of unique qualifiers [SEC=UNOFFICIAL]
On Tue, Jan 14, 2014 at 6:06 PM, Dickson, Matt MR <[EMAIL PROTECTED]> wrote:

UNOFFICIAL

Just for simplicity: this is a one-off request for management, so I was hoping to just scan via the shell and output to a file.

If I need to do it via an MR job I can do it that way, and I would be keen to hear any suggestions.

You could modify the following example in 1.4 to suit your needs.

src/examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/UniqueColumns.java
________________________________
From: David Medinets [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 15 January 2014 09:36
To: accumulo-user
Subject: Re: List of unique qualifiers [SEC=UNOFFICIAL]

Why the restriction to the shell environment? A nice map-reduce job would be ideal for this task.
On Tue, Jan 14, 2014 at 5:30 PM, Dickson, Matt MR <[EMAIL PROTECTED]> wrote:

UNOFFICIAL

Hi,

I need to extract a list of unique qualifier values on a table from the Accumulo shell.  For every column there is a column family that identifies a specific qualifier, e.g. 'cityofbirth'.  I would like to get a unique list of all cities that are listed in the qualifier against 'cityofbirth' for all rows.

E.g., if I had a table with:

Rowid                Family            Qual
123                   cityofbirth         LosAngeles
133                   cityofbirth         Brisbane
222                   cityofbirth         London
124                   cityofbirth         London
124                   cityofbirth         London

I want a list that is just:
LosAngeles
London
Brisbane
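
The desired result can be sketched in plain Java as follows. This is a minimal illustration only, assuming the table entries have already been read into memory as {rowId, family, qualifier} triples; `UniqueQualifiers` and `uniqueQualifiers` are hypothetical names, not part of Accumulo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueQualifiers {

    // Collects each distinct qualifier stored under the given family,
    // in the order it is first seen.
    // Each entry is {rowId, columnFamily, columnQualifier}.
    public static List<String> uniqueQualifiers(List<String[]> entries, String family) {
        Set<String> seen = new LinkedHashSet<>();
        for (String[] entry : entries) {
            if (family.equals(entry[1])) {
                seen.add(entry[2]); // a Set silently ignores repeats
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // The sample table from the message above.
        List<String[]> entries = Arrays.asList(
            new String[] {"123", "cityofbirth", "LosAngeles"},
            new String[] {"133", "cityofbirth", "Brisbane"},
            new String[] {"222", "cityofbirth", "London"},
            new String[] {"124", "cityofbirth", "London"},
            new String[] {"124", "cityofbirth", "London"});

        // Prints each city once, in first-seen order.
        for (String city : uniqueQualifiers(entries, "cityofbirth")) {
            System.out.println(city);
        }
    }
}
```

Holding the set in memory is only reasonable when the number of distinct values is small; for a very large table the same de-duplication would have to be done by the job's reduce phase instead.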

Any suggestions on how to achieve this from the shell would be great.

Thanks in advance.
Matt