Accumulo >> mail # user >> List of unique qualifiers [SEC=UNOFFICIAL]


Dickson, Matt MR 2014-01-14, 22:30
David Medinets 2014-01-14, 22:36
Dickson, Matt MR 2014-01-14, 23:06
Keith Turner 2014-01-15, 01:05
RE: List of unique qualifiers [SEC=UNOFFICIAL]
UNOFFICIAL

Thanks Keith. I've run a simple MR job based on the UniqueColumns example, but because of the size of the table it is taking a very long time. Is it possible to pre-filter the data that goes to the MR job by column family, e.g. only run the MR job over entries whose column family is 'cityofbirth'? At the moment I go through every column in the table and check the column family in the mapper, which is slow.
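A minimal sketch of what pre-filtering changes, written in plain Java so it runs without Accumulo or Hadoop on the classpath. `PrefilterModel`, its `Entry` record and its `scan` method are hypothetical stand-ins, and the `name` entries are invented filler: `scan` hands the mapper only entries whose family is in the requested set, the way a column restriction on the scan takes effect before any mapper code runs. In a real job the Accumulo equivalents would be `fetchColumnFamily(Text)` on a scanner, or the input format's `fetchColumns` setting (check the `InputFormatBase` javadoc for your release for the exact signature and how a whole family is selected).

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Consumer;

// Plain-Java model of putting a column-family filter in front of the mapper.
// Entry and scan are hypothetical; no Accumulo or Hadoop classes are used.
class PrefilterModel {

    record Entry(String row, String family, String qualifier) {}

    // Delivers entries to `mapper`. An empty `families` set means "fetch everything";
    // otherwise only the listed families are delivered. Returns how many were delivered.
    static int scan(List<Entry> table, Set<String> families, Consumer<Entry> mapper) {
        int delivered = 0;
        for (Entry e : table) {
            if (families.isEmpty() || families.contains(e.family())) {
                mapper.accept(e);
                delivered++;
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        List<Entry> table = List.of(
            new Entry("123", "cityofbirth", "LosAngeles"),
            new Entry("123", "name", "Alice"),
            new Entry("124", "cityofbirth", "London"),
            new Entry("124", "name", "Bob"),
            new Entry("133", "cityofbirth", "Brisbane"));

        // Before: every entry reaches the mapper, which has to check the family itself.
        TreeSet<String> before = new TreeSet<>();
        int all = scan(table, Set.of(), e -> {
            if (e.family().equals("cityofbirth")) before.add(e.qualifier());
        });

        // After: the family restriction is applied first, so the mapper only collects.
        TreeSet<String> after = new TreeSet<>();
        int filtered = scan(table, Set.of("cityofbirth"), e -> after.add(e.qualifier()));

        System.out.println(all + " entries mapped without a filter, " + filtered + " with one");
        System.out.println(before.equals(after) + " " + after);
    }
}
```

The model only shows what reaches the mapper; how much reading the tablet servers themselves avoid depends on the table layout, for example whether 'cityofbirth' sits in its own locality group.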

________________________________
From: Keith Turner [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 15 January 2014 12:06
To: [EMAIL PROTECTED]
Subject: Re: List of unique qualifiers [SEC=UNOFFICIAL]
On Tue, Jan 14, 2014 at 6:06 PM, Dickson, Matt MR <[EMAIL PROTECTED]> wrote:

UNOFFICIAL

Just for simplicity: this is a one-off request for management, so I was hoping to just scan via the shell and output to a file.

If I need to do it via an MR job, I can do it that way and would be keen to hear any suggestions.
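One way to stay with the shell is to save the scan output to a file and post-process it. A minimal sketch in plain Java, assuming each saved line has the form `row family:qualifier [visibility]    value` (verify this against your own shell output), that rows and columns contain no spaces, and that the family contains no `:`. Restricting the scan to the family first, e.g. with the shell scan's `-c cityofbirth` column option (see `help scan` for the options in your version), keeps the file small. `ShellScanQualifiers` and its method are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Post-processes saved Accumulo shell scan output into a sorted list of unique qualifiers.
// ASSUMPTION: lines look like "row family:qualifier [visibility]    value", with no
// spaces inside the row or column and no ':' inside the family name.
class ShellScanQualifiers {

    static List<String> uniqueQualifiers(List<String> lines, String family) {
        String prefix = family + ":";
        TreeSet<String> unique = new TreeSet<>(); // sorted; repeated qualifiers collapse
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            // Skip blank lines, shell prompts or messages, and other column families.
            if (parts.length < 2 || !parts[1].startsWith(prefix)) {
                continue;
            }
            unique.add(parts[1].substring(prefix.length()));
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Usage (hypothetical file name): java ShellScanQualifiers < scan.txt
        List<String> lines = new BufferedReader(new InputStreamReader(System.in)).lines().toList();
        uniqueQualifiers(lines, "cityofbirth").forEach(System.out::println);
    }
}
```

A `TreeSet` is used so the output comes back sorted; swap in a `LinkedHashSet` if first-seen order is preferred.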

You could modify the following example in 1.4 to suit your needs.

src/examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/UniqueColumns.java
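The core of that kind of job can be modelled without a cluster. The sketch below is plain Java and is not the UniqueColumns source: `map` emits the qualifier as the output key (doing the family check inside the mapper, as described in the thread), a `TreeMap` stands in for Hadoop's shuffle-and-sort, and `reduce` writes each key once, which is where the duplicates disappear. `UniqueQualifierJob` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of a "unique values" MapReduce job; no Hadoop or Accumulo classes.
class UniqueQualifierJob {

    record Entry(String row, String family, String qualifier) {}

    // Map phase: emit the qualifier as a key, but only for the wanted family.
    static List<String> map(Entry e, String family) {
        return e.family().equals(family) ? List.of(e.qualifier()) : List.of();
    }

    // Reduce phase: every occurrence of a key arrives in one call; emitting the key
    // once (ignoring the grouped values) is what removes the duplicates.
    static String reduce(String key, List<String> rows) {
        return key;
    }

    static List<String> run(List<Entry> input, String family) {
        // Shuffle/sort stand-in: group map output by key, in sorted key order.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Entry e : input) {
            for (String key : map(e, family)) {
                grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(e.row());
            }
        }
        List<String> output = new ArrayList<>();
        grouped.forEach((key, rows) -> output.add(reduce(key, rows)));
        return output;
    }

    public static void main(String[] args) {
        List<Entry> table = List.of(
            new Entry("123", "cityofbirth", "LosAngeles"),
            new Entry("133", "cityofbirth", "Brisbane"),
            new Entry("222", "cityofbirth", "London"),
            new Entry("124", "cityofbirth", "London"));
        System.out.println(run(table, "cityofbirth")); // [Brisbane, London, LosAngeles]
    }
}
```

Because the reducer only emits its key, a reducer like this can typically double as a combiner in a real job, provided its output types match the mapper's, which cuts down the data shuffled from each mapper.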
________________________________
From: David Medinets [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 15 January 2014 09:36
To: accumulo-user
Subject: Re: List of unique qualifiers [SEC=UNOFFICIAL]

Why the restriction to the shell environment? A nice map-reduce job would be ideal for this task.
On Tue, Jan 14, 2014 at 5:30 PM, Dickson, Matt MR <[EMAIL PROTECTED]> wrote:

UNOFFICIAL

Hi,

I need to extract a list of unique qualifier values in a table from the Accumulo shell. For every column there is a column family that identifies the type of qualifier, e.g. 'cityofbirth'. I would like to get a unique list of all cities that are listed as the qualifier under 'cityofbirth' across all rows.

eg, If I had a table with

Rowid    Family         Qual
123      cityofbirth    LosAngeles
133      cityofbirth    Brisbane
222      cityofbirth    London
124      cityofbirth    London
124      cityofbirth    London

I want a list that is just:
LosAngeles
London
Brisbane

Any suggestions on how to achieve this from the shell would be great.

Thanks in advance.
Matt
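For the example table above, the answer is simply the distinct qualifiers under `cityofbirth`. A minimal plain-Java sketch (hypothetical `UniqueCities` class, no Accumulo client code): it walks the entries in sorted row-ID order, which is the order an Accumulo scan returns them, and keeps the first occurrence of each qualifier with a `LinkedHashSet`, reproducing the list asked for: LosAngeles, London, Brisbane. The duplicate row 124 entry is kept as in the question; in a real table, identical keys would normally be collapsed by the default versioning iterator, which does not change the result.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of the example table; no Accumulo classes are used.
class UniqueCities {

    // One entry reduced to the three key parts the question uses.
    record Entry(String row, String family, String qualifier) {}

    // Returns each distinct qualifier under `family`, in the order first seen.
    static List<String> uniqueQualifiers(List<Entry> entries, String family) {
        Set<String> seen = new LinkedHashSet<>();
        for (Entry e : entries) {
            if (e.family().equals(family)) {
                seen.add(e.qualifier()); // Set.add ignores repeats such as "London"
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Rows listed in sorted row-ID order, as a scan would return them.
        List<Entry> table = List.of(
            new Entry("123", "cityofbirth", "LosAngeles"),
            new Entry("124", "cityofbirth", "London"),
            new Entry("124", "cityofbirth", "London"),
            new Entry("133", "cityofbirth", "Brisbane"),
            new Entry("222", "cityofbirth", "London"));
        uniqueQualifiers(table, "cityofbirth").forEach(System.out::println);
    }
}
```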

Corey Nolet 2014-01-16, 01:27
David Medinets 2014-01-16, 21:54
Sean Busbey 2014-01-16, 22:26
Ott, Charles H. 2014-01-17, 14:31
Eric Newton 2014-01-16, 01:27
Josh Elser 2014-01-14, 23:11