Accumulo, mail # user - List of unique qualifiers [SEC=UNOFFICIAL]


Re: List of unique qualifiers [SEC=UNOFFICIAL]
David Medinets 2014-01-16, 21:54
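Clone the table, take the cloned table offline, use it for the map-reduce
job, then delete it. With the clone offline, the job can read the table's
files directly instead of going through the tablet servers. All of this can
be done through the Java API, which is nice if you'll be running the job
more than once.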
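
The order of operations described above can be sketched as follows. This is only a sketch of the sequencing and cleanup, not Accumulo code: `TableOps` and `TableJob` are hypothetical stand-in interfaces, and the table names are made up. In the Accumulo client API the corresponding calls should be `TableOperations.clone`, `offline` and `delete`, with the input format configured to scan the offline table; check the signatures for your version.

```java
import java.util.ArrayList;
import java.util.List;

class CloneScanWorkflow {

    /** Hypothetical stand-in for the table operations the steps need; not the Accumulo API. */
    interface TableOps {
        void cloneTable(String source, String target) throws Exception;
        void offline(String table) throws Exception;
        void delete(String table) throws Exception;
    }

    /** Hypothetical stand-in for "configure and run the MapReduce job against this table". */
    interface TableJob {
        void run(String table) throws Exception;
    }

    /** Clone, take the clone offline, run the job on it, and always delete the clone afterwards. */
    static void runOnOfflineClone(TableOps ops, String source, String cloneName, TableJob job)
            throws Exception {
        ops.cloneTable(source, cloneName);
        try {
            ops.offline(cloneName);
            job.run(cloneName);
        } finally {
            // Remove the temporary clone even if the job fails.
            ops.delete(cloneName);
        }
    }

    public static void main(String[] args) throws Exception {
        final List<String> log = new ArrayList<>();
        // Fake implementation that only records which step ran.
        TableOps ops = new TableOps() {
            public void cloneTable(String source, String target) { log.add("clone " + source + " -> " + target); }
            public void offline(String table) { log.add("offline " + table); }
            public void delete(String table) { log.add("delete " + table); }
        };
        runOnOfflineClone(ops, "people", "people_mr_clone", table -> log.add("run job on " + table));
        for (String step : log) {
            System.out.println(step);
        }
    }
}
```

Putting the delete in a `finally` block means a failed job does not leave clones behind when the job is run repeatedly.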
On Wed, Jan 15, 2014 at 8:27 PM, Corey Nolet <[EMAIL PROTECTED]> wrote:

> Matt,
>
> This should help:
>
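> // Family names are case sensitive, so match the table's 'cityofbirth'.
> // A null qualifier selects every column in that family.
> Collection<Pair<Text,Text>> cols =
>     Collections.singleton(new Pair<Text,Text>(new Text("cityofbirth"), null));
> AccumuloInputFormat.fetchColumns(job, cols);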
>
>
>
> On Wed, Jan 15, 2014 at 7:29 PM, Dickson, Matt MR <
> [EMAIL PROTECTED]> wrote:
>
>>  *UNOFFICIAL*
>> Thanks Keith.  I've run a simple MR job based on the UniqueColumns
>> example, but because of the size of the table it is taking a very long
>> time. Is it possible to pre-filter the data that goes to the MR job by
>> column family, e.g. run the job only on columns whose family is
>> 'cityofbirth'?  I am currently going through every column in the table and
>> checking the column family in the mapper, which is slow.
>>
>>
>>
>>  ------------------------------
>> *From:* Keith Turner [mailto:[EMAIL PROTECTED]]
>> *Sent:* Wednesday, 15 January 2014 12:06
>> *To:* [EMAIL PROTECTED]
>>
>> *Subject:* Re: List of unique qualifiers [SEC=UNOFFICIAL]
>>
>>
>>
>>
>> On Tue, Jan 14, 2014 at 6:06 PM, Dickson, Matt MR <
>> [EMAIL PROTECTED]> wrote:
>>
>>>  *UNOFFICIAL*
>>> Just for simplicity: this is a one-off request for management, so I was
>>> hoping to just scan via the shell and output to a file.
>>>
>>> If I need to do it via an MR job I can do it that way, and would be keen
>>> to hear any suggestions.
>>>
>>
>> You could modify the following example in 1.4 to suit your needs.
>>
>>
>> src/examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/UniqueColumns.java
>>
>>
>>>
>>>  ------------------------------
>>> *From:* David Medinets [mailto:[EMAIL PROTECTED]]
>>> *Sent:* Wednesday, 15 January 2014 09:36
>>> *To:* accumulo-user
>>> *Subject:* Re: List of unique qualifiers [SEC=UNOFFICIAL]
>>>
>>>   Why the restriction to the shell environment? A nice map-reduce job
>>> would be ideal for this task.
>>>
>>>
>>> On Tue, Jan 14, 2014 at 5:30 PM, Dickson, Matt MR <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>>  *UNOFFICIAL*
>>>> Hi,
>>>>
>>>> I need to extract a list of unique qualifier values on a table from the
>>>> Accumulo shell.  Each column has a column family that identifies what its
>>>> qualifier holds, e.g. 'cityofbirth'.  I would like to get a unique list
>>>> of all cities that are listed in the qualifier under 'cityofbirth' across
>>>> all rows.
>>>>
>>>> E.g., if I had a table with
>>>>
>>>> Rowid    Family         Qual
>>>> 123      cityofbirth    LosAngeles
>>>> 133      cityofbirth    Brisbane
>>>> 222      cityofbirth    London
>>>> 124      cityofbirth    London
>>>> 124      cityofbirth    London
>>>>
>>>> I want a list that is just:
>>>> LosAngeles
>>>> London
>>>> Brisbane
>>>>
>>>> Any suggestions on how to achieve this from the shell would be great.
>>>>
>>>> Thanks in advance.
>>>> Matt
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
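
For reference, the result the thread is after (the distinct qualifiers stored under one column family) can be sketched in plain Java, independent of Accumulo. This only illustrates the filter-and-deduplicate logic on the sample rows from the original question; the `Entry` class and the `uniqueQualifiers` method are hypothetical names, not part of any Accumulo API, and a real job would apply the same idea to scanned keys in a mapper and reducer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class UniqueQualifiers {

    /** Hypothetical stand-in for one table entry: row ID, column family, column qualifier. */
    static final class Entry {
        final String row;
        final String family;
        final String qualifier;

        Entry(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    /** Returns the distinct qualifiers found under the given family, in sorted order. */
    static Set<String> uniqueQualifiers(List<Entry> entries, String family) {
        Set<String> unique = new TreeSet<>();
        for (Entry e : entries) {
            // Exact, case-sensitive match on the family name.
            if (e.family.equals(family)) {
                unique.add(e.qualifier);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        // Sample rows from the original question, including the duplicate.
        List<Entry> table = Arrays.asList(
            new Entry("123", "cityofbirth", "LosAngeles"),
            new Entry("133", "cityofbirth", "Brisbane"),
            new Entry("222", "cityofbirth", "London"),
            new Entry("124", "cityofbirth", "London"),
            new Entry("124", "cityofbirth", "London"));

        // Prints: [Brisbane, London, LosAngeles]
        System.out.println(uniqueQualifiers(table, "cityofbirth"));
    }
}
```

Because the family comparison is case sensitive, asking for "cityOfBirth" on this data would return an empty set.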