

Re: importdirectory in accumulo
You will have to write your own InputFormat class which will parse your
file and pass records to your reducer.
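
For a simple four-column, whitespace-delimited layout like yours, you can
often get away with the stock TextInputFormat and a small Mapper that does
the parsing instead. A rough sketch (untested; the class name is made up):

import java.io.IOException;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Turns "rowid family qualifier value" lines into Accumulo Key/Value pairs.
public class LineToKeyValueMapper extends Mapper<LongWritable, Text, Key, Value> {
  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String[] parts = line.toString().split("\\s+");
    if (parts.length != 4)
      return; // skip malformed lines
    Key k = new Key(new Text(parts[0]), new Text(parts[1]), new Text(parts[2]));
    context.write(k, new Value(parts[3].getBytes()));
  }
}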

-Eric
On Wed, Apr 3, 2013 at 2:29 PM, Aji Janis <[EMAIL PROTECTED]> wrote:

> Looking at the BulkIngestExample, it uses GenerateTestData and creates a
> .txt file which contains key/value pairs, and correct me if I am wrong but
> each new line is a new row, right?
>
> I need to know how to have family and qualifiers also. In other words,
>
> 1) Do I set up a .txt file that can be converted into an Accumulo RFile
> using AccumuloFileOutputFormat, which can then be imported into my table?
>
> 2) If yes, what is the format of the .txt file?
>
>
>
>
> On Wed, Apr 3, 2013 at 2:19 PM, Eric Newton <[EMAIL PROTECTED]> wrote:
>
>> Your data needs to be in the RFile format, and more importantly it needs
>> to be sorted.
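>>
>> A plain .txt file will be skipped outright: importdirectory only picks up
>> files whose extension marks them as RFiles (.rf) or map files, which is
>> what those WARN messages are telling you. Both directories also need to
>> live in HDFS, and the failures directory must already exist and be empty.
>> For example (paths are illustrative):
>>
>> $ hadoop fs -mkdir /home/failureDir
>> $ hadoop fs -ls /home/inputDir        # should list *.rf files, not .txt
>> root@instance mytable> importdirectory /home/inputDir /home/failureDir true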
>>
>> It's handy to use a Map/Reduce job to convert/sort your data.  See the
>> BulkIngestExample.
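>>
>> Loosely modeled on that example, the driver could look like the sketch
>> below (untested; it assumes a line-parsing mapper like the
>> LineToKeyValueMapper sketched above, and the real example sets a few more
>> output options):
>>
>> import org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat;
>> import org.apache.accumulo.core.data.Key;
>> import org.apache.accumulo.core.data.Value;
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.mapreduce.Job;
>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>>
>> public class BulkPrepJob {
>>   public static void main(String[] args) throws Exception {
>>     Job job = new Job(new Configuration(), "bulk-prep");
>>     job.setJarByClass(BulkPrepJob.class);
>>
>>     // Plain text in: one "rowid family qualifier value" line per record.
>>     job.setInputFormatClass(TextInputFormat.class);
>>     FileInputFormat.addInputPath(job, new Path(args[0]));
>>
>>     job.setMapperClass(LineToKeyValueMapper.class);
>>     job.setMapOutputKeyClass(Key.class);
>>     job.setMapOutputValueClass(Value.class);
>>
>>     // The default (identity) reducer plus the shuffle's sort yields
>>     // globally sorted Keys, which RFiles require. One reducer = one RFile.
>>     job.setNumReduceTasks(1);
>>     job.setOutputKeyClass(Key.class);
>>     job.setOutputValueClass(Value.class);
>>
>>     // RFiles out, ready for the shell's importdirectory command.
>>     job.setOutputFormatClass(AccumuloFileOutputFormat.class);
>>     FileOutputFormat.setOutputPath(job, new Path(args[1]));
>>
>>     System.exit(job.waitForCompletion(true) ? 0 : 1);
>>   }
>> }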
>>
>> -Eric
>>
>>
>> On Wed, Apr 3, 2013 at 2:15 PM, Aji Janis <[EMAIL PROTECTED]> wrote:
>>
>>> I have some data in a text file in the following format:
>>>
>>> rowid1 columnFamily1 colQualifier1 value
>>> rowid1 columnFamily1 colQualifier2 value
>>> rowid1 columnFamily2 colQualifier1 value
>>> rowid2 columnFamily1 colQualifier1 value
>>> rowid3 columnFamily1 colQualifier1 value
>>>
>>> I want to import this data into a table in Accumulo. My end goal is to
>>> understand how to use the bulk import feature in Accumulo. I tried logging
>>> in to the Accumulo shell as root and running:
>>>
>>> #table mytable
>>> #importdirectory /home/inputDir /home/failureDir true
>>>
>>> but it didn't work. My data file was saved as data.txt in
>>> /home/inputDir. I tried creating the dir/file structure both in HDFS and
>>> on the local Linux filesystem, but neither worked. When trying locally, it
>>> kept complaining about failureDir not existing.
>>> ...
>>> java.io.FileNotFoundException: File does not exist: failures
>>>
>>> When trying with files on HDFS, I got no error on the console, but the
>>> logger had the following messages:
>>> ...
>>> [tableOps.BulkImport] WARN : hdfs://node....//inputDir/data.txt does not
>>> have a valid extension, ignoring
>>>
>>> or,
>>>
>>> [tableOps.BulkImport] WARN : hdfs://node....//inputDir/data.txt is not a
>>> map file, ignoring
>>>
>>>
>>> Suggestions? Am I not setting up the job right? Thank you in advance for
>>> your help.
>>>
>>>
>>> On Wed, Apr 3, 2013 at 2:04 PM, Aji Janis <[EMAIL PROTECTED]> wrote:
>>>
>>>> I have some data in a text file in the following format:
>>>>
>>>> rowid1 columnFamily colQualifier value
>>>> rowid1 columnFamily colQualifier value
>>>> rowid1 columnFamily colQualifier value
>>>>
>>>
>>>
>>
>