
Accumulo user mailing list


Re: Difference between InsertWithBatchWriter and InsertWithOutputFormat
Huanchen,

The AccumuloOutputFormat just passes along the connection information (i.e. username, password, instance name, and ZooKeeper hosts) so that an Accumulo connector can be created in each output worker (that is, each mapper or reducer). You could do this yourself by passing the connection information around in the job's Configuration, creating the BatchWriter in the mappers (for a map-only job) or in the reducers, and then using your HDFS output format to emit the rest of the data.
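
A minimal, self-contained sketch of that do-it-yourself approach follows. It is not the Accumulo or Hadoop API: java.util.Properties stands in for Hadoop's Configuration, the property keys are hypothetical, and InMemoryBatchWriter is a hypothetical stand-in for a real BatchWriter so the code runs without a cluster. What it illustrates is the shape: the driver copies the connection details into the configuration, and each worker rebuilds them in setup, opens its own writer, writes to it in map alongside what would be context.write, and closes it in cleanup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Connection details that AccumuloOutputFormat would otherwise carry for you.
// The property keys are hypothetical; any unique names will do.
final class ConnectionInfo {
    static final String INSTANCE = "example.accumulo.instance";
    static final String ZOOKEEPERS = "example.accumulo.zookeepers";
    static final String USER = "example.accumulo.user";
    static final String PASSWORD = "example.accumulo.password";

    final String instance, zookeepers, user, password;

    ConnectionInfo(String instance, String zookeepers, String user, String password) {
        this.instance = instance;
        this.zookeepers = zookeepers;
        this.user = user;
        this.password = password;
    }

    // Driver side: copy the details into the job configuration
    // (with Hadoop, conf.set(key, value) on the job's Configuration).
    void writeTo(Properties conf) {
        conf.setProperty(INSTANCE, instance);
        conf.setProperty(ZOOKEEPERS, zookeepers);
        conf.setProperty(USER, user);
        conf.setProperty(PASSWORD, password);
    }

    // Worker side: rebuild the details from the configuration
    // (with Hadoop, context.getConfiguration().get(key)).
    static ConnectionInfo readFrom(Properties conf) {
        return new ConnectionInfo(conf.getProperty(INSTANCE), conf.getProperty(ZOOKEEPERS),
                conf.getProperty(USER), conf.getProperty(PASSWORD));
    }
}

// Hypothetical in-memory stand-in for a BatchWriter so the sketch runs without a
// cluster; a real worker would open a connector from ConnectionInfo and ask it
// for a BatchWriter on the target table.
final class InMemoryBatchWriter {
    final ConnectionInfo info;
    final List<String> mutations = new ArrayList<>();
    boolean closed = false;

    InMemoryBatchWriter(ConnectionInfo info) { this.info = info; }
    void addMutation(String row) { mutations.add(row); }
    void close() { closed = true; }
}

// Stand-in for a mapper in a map-only job: one writer per worker, opened in
// setup(), fed in map(), closed in cleanup(). The hdfs list plays context.write.
final class DualOutputWorker {
    InMemoryBatchWriter accumulo;
    final List<String> hdfs = new ArrayList<>();

    void setup(Properties conf) {
        accumulo = new InMemoryBatchWriter(ConnectionInfo.readFrom(conf));
    }

    void map(String record) {
        accumulo.addMutation(record); // real job: batchWriter.addMutation(mutation)
        hdfs.add(record);             // real job: context.write(key, value)
    }

    void cleanup() {
        accumulo.close(); // a real BatchWriter flushes buffered mutations on close
    }
}
```

Closing the writer in cleanup() matters because a BatchWriter buffers mutations; anything still buffered is only sent on flush or close.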

I have not looked at these examples, but I assume they do the same thing. Though I haven't tried this myself, I can't see why it wouldn't work. With two output endpoints, if you need to guarantee data consistency, you will most likely want a strategy for handling the case where the Accumulo write succeeds but the HDFS write fails.
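
One possible strategy, sketched below under stated assumptions, is to make the Accumulo side safe to replay. Hadoop re-runs failed task attempts, and output written through FileOutputFormat is normally only committed when an attempt succeeds, but Accumulo writes made by a failed attempt stay written. If each entry's key is derived from the input (not from a random ID or the clock), a replay rewrites the same key instead of adding a new one. The Sink interface and the KeyedStore, FlakyLog, and DualWriter classes are hypothetical, and the retry loop merely stands in for Hadoop re-running the attempt.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sink abstraction standing in for the Accumulo and HDFS writers.
interface Sink {
    void write(String key, String value) throws Exception;
}

// Keyed store (Accumulo stand-in): writing the same key twice replaces the
// entry instead of adding a second one.
final class KeyedStore implements Sink {
    final Map<String, String> rows = new LinkedHashMap<>();

    public void write(String key, String value) {
        rows.put(key, value);
    }
}

// Append-only sink (HDFS stand-in) that fails a set number of times to
// simulate a transient write error.
final class FlakyLog implements Sink {
    final List<String> lines = new ArrayList<>();
    int failuresLeft;

    FlakyLog(int failures) { failuresLeft = failures; }

    public void write(String key, String value) throws Exception {
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new Exception("simulated HDFS write failure");
        }
        lines.add(key + "\t" + value);
    }
}

final class DualWriter {
    // recordId must come from the input itself (not a random UUID or the clock),
    // so that replaying the record targets exactly the same key.
    static int writeBoth(Sink accumulo, Sink hdfs, String recordId, String value,
                         int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                accumulo.write(recordId, value); // may already have succeeded last time
                hdfs.write(recordId, value);
                return attempt;                  // both endpoints now hold the record
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e;                     // give up; the caller fails the task
                }
            }
        }
    }
}
```

Here the record ends up once in each sink even though the Accumulo write ran twice. It does not cover a task that fails for good, which leaves the record in Accumulo but not in HDFS; deciding how to clean that up is still up to you.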
Corey

On Oct 16, 2012, at 10:48 PM, Huanchen Zhang wrote:

> Hello, Corey
>
> Thank you for your answer.
>
> Can I use InsertWithBatchWriter for this task? I mean, use context.write to write to HDFS and batchWriter.addMutation to write to Accumulo.
>
> Huanchen
>
> On Oct 16, 2012, at 10:25 PM, Corey Nolet wrote:
>
>> You can extend the output format to write to both, and have the underlying record writer send each item to the correct endpoint based on what the job submits.
>>
>> On Oct 16, 2012, at 10:16 PM, Huanchen Zhang wrote:
>>
>>> Hello,
>>>
>>> Here I have a MapReduce job which needs to write to Accumulo. I checked the examples. It seems there are two different ways to write to Accumulo: one is InsertWithBatchWriter, the other is InsertWithOutputFormat.
>>>
>>> So, what is the difference between them? Which one should I choose?
>>>
>>> I actually need to write to Accumulo and HDFS in the same job. It seems InsertWithOutputFormat cannot do this, because it needs to set the output format to "AccumuloOutputFormat.class", so the job can only write to Accumulo, right?
>>>
>>> Thank you.
>>>
>>> Best,
>>> Huanchen
>>
>