Accumulo >> mail # user >> Accumulo / HBase migration


Re: Accumulo / HBase migration
We could also just add a transformation from HFileReader ->
LocalityGroupReader, since I think HBase's storage model (forgive me if
there's a better term) maps pretty well to that.
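[A rough, self-contained sketch of the per-cell mapping such a translation would imply. The record types below are invented stand-ins for HBase's KeyValue and Accumulo's Key/Value (the real work would go through HFileReader and an RFile/LocalityGroupReader-compatible writer), and the default-visibility parameter is an assumption, not anything from the thread:]

```java
import java.util.Objects;

public class CellMapper {
    // Simplified stand-in for an HBase cell; no real HBase types are used here.
    record HBaseCell(String row, String family, String qualifier,
                     long timestamp, String value) {}

    // Simplified stand-in for an Accumulo entry. Accumulo adds a column
    // visibility field that has no HBase counterpart.
    record AccumuloEntry(String row, String family, String qualifier,
                         String visibility, long timestamp, String value) {}

    // Map one HBase cell to an Accumulo entry, filling visibility with a
    // caller-supplied default (empty string = visible to everyone).
    static AccumuloEntry toAccumulo(HBaseCell cell, String defaultVisibility) {
        Objects.requireNonNull(cell);
        return new AccumuloEntry(cell.row(), cell.family(), cell.qualifier(),
                                 defaultVisibility, cell.timestamp(), cell.value());
    }

    public static void main(String[] args) {
        HBaseCell cell = new HBaseCell("row1", "cf", "cq", 42L, "hello");
        AccumuloEntry entry = toAccumulo(cell, "");
        System.out.println(entry.row() + "/" + entry.visibility() + "/" + entry.value());
    }
}
```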
On Tue, Jul 9, 2013 at 2:20 PM, <[EMAIL PROTECTED]> wrote:

> I believe that Brian Loss committed code in 1.5 for a column visibility
> correction iterator or something that you could use to do this. You could
> use that and compact the table after the import.
>
> ------------------------------
> *From: *"Donald Miner" <[EMAIL PROTECTED]>
> *To: *[EMAIL PROTECTED]
> *Sent: *Tuesday, July 9, 2013 1:36:20 PM
> *Subject: *Re: Accumulo / HBase migration
>
>
> I did think about this. My naive answer is just by default ignore
> visibilities (meaning make everything public or make everything the same
> visibility). It would be interesting however to be able to insert a chunk
> of code that inferred the visibility from the record itself. That is, you'd
> have a function you can pass in that returns a ColumnVisibility and takes
> in a value/rowkey/etc.
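[A minimal sketch of the pluggable visibility function described above. A plain String stands in for Accumulo's ColumnVisibility type so the example is self-contained; the CellContext record and the "pii:" rule are invented for illustration:]

```java
import java.util.function.Function;

public class VisibilityInference {
    // The cell fields a user-supplied rule can inspect.
    record CellContext(String rowKey, String family, String qualifier, String value) {}

    // The pluggable hook: given a cell, produce a visibility expression.
    // A null result means "no label", which we map to public (empty string).
    static String infer(CellContext ctx, Function<CellContext, String> rule) {
        String vis = rule.apply(ctx);
        return vis == null ? "" : vis;
    }

    public static void main(String[] args) {
        // Example rule: rows whose key starts with "pii:" get a restricted label.
        Function<CellContext, String> rule =
            ctx -> ctx.rowKey().startsWith("pii:") ? "PRIVATE" : null;

        System.out.println(infer(new CellContext("pii:123", "cf", "cq", "v"), rule)); // PRIVATE
        System.out.println(infer(new CellContext("user:42", "cf", "cq", "v"), rule)); // empty line (public)
    }
}
```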
>
>
> On Tue, Jul 9, 2013 at 1:31 PM, Kurt Christensen <[EMAIL PROTECTED]> wrote:
>
>>
>> I don't have a response to your question, but it seems to me that the big
>> capability difference is visibility field. When doing bulk translations
>> like this, do you just fill visibility with some default value?
>>
>> -- Kurt
>>
>>
>> On 7/9/13 1:26 PM, Donald Miner wrote:
>>
>>>  Has anyone developed tools to migrate data from an existing HBase
>>> implementation to Accumulo? My team has done it "manually" in the past but
>>> it seems like it would be reasonable to write a process that handled the
>>> steps in a more automated fashion.
>>>
>>> Here are a few sample designs I've kicked around:
>>>
>>> HBase -> mapreduce -> mappers bulk write to accumulo -> Accumulo
>>> or
>>> HBase -> mapreduce -> tfiles via AccumuloFileOutputFormat -> Accumulo
>>> bulk load -> Accumulo
>>> or
>>> HBase -> bulk export -> map-only mapreduce to translate hfiles into
>>> tfiles (how hard would this be??) -> Accumulo bulk load -> Accumulo
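[One invariant all three pipelines above must preserve, sketched in a self-contained way: Accumulo's bulk-loaded files must contain entries in sorted key order, so a translation step either sorts itself or leans on MapReduce's shuffle sort. The Key record and comparator below are simplified stand-ins that only roughly mirror Accumulo's ordering (row, then column, then descending timestamp):]

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BulkSortSketch {
    // Stand-in key, ordered by row, then column, then reverse timestamp
    // (newest first), roughly as Accumulo orders entries in its files.
    record Key(String row, String column, long timestamp) {}

    static final Comparator<Key> KEY_ORDER =
        Comparator.comparing(Key::row)
                  .thenComparing(Key::column)
                  .thenComparing(Comparator.comparingLong(Key::timestamp).reversed());

    // Sort translated entries before writing them to a bulk-load file.
    static List<Key> sortForBulkLoad(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(KEY_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<Key> keys = List.of(
            new Key("b", "cf:cq", 1L),
            new Key("a", "cf:cq", 1L),
            new Key("a", "cf:cq", 9L));
        System.out.println(sortForBulkLoad(keys));
    }
}
```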
>>>
>>> I guess this could be extended to go the other way around (and also
>>> include Cassandra perhaps).
>>>
>>> Maybe we'll start working on this soon. I just wanted to kick the idea
>>> out there to see if it's been done before or if anyone has some gut
>>> reactions to the process.
>>>
>>> -Don
>>>
>>> This communication is the property of ClearEdge IT Solutions, LLC and
>>> may contain confidential and/or privileged information. Any review,
>>> retransmissions, dissemination or other use of or taking of any action in
>>> reliance upon this information by persons or entities other than the
>>> intended recipient is prohibited. If you receive this communication in
>>> error, please immediately notify the sender and destroy all copies of the
>>> communication and any attachments.
>>>
>>
>> --
>>
>> Kurt Christensen
>> P.O. Box 811
>> Westminster, MD 21158-0811
>>
>> ------------------------------**------------------------------**
>> ------------
>> I'm not really a trouble maker. I just play one on TV.
>>
>
>
>
> --
> Donald Miner
> Chief Technology Officer
> ClearEdge IT Solutions, LLC
> Cell: 443 799 7807
> www.clearedgeit.com
>
>