Sqoop >> mail # user >> Importing more than one column family in Hbase through Sqoop


Re: Importing more than one column family in Hbase through Sqoop
Anil -

Understood. As it happens, the HBase release that supported atomicity across
multiple column families came after the Sqoop release that included HBase
integration, hence the limitation.

Please go ahead and file a Sqoop JIRA requesting a CLI option that lets the
user specify multiple column families.

Regards, Kathleen
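For reference, the multi-family load Anil describes (one source record written to both `info` and `user_reviews` in `merchant_data`) can be sketched as follows. This is a minimal illustration under stated assumptions: the column-to-family mapping `FAMILY_OF` and the helpers `group_by_family` and `to_shell_puts` are hypothetical, not a Sqoop option or API. The sketch groups a record's columns by family and renders HBase shell `put` commands in the same form as the thread's example.

```python
from collections import defaultdict

# Hypothetical mapping of source columns to HBase column families;
# Sqoop does not accept a mapping like this on its command line.
FAMILY_OF = {"name": "info", "id": "user_reviews"}


def group_by_family(record, family_of):
    """Group one record's columns into {family: {qualifier: value}}."""
    cells = defaultdict(dict)
    for column, value in record.items():
        family = family_of.get(column)
        if family is not None:  # skip columns with no family assigned
            cells[family][column] = value
    return dict(cells)


def to_shell_puts(table, row_key, record, family_of):
    """Render one HBase shell put per cell, in the style used in the thread."""
    commands = []
    for family, columns in group_by_family(record, family_of).items():
        for qualifier, value in columns.items():
            commands.append(
                f"put '{table}', '{row_key}', '{family}:{qualifier}', '{value}'"
            )
    return commands


if __name__ == "__main__":
    record = {"name": "starbucks", "id": "4545"}
    for command in to_shell_puts("merchant_data", "1", record, FAMILY_OF):
        print(command)
```

The sketch only reproduces the statements from the thread. It does not settle whether writing several families for one row is atomic, which is the point under discussion.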

On Fri, Mar 16, 2012 at 3:09 PM, anil gupta <[EMAIL PROTECTED]> wrote:

> Hi Kathleen,
>
> Sorry for the delayed reply; I started working on HBase rather than
> Sqoop.
> Here is some example code from the book "HBase: The Definitive Guide"
> which shows that it is possible to load data into more than one column
> family through the Java API, which was exactly the point I was trying to make.
>
> Have a look at these two classes:
>
> https://github.com/larsgeorge/hbase-book/blob/master/ch04/src/main/java/util/HBaseHelper.java
>
> https://github.com/larsgeorge/hbase-book/blob/master/ch04/src/main/java/filters/PrefixFilterExample.java
>
> Please let me know if you have further questions.
>
> Thanks,
> Anil
>
> On Fri, Feb 24, 2012 at 9:36 PM, Kathleen Ting <[EMAIL PROTECTED]> wrote:
>
>> Hi Anil,
>>
>> re: Is the above scenario not possible with the HBase Java API?
>> I would suggest asking that on [EMAIL PROTECTED].
>>
>> Thanks,
>> Kathleen
>>
>> On Wed, Feb 22, 2012 at 2:26 PM, anil gupta <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Kathleen,
>>>
>>> I think my previous messages were misinterpreted. In my previous message I
>>> was talking about generating a separate put statement for each column
>>> family. I am having a hard time understanding how this would violate
>>> the HBase atomicity rule.
>>>
>>> For instance, in the HBase shell my put statements for two column
>>> families would look like this:
>>> hbase shell>put 'merchant_data', '1', 'info:name', 'starbucks'
>>> hbase shell>put 'merchant_data', '1', 'user_reviews:id', '4545'
>>>
>>> Similarly, this can be achieved with the HBase Java API, which Sqoop
>>> uses. Is the above scenario not possible with the HBase Java API?
>>>
>>> Thanks,
>>> Anil
>>>
>>>
>>>
>>> On Wed, Feb 22, 2012 at 2:02 PM, Kathleen Ting <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi Anil -
>>>>
>>>> Good question, and sorry for any confusion earlier. To be clear, because
>>>> HBase permits atomic operations across a single column family only, Sqoop
>>>> cannot support multiple column families.
>>>>
>>>> Regards, Kathleen
>>>>
>>> On Wed, Feb 22, 2012 at 12:43 PM, anil gupta <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi Kathleen,
>>>>>
>>>>> Yes, that is always an option. Thanks for the suggestion.
>>>>>
>>>>> I am a beginner at HBase. However, I was thinking of cutting down the
>>>>> time it takes to dump the data from the database. If I do it twice
>>>>> (assuming I have 2 column families), it increases the time to load the
>>>>> entire HBase table.
>>>>> AFAIK, Sqoop generates put statements to import data into HBase. If we
>>>>> generate put statements for more than one column family, would that
>>>>> violate the atomicity principle of HBase? I went through the atomicity
>>>>> section of http://hbase.apache.org/acid-semantics.html and I can't
>>>>> find anything that would stop Sqoop from loading more than one column
>>>>> family. HBase bulk load also allows more than one column family, although
>>>>> the HBase bulk loading approach might be different from Sqoop's. Could you
>>>>> give me more insight? Sorry if my question is dumb.
>>>>>
>>>>> Thanks,
>>>>> Anil Gupta
>>>>>
>>>>>
>>>>> On Wed, Feb 22, 2012 at 11:51 AM, Kathleen Ting <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi Anil,
>>>>>>
>>>>>> Sqoop does not support multiple column families because HBase only
>>>>>> permits atomic operations within a single column family.
>>>>>>
>>>>>> One workaround is to run two imports, specifying a different column
>>>>>> family each time.
>>>>>>
>>>>>> Regards,
>>>>>> Kathleen
>>>>>>
>>>>> On Wed, Feb 22, 2012 at 11:31 AM, anil gupta <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I went through the Sqoop User Guide but I could not find anything
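Kathleen's workaround of running two imports, one column family each, could be scripted roughly like this. It is a sketch only: the JDBC URL, the table and column names, and the `per_family_imports` helper are hypothetical. The flags (`--connect`, `--table`, `--columns`, `--hbase-table`, `--column-family`, `--hbase-row-key`) are standard Sqoop 1 import options, and the script only builds the command lines rather than running them.

```python
import shlex
from collections import defaultdict


def per_family_imports(connect, source_table, hbase_table, row_key, family_of):
    """Build one Sqoop import argument list per HBase column family."""
    columns_by_family = defaultdict(list)
    for column, family in family_of.items():
        columns_by_family[family].append(column)

    commands = []
    for family, columns in columns_by_family.items():
        commands.append([
            "sqoop", "import",
            "--connect", connect,
            "--table", source_table,
            # Assumption: the row-key column must be read in every pass so
            # that each import can compute the same HBase row keys.
            "--columns", ",".join([row_key] + columns),
            "--hbase-table", hbase_table,
            "--column-family", family,
            "--hbase-row-key", row_key,
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical source columns mapped to the families used in the thread.
    family_of = {"name": "info", "review_id": "user_reviews"}
    for argv in per_family_imports(
        "jdbc:mysql://db.example.com/shop",  # hypothetical JDBC URL
        "merchants", "merchant_data", "merchant_id", family_of,
    ):
        print(shlex.join(argv))
```

Each pass still reads the source table again, which is the extra load time Anil raises; the sketch only makes the per-family split explicit.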