-
Importing more than one column family in Hbase through Sqoop
anil gupta 2012-02-22, 19:31
Hi All,
I went through the Sqoop User Guide but could not find anything about importing more than one column family into HBase. Am I missing something? Is it planned for a future release?
-- Thanks & Regards, Anil Gupta
-
Re: Importing more than one column family in Hbase through Sqoop
Kathleen Ting 2012-02-22, 19:51
Hi Anil,
Sqoop does not support multiple column families because HBase only permits atomic operations.
One workaround is to run two imports, specifying a different column family each time.
Regards, Kathleen
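The two-import workaround above can be sketched as follows. This is only an illustration, assuming the standard Sqoop import options `--columns`, `--hbase-table`, `--column-family` and `--hbase-row-key`. The JDBC URL, the source table, the source column names and the class itself are hypothetical; the HBase table and family names are borrowed from the example given later in this thread. The program only builds and prints the two command lines, it does not run Sqoop.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: builds (does not execute) one Sqoop import command per column family.
class SqoopTwoPassImport {

    // Hypothetical values, for illustration only.
    static final String JDBC_URL = "jdbc:mysql://dbhost/shop";
    static final String SOURCE_TABLE = "merchants";
    static final String HBASE_TABLE = "merchant_data";
    static final String ROW_KEY_COLUMN = "id";

    // One import pass: the listed source columns all land in a single column family.
    static List<String> buildImport(String columnFamily, String sourceColumns) {
        List<String> cmd = new ArrayList<>();
        cmd.add("sqoop");
        cmd.add("import");
        cmd.add("--connect");
        cmd.add(JDBC_URL);
        cmd.add("--table");
        cmd.add(SOURCE_TABLE);
        cmd.add("--columns");
        cmd.add(sourceColumns); // includes the row key column in every pass
        cmd.add("--hbase-table");
        cmd.add(HBASE_TABLE);
        cmd.add("--column-family");
        cmd.add(columnFamily);
        cmd.add("--hbase-row-key");
        cmd.add(ROW_KEY_COLUMN);
        return cmd;
    }

    public static void main(String[] args) {
        // Pass 1 fills the 'info' family, pass 2 fills 'user_reviews'.
        System.out.println(String.join(" ", buildImport("info", "id,name")));
        System.out.println(String.join(" ", buildImport("user_reviews", "id,review_id")));
    }
}
```

Each list could be handed to `ProcessBuilder` to actually launch the import. Note that the source table is read once per pass, so the load time grows with the number of column families.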
-
Re: Importing more than one column family in Hbase through Sqoop
anil gupta 2012-02-22, 20:43
Hi Kathleen,
Yes, that is always an option. Thanks for the suggestion.
I am a beginner at HBase. However, I was hoping to cut down the time it takes to dump the data from the database. If I run the import twice (assuming I have two column families), it increases the time needed to load the entire HBase table.
AFAIK, Sqoop generates put statements to import data into HBase. If we generated put statements for more than one column family, would that violate the atomicity principle of HBase? I went through the atomicity section of http://hbase.apache.org/acid-semantics.html and I can't find anything that would stop Sqoop from loading more than one column family. HBase bulk load also allows more than one column family, although its approach might be different from Sqoop's. Could you give me more insight? Sorry if my question is dumb.
Thanks, Anil Gupta
-
Re: Importing more than one column family in Hbase through Sqoop
Kathleen Ting 2012-02-22, 22:02
Hi Anil -
Good question and sorry for any confusion earlier. To be sure, because HBase permits atomic operations across a single column family only, Sqoop cannot support multiple column families.
Regards, Kathleen
-
Re: Importing more than one column family in Hbase through Sqoop
anil gupta 2012-02-22, 22:26
Hi Kathleen,
I think my previous messages were misinterpreted. In my previous message I was talking about generating a separate put statement for each column family. I am having a hard time understanding how this would violate the HBase atomicity rule.
For instance, in the HBase shell my put statements for two column families would look like this:
hbase shell> put 'merchant_data', '1', 'info:name', 'starbucks'
hbase shell> put 'merchant_data', '1', 'user_reviews:id', '4545'
Similarly, this can be achieved with the HBase Java API, which Sqoop is using. Is the above scenario not possible in the HBase Java API?
Thanks, Anil
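To make the idea of one put statement per column family concrete, here is a minimal sketch in plain Java. It does not use the HBase client API and is not how Sqoop is implemented. The class and method names are hypothetical; the table, row key, columns and values are the ones from the shell example in this message. It renders one HBase shell `put` line per cell of a row, where each cell key is written as `family:qualifier`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: turns one source record into HBase shell put statements.
class PutStatementSketch {

    // Renders one `put` per cell; map keys are "family:qualifier".
    // No escaping of quotes is attempted, so values must not contain a single quote.
    static List<String> toShellPuts(String table, String rowKey, Map<String, String> cells) {
        List<String> puts = new ArrayList<>();
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            puts.add(String.format("put '%s', '%s', '%s', '%s'",
                    table, rowKey, cell.getKey(), cell.getValue()));
        }
        return puts;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the cells in insertion order.
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("info:name", "starbucks");
        cells.put("user_reviews:id", "4545");
        toShellPuts("merchant_data", "1", cells).forEach(System.out::println);
    }
}
```

Both generated statements target the same row key and differ only in the column family prefix. A single pass over the source record is enough to produce them.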
-
Re: Importing more than one column family in Hbase through Sqoop
Kathleen Ting 2012-02-25, 05:36
Hi Anil,
re: Is the above scenario not possible in Hbase Java api?
I would suggest asking that on [EMAIL PROTECTED].
Thanks, Kathleen
-
Re: Importing more than one column family in Hbase through Sqoop
anil gupta 2012-03-16, 22:09
Hi Kathleen,
Sorry for the delayed reply; I have been working on HBase rather than Sqoop.
Here is example code from the book "HBase: The Definitive Guide" which shows that it is possible to load data into more than one column family through the Java API, which was exactly the point I was trying to make.
Have a look at these two classes:
https://github.com/larsgeorge/hbase-book/blob/master/ch04/src/main/java/util/HBaseHelper.java
https://github.com/larsgeorge/hbase-book/blob/master/ch04/src/main/java/filters/PrefixFilterExample.java
Please let me know if you have further questions.
Thanks, Anil
-
Re: Importing more than one column family in Hbase through Sqoop
Kathleen Ting 2012-03-19, 19:58
Anil -
Understood. As it happens, the HBase release that supported atomicity came after the Sqoop release that included HBase integration, hence the limitation.
Please go ahead and file a Sqoop JIRA requesting a CLI option that lets the user specify multiple column families.
Regards, Kathleen
-
Re: Importing more than one column family in Hbase through Sqoop
anil gupta 2012-03-29, 22:50
Hi Kathleen,
Here is the JIRA I filed for this: https://issues.apache.org/jira/browse/SQOOP-472
Thanks, Anil Gupta