|
David Koch
2012-12-06, 18:56
kulkarni.swarnim@...)
2012-12-06, 20:10
David Koch
2012-12-06, 20:23
David Koch
2012-12-10, 00:03
kulkarni.swarnim@...)
2012-12-10, 00:52
|
-
Mapping existing HBase table with many columns to Hive.David Koch 2012-12-06, 18:56
Hello,
How can I map an HBase table with the following layout to Hive using the "CREATE EXTERNAL TABLE" command from shell (or another programmatic way): The HBase table's layout is as follows: Rowkey=16 bytes, a UUID that had the "-" removed, and the 32hex chars converted into two 8byte longs. Columns (qualifiers): timestamps, i.e the bytes of a long which were converted using Hadoop's Bytes.toBytes(long). There can be many of those in a single row. Values: The bytes of a Java string. I am unsure of which datatypes to use. I am pretty sure there is no way I can sensible map the row key to anything other than "binary" but maybe the columns - which are longs and the values which are strings can be mapped to their according Hive datatypes. I include an extract of what a row looks like in HBase shell below: Thank you, /David hbase(main):009:0> scan "hits" ROW COLUMN+CELL \x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE&\xDC column=t:\x00\x00\x01;2\xE6Q\x06, timestamp=1267737987733, value=blahaha \x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE&\xDC column=t:\x00\x00\x01;2\xE6\xFB@, timestamp=1354012104967, value=testtest
-
Re: Mapping existing HBase table with many columns to Hive.kulkarni.swarnim@...) 2012-12-06, 20:10
Hi David,
First of all, you columns are not "long". They are binary as well. Currently as hive stands, there is no support for binary qualifiers. However, I recently submitted a patch for that[1]. Feel free to give it a shot and let me know if you see any issues. With that patch, you can directly give your qualifiers to hive as they look here ( \x00\x00\x01;2\xE6Q\x06). Until then, the only option you have is to use a map to map all your columns under the column family "t". An example to do that would be: CREATE EXTERNAL TABLE hbase_table_1(key int, value map<string,string>) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,t:") TBLPROPERTIES("hbase.table.name" = "some_existing_table"); Also as far as your key goes, it is a composite key. There is also an existing patch for the support of that here[2]. Hope that helps. [1] https://issues.apache.org/jira/browse/HIVE-3553 [2] https://issues.apache.org/jira/browse/HIVE-2599 On Thu, Dec 6, 2012 at 12:56 PM, David Koch <[EMAIL PROTECTED]> wrote: > Hello, > > How can I map an HBase table with the following layout to Hive using the > "CREATE EXTERNAL TABLE" command from shell (or another programmatic way): > > The HBase table's layout is as follows: > Rowkey=16 bytes, a UUID that had the "-" removed, and the 32hex chars > converted into two 8byte longs. > Columns (qualifiers): timestamps, i.e the bytes of a long which were > converted using Hadoop's Bytes.toBytes(long). There can be many of those in > a single row. > Values: The bytes of a Java string. > > I am unsure of which datatypes to use. I am pretty sure there is no way I > can sensible map the row key to anything other than "binary" but maybe the > columns - which are longs and the values which are strings can be mapped to > their according Hive datatypes. > > I include an extract of what a row looks like in HBase shell below: > > Thank you, > > /David > > hbase(main):009:0> scan "hits" > ROW > COLUMN+CELL > > \x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE&\xDC > column=t:\x00\x00\x01;2\xE6Q\x06, timestamp=1267737987733, value=blahaha > \x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE&\xDC > column=t:\x00\x00\x01;2\xE6\xFB@, timestamp=1354012104967, > value=testtest > -- Swarnim
-
Re: Mapping existing HBase table with many columns to Hive.David Koch 2012-12-06, 20:23
Hello Swarnim,
Thank you for your answer. I will try the options you pointed out. /David On Thu, Dec 6, 2012 at 9:10 PM, [EMAIL PROTECTED] < [EMAIL PROTECTED]> wrote: > map
-
Re: Mapping existing HBase table with many columns to Hive.David Koch 2012-12-10, 00:03
Hello,
I tried the shell command which Swarnim kindly provided and it allows me to map an existing HBase table into Hive. However, since my qualifiers are long but map only accepts string as a key, the result is garbled. Even with the suggested patch which allows binary keys, the resulting datatype in Hive would not be long but binary, making it hard to query from shell. It seems there is no API for now, right? Currently, is there any way to map HBase byte[] to Hive datatypes? The assumption is, that all byte[] were generated using Hadoop's Byte.toBytes(<type>) method and that either all row keys, qualifiers and values share the same data type respectively (for example: row keys are ints, qualifiers are longs and values are strings). Thank you, /David On Thu, Dec 6, 2012 at 9:23 PM, David Koch <[EMAIL PROTECTED]> wrote: > Hello Swarnim, > > Thank you for your answer. I will try the options you pointed out. > > /David > > > On Thu, Dec 6, 2012 at 9:10 PM, [EMAIL PROTECTED] < > [EMAIL PROTECTED]> wrote: > >> map > > >
-
Re: Mapping existing HBase table with many columns to Hive.kulkarni.swarnim@...) 2012-12-10, 00:52
The data type corresponds to the type of data that the qualifiers holds and
not the data type of the qualifier. So, with the above query, for the column value map<string,string>, the type is map<string,string> which means that your qualifiers mapped to keys are of type "string" and underlying values of type "string". The patch will just allow hive to read data which is stored under a binary qualifier like you have and once the mapping is done, treat the binary qualifier as a "string" as well since that is the only type hive allows for map keys. Hope that helps. On Sun, Dec 9, 2012 at 6:03 PM, David Koch <[EMAIL PROTECTED]> wrote: > Hello, > > I tried the shell command which Swarnim kindly provided and it allows me > to map an existing HBase table into Hive. However, since my qualifiers are > long but map only accepts string as a key, the result is garbled. Even with > the suggested patch which allows binary keys, the resulting datatype in > Hive would not be long but binary, making it hard to query from shell. It > seems there is no API for now, right? > > Currently, is there any way to map HBase byte[] to Hive datatypes? > > The assumption is, that all byte[] were generated using Hadoop's > Byte.toBytes(<type>) method and that either all row keys, qualifiers and > values share the same data type respectively (for example: row keys are > ints, qualifiers are longs and values are strings). > > Thank you, > > /David > > > On Thu, Dec 6, 2012 at 9:23 PM, David Koch <[EMAIL PROTECTED]> wrote: > >> Hello Swarnim, >> >> Thank you for your answer. I will try the options you pointed out. >> >> /David >> >> >> On Thu, Dec 6, 2012 at 9:10 PM, [EMAIL PROTECTED] < >> [EMAIL PROTECTED]> wrote: >> >>> map >> >> >> > -- Swarnim |