Hive >> mail # dev >> Re: Review Request 9276: Add support for pulling HBase columns with prefixes


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/#review23388
-----------------------------------------------------------
Hi,

Regarding the earlier discussion between you, Mark, and me: we weren't suggesting using a regex to decide whether the incoming column is a wildcard. We're saying that it should be possible for someone to specify a regex in hbase.columns.mapping, which we'd then use to match the incoming column qualifiers. However, since we don't know the type of the incoming column qualifiers (from HBase), this might be tough.
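To make the typing concern concrete: HBase returns column qualifiers as raw byte arrays, so matching them against a user-supplied regex means decoding them to strings first. The following is a minimal sketch under the assumption that qualifiers are UTF-8 text, which does not hold for binary-encoded qualifiers. The class and method names are hypothetical and are not part of Hive's API.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper illustrating regex matching of HBase column qualifiers.
// Assumes qualifiers are UTF-8 encoded text; binary qualifiers (for example,
// serialized longs) would not decode to meaningful strings.
final class QualifierRegexMatcher {
    private final Pattern pattern;

    QualifierRegexMatcher(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Returns true if the entire decoded qualifier matches the regex.
    boolean matches(byte[] qualifier) {
        String decoded = new String(qualifier, StandardCharsets.UTF_8);
        return pattern.matcher(decoded).matches();
    }
}
```

Using Matcher.matches() (rather than find()) requires the whole qualifier to match, so "col.*" accepts "col1" but not "xcol1".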

How about this: for now, we require a simple ".*" to match all characters. This is a valid regex, so when we add regex support later we won't have to deal with backwards-incompatibility issues. Basically, this would mean:

1) Instead of col* matching everything that starts with col, col.* would match everything that starts with col.
2) Eliminate the regex matching against hbase.columns.mapping.
3) Add a property, named something like hbase.columns.mapping.regex.matching and defaulting to true, so users can turn this off if needed.
4) As you do today, you'd use Bytes.startsWith to do the match. Later we'd implement full regex matching.
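The interim behavior in steps 1 to 4 can be sketched roughly as follows. This is a minimal illustration, not the patch's implementation: the class name is hypothetical, the boolean constructor argument stands in for the suggested (not yet existing) hbase.columns.mapping.regex.matching property, and startsWith is a local stand-in for HBase's Bytes.startsWith so the example has no dependencies.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the interim proposal: a mapping qualifier ending in ".*"
// is treated as a prefix, and matching is a plain byte-prefix comparison.
// Because "prefix.*" is already a valid regex, a real regex engine could later
// replace the prefix check without changing the syntax users write.
final class PrefixColumnMatcher {
    private static final String WILDCARD = ".*";

    private final byte[] literal; // qualifier exactly as written in the mapping
    private final byte[] prefix;  // qualifier minus the trailing ".*", or null

    // regexMatching stands in for the proposed (hypothetical) property.
    PrefixColumnMatcher(String mappingQualifier, boolean regexMatching) {
        this.literal = mappingQualifier.getBytes(StandardCharsets.UTF_8);
        if (regexMatching && mappingQualifier.endsWith(WILDCARD)) {
            String p = mappingQualifier.substring(0, mappingQualifier.length() - WILDCARD.length());
            this.prefix = p.getBytes(StandardCharsets.UTF_8);
        } else {
            this.prefix = null;
        }
    }

    boolean matches(byte[] qualifier) {
        if (prefix != null) {
            return startsWith(qualifier, prefix);
        }
        // Matching disabled or no wildcard: the qualifier must match literally.
        return Arrays.equals(qualifier, literal);
    }

    // Local equivalent of HBase's Bytes.startsWith(byte[], byte[]).
    static boolean startsWith(byte[] bytes, byte[] prefix) {
        if (bytes.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (bytes[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

One caveat for the later regex step: in java.util.regex, "." does not match line terminators unless Pattern.DOTALL is set. A true regex match of "col.*" and a byte-prefix check could therefore disagree on qualifiers that contain newline characters.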

Brock

- Brock Noland
On Feb. 9, 2013, 9:56 p.m., Swarnim Kulkarni wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9276/
> -----------------------------------------------------------
>
> (Updated Feb. 9, 2013, 9:56 p.m.)
>
>
> Review request for hive.
>
>
> Bugs: HIVE-3725
>     https://issues.apache.org/jira/browse/HIVE-3725
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Added support for pulling HBase columns by providing just a prefix and a wildcard. A query could now look something like this:
>
> CREATE EXTERNAL TABLE hive_hbase_test
> ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*")
> TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");
>
> This would pull in all columns under column family "fam1" that start with "col". This gives a little more flexibility than the existing format, which pulls in all columns of a family.
>
>
> Diffs
> -----
>
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 7f37ba5
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java d35bb52
>   hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282
>
> Diff: https://reviews.apache.org/r/9276/diff/
>
>
> Testing
> -------
>
> Added unit tests to demonstrate the new functionality. Also made sure that all existing unit tests passed.
>
>
> Thanks,
>
> Swarnim Kulkarni
>
>
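As a rough illustration of the behavior described in the quoted review request (not the code in the diff), the sketch below parses a mapping entry such as "fam1:col*" into a column family and a qualifier prefix, then keeps only the matching cells of that family. The class and method names are hypothetical, and cells are modeled as a sorted map of strings rather than HBase's byte arrays.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a column mapping entry like "fam1:col*".
final class PrefixMapping {
    final String family;
    final String qualifierPrefix;

    private PrefixMapping(String family, String qualifierPrefix) {
        this.family = family;
        this.qualifierPrefix = qualifierPrefix;
    }

    // Parses "family:prefix*"; rejects entries such as ":key" or "fam1:"
    // that have no family or no trailing wildcard.
    static PrefixMapping parse(String entry) {
        int colon = entry.indexOf(':');
        if (colon <= 0 || !entry.endsWith("*")) {
            throw new IllegalArgumentException("not a prefix mapping: " + entry);
        }
        String family = entry.substring(0, colon);
        String prefix = entry.substring(colon + 1, entry.length() - 1);
        return new PrefixMapping(family, prefix);
    }

    // Keeps only the cells of this family whose qualifier starts with the prefix.
    Map<String, String> select(Map<String, String> familyCells) {
        Map<String, String> selected = new TreeMap<>();
        for (Map.Entry<String, String> cell : familyCells.entrySet()) {
            if (cell.getKey().startsWith(qualifierPrefix)) {
                selected.put(cell.getKey(), cell.getValue());
            }
        }
        return selected;
    }
}
```

For the mapping ":key,fam1:col*", splitting on commas and parsing the second entry yields family "fam1" and prefix "col". Given cells col1, col2, and other, select keeps only col1 and col2.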