Hive >> mail # dev >> Review Request: Add support for pulling HBase columns with prefixes


Swarnim Kulkarni 2013-02-03, 01:02
Swarnim Kulkarni 2013-02-03, 01:04
Mark Grover 2013-02-05, 03:43
Swarnim Kulkarni 2013-02-09, 15:21
Swarnim Kulkarni 2013-02-10, 00:44
Brock Noland 2013-02-09, 16:05
Re: Review Request: Add support for pulling HBase columns with prefixes


> On Feb. 5, 2013, 3:43 a.m., Mark Grover wrote:
> > hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 192
> > <https://reviews.apache.org/r/9276/diff/1/?file=254957#file254957line192>
> >
> >     This seems like a limited case of pattern matching. Swarnim, any way we can support generic regex matching instead?
>
> Swarnim Kulkarni wrote:
>     Mark, in this case I specifically wanted to allow only strings that end with exactly the character "*", and using String#endsWith seemed simpler and more readable than a regex. Do you still want me to replace this with regex matching?
>
> Brock Noland wrote:
>     I think the issue is that this would make it difficult to implement enhanced pattern matching later. Implementing it now, you'd only need to specify:
>    
>     col.*
>    
>     in the table configuration. The issue then would be detecting whether a particular column is a regex pattern. Because #, comma, and : are used as separators, those characters could not be used in a pattern.

Thanks Brock. Makes sense. To be sure I am understanding you right, the change now would be just to replace the parts[1].endsWith("*") check with something more regex-like that would still imply that the string ends with "*". Correct?
- Swarnim
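
For comparison, the two ways of detecting a trailing "*" discussed above can be sketched side by side. This is only an illustration under stated assumptions: the class and method names are hypothetical and do not come from the patch, and the regex shown is one possible choice, not necessarily what the review settled on.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of HBaseSerDe.
final class QualifierPrefixSketch {

    // Matches a qualifier spec whose last character is a literal '*'.
    // Group 1 captures everything before that final '*'.
    private static final Pattern TRAILING_STAR = Pattern.compile("(.*)\\*", Pattern.DOTALL);

    // String-based check in the spirit of parts[1].endsWith("*"):
    // returns the prefix, or null if the spec is not a prefix pattern.
    static String prefixViaEndsWith(String spec) {
        return spec.endsWith("*") ? spec.substring(0, spec.length() - 1) : null;
    }

    // Regex-based equivalent: matches() requires the whole spec to match,
    // so the spec must end with '*'.
    static String prefixViaRegex(String spec) {
        Matcher m = TRAILING_STAR.matcher(spec);
        return m.matches() ? m.group(1) : null;
    }
}
```

Both methods accept the same inputs; the regex form mainly leaves room for widening the pattern later, which appears to be the concern raised in the review.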
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/#review16080
-----------------------------------------------------------
On Feb. 3, 2013, 1:04 a.m., Swarnim Kulkarni wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9276/
> -----------------------------------------------------------
>
> (Updated Feb. 3, 2013, 1:04 a.m.)
>
>
> Review request for hive.
>
>
> Description
> -------
>
> Added support for pulling HBase columns just by providing a prefix and a wildcard. So a query could now look something like this:
>
> CREATE EXTERNAL TABLE hive_hbase_test
> ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*")
> TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");
>
> This would pull in all columns under column family "fam1" whose names start with "col". This gives a little more flexibility than the format that pulls in all columns of a family.
>
>
> This addresses bug HIVE-3725.
>     https://issues.apache.org/jira/browse/HIVE-3725
>
>
> Diffs
> -----
>
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 7f37ba5
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java d35bb52
>   hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282
>
> Diff: https://reviews.apache.org/r/9276/diff/
>
>
> Testing
> -------
>
> Added unit tests to demonstrate the new functionality. Also made sure that all existing unit tests passed.
>
>
> Thanks,
>
> Swarnim Kulkarni
>
>
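
The prefix behaviour described in the review request can be sketched roughly as follows. This is a minimal illustration using plain Java collections, not the code in HBaseSerDe or LazyHBaseCellMap; the class name, method name, and the qualifier-to-value map are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; names do not come from the patch.
final class PrefixMappingSketch {

    // Given one mapping entry such as "fam1:col*" and the cells of that family
    // (qualifier -> value), returns the cells whose qualifier starts with the prefix.
    static Map<String, String> selectByPrefix(String mappingEntry, Map<String, String> familyCells) {
        // Split into column family and qualifier spec, e.g. ["fam1", "col*"].
        String[] parts = mappingEntry.split(":", 2);
        if (parts.length != 2 || !parts[1].endsWith("*")) {
            throw new IllegalArgumentException("expected family:prefix*, got " + mappingEntry);
        }
        // Drop the trailing '*' to obtain the qualifier prefix.
        String prefix = parts[1].substring(0, parts[1].length() - 1);

        Map<String, String> selected = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : familyCells.entrySet()) {
            if (cell.getKey().startsWith(prefix)) {
                selected.put(cell.getKey(), cell.getValue());
            }
        }
        return selected;
    }
}
```

With the entry "fam1:col*", a family holding qualifiers "col1", "other", and "col2" would yield only "col1" and "col2".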

Mark Grover 2013-02-09, 16:38
Swarnim Kulkarni 2013-02-09, 21:56
Eric Hanson 2013-04-30, 21:35