Hive >> mail # user >> [ANN] Hive-protobuf support


Re: [ANN] Hive-protobuf support
Hi Edward,

This project looks really good.

Internally, we have also been working on similar changes, specifically
enhancing the existing Hive/HBase integration to support protobuf and Thrift
objects stored in HBase. Because of the need to specify explicit column
mappings, and a number of issues [1] with getting the existing
ProtocolBuffersObjectInspector working with the latest protobuf (2.4.1), I
decided to write entirely new ObjectInspectors that cleanly deserialize
protobuf and Thrift objects, using the provided reflection API to perform
deserialization and field extraction.

In short, some of the enhancements are:

1. Support Thrift/protobuf objects stored in HBase using the new
ObjectInspectors.
2. Auto-generate the columns and column types from the provided
deserializer class by translating its fields into nested structs. (HIVE-3211)
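
To make item 2 concrete, here is a minimal sketch of the idea, not the actual
patch. It assumes plain Python dataclasses as a stand-in for the generated
protobuf/Thrift classes; the names Person, Address, hive_type and hive_columns
are hypothetical, as is the scalar type mapping. It walks a class's fields
reflectively and emits Hive column types, turning nested classes into nested
structs.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import List, get_args, get_origin

# Assumed mapping from Python scalar types to Hive primitive type names.
PRIMITIVES = {int: "bigint", float: "double", str: "string",
              bool: "boolean", bytes: "binary"}


def hive_type(tp) -> str:
    """Translate a field type into a Hive type string, recursing into nested classes."""
    if tp in PRIMITIVES:
        return PRIMITIVES[tp]
    if get_origin(tp) is list:
        # Repeated field: map to a Hive array of the element type.
        (elem,) = get_args(tp)
        return f"array<{hive_type(elem)}>"
    if is_dataclass(tp):
        # Nested message: map to a Hive struct of its own fields.
        members = ",".join(f"{f.name}:{hive_type(f.type)}" for f in fields(tp))
        return f"struct<{members}>"
    raise TypeError(f"unsupported field type: {tp!r}")


def hive_columns(cls):
    """Return (column name, Hive type) pairs for the top-level fields of cls."""
    return [(f.name, hive_type(f.type)) for f in fields(cls)]


# Hypothetical stand-ins for generated message classes.
@dataclass
class Address:
    city: str
    zip_code: str


@dataclass
class Person:
    id: int
    name: str
    emails: List[str]
    address: Address


for name, type_string in hive_columns(Person):
    print(name, type_string)
```

With real protobuf or Thrift classes, the same walk would go through each
library's own descriptor/metadata API instead of dataclasses.fields.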

Some of this is still in the development/testing phase. Once that is
done, I can put a patch for this enhancement up for review.

[1]
http://mail-archives.apache.org/mod_mbox/hive-user/201205.mbox/%3CCAENxBwxaSOq1=0u+keaj6NG_s8Zh6=rZvLZ4P2YwGe-UQ+[EMAIL PROTECTED]%3E

Thanks,
On Sat, Jul 14, 2012 at 10:18 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:

> Hello all,
>
> My employer, m6d.com, has given the thumbs up to open source our
> latest hive tool, hive-protobuf. We created this because we work with
> protobuf formats often and wanted to be able to directly log and query
> these types without writing one-off User Defined Functions or Input
> Formats.
>
> https://github.com/edwardcapriolo/hive-protobuf
>
> Hive-protobuf is much like the new avro support and the already
> existing thrift support. Here is how it works:
>
> If you have a sequence file with a serialized protobuf in the key and
> a serialized protobuf in the value, a table can be created that
> describes the data to hive. The table need only be configured with
> the protobuf-generated class names for the key and value, and it turns
> the nested classes into nested structs.
>
> We will eventually migrate the project into core hive, but we want to
> let it incubate on github for a time. (For example, there is no support
> for union types at the moment, and there may be other kinks or tuning
> needed.) Please check out the project and send pull requests if you have
> patches.
>
> Thank you,
> Edward
>

--
Swarnim