Hive user mailing list: BINARY column type


Connell, Chuck 2012-12-01, 16:50
Re: BINARY column type
Hi Chuck -

I've used binary columns with newlines in the data. I used the RCFile format
for storage, and it works great so far. I don't know whether this is "the" way
to get data in, but I use hex-encoded data (my transform script outputs hex)
and the final insert into the table applies unhex(sourcedata). That's never
been a problem for me. It seems a bit hackish, but it works well.
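The hex step described above can be sketched in Python. This is a minimal sketch, assuming each input file holds exactly one protobuf object; the function names (`hex_records`, `write_hex_file`, `unhex_record`) and any file paths are hypothetical, not from the thread. Hex output uses only the characters 0-9 and a-f, so an embedded newline byte becomes the two characters `0a` and each record stays on one line. The resulting lines can be loaded as a string column and converted with `unhex(sourcedata)` on the final insert, as described above; `unhex_record` is only a local stand-in for checking the round trip.

```python
from pathlib import Path
from typing import Iterable, Iterator


def hex_records(blobs: Iterable[bytes]) -> Iterator[str]:
    """Yield one hex-encoded string per binary blob.

    Hex output contains only [0-9a-f], so a blob that contains b"\\n"
    cannot be split into two rows by a line-oriented reader.
    """
    for blob in blobs:
        yield blob.hex()


def write_hex_file(paths: Iterable[Path], out_path: Path) -> int:
    """Pack each input file (assumed: one protobuf per file) as one hex line.

    Returns the number of records written.
    """
    count = 0
    with out_path.open("w", encoding="ascii", newline="\n") as out:
        for line in hex_records(p.read_bytes() for p in paths):
            out.write(line + "\n")
            count += 1
    return count


def unhex_record(line: str) -> bytes:
    """Local stand-in for the unhex() step, used to verify the round trip."""
    return bytes.fromhex(line.strip())
```

For example, `write_hex_file(sorted(Path("protobufs").glob("*.pb")), Path("records.hex"))` would produce one line per input file (the directory and extension here are hypothetical). Hex doubles the size of the data on the wire; that is the price of keeping every record newline-free.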

On Sat, Dec 1, 2012 at 10:50 AM, Connell, Chuck <[EMAIL PROTECTED]> wrote:

> I am trying to use BINARY columns and believe I have the perfect
> use case for it, but I am missing something. Has anyone used this for true
> binary data (which may contain newlines)?
>
> Here is the background... I have some files that each contain just one
> logical field, which is a binary object. (The files are Google Protobuf
> format.) I want to put these binary files into a larger file, where each
> protobuf is a logical record. Then I want to define a Hive table that
> stores each protobuf as one row, with the entire protobuf object in one
> BINARY column. Then I will use a custom UDF to select/query the binary
> object.
>
> This is about as simple as can be for putting binary data into Hive.
>
> What file format should I use to package the binary rows? What should
> the Hive table definition be? Which SerDe option should I use
> (LazySimpleBinary?)? I cannot use TEXTFILE, since the binary may contain
> newlines. Many of my attempts have choked on the newlines.
>
> Thank you,
>
> Chuck Connell
> Nuance
> Burlington, MA
Connell, Chuck 2012-12-01, 22:11
John Omernik 2012-12-02, 04:58
Connell, Chuck 2012-12-02, 15:00
John Omernik 2012-12-02, 16:27
Connell, Chuck 2012-12-02, 18:27