Hive, mail # user - BINARY column type


Re: BINARY column type
John Omernik 2012-12-01, 21:22
Hi Chuck -

I've used binary columns with newlines in the data, using RCFile as the
storage format. It works great so far. I can't say whether this is "the" way
to get data in, but I use hexed data (my transform script outputs hex-encoded
values), and the final insert into the table applies unhex(sourcedata). That
has never been a problem for me. It seems a bit hackish, but it works well.
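
To make the hex step concrete, here is a minimal sketch of what such a transform script might look like in Python. The function names and sample records are hypothetical, not taken from my actual script. The idea is that each binary record becomes one line of hex text, so embedded newlines can never split a row, and the final insert then applies unhex() to that column.

```python
import sys


def hex_encode_record(record: bytes) -> str:
    """Return one binary record as a single line of uppercase hex text.

    Hex output contains only 0-9 and A-F, so a newline byte inside the
    record becomes the two characters "0A" instead of a row separator.
    """
    return record.hex().upper()


def hex_decode_record(text: str) -> bytes:
    """Reverse the encoding locally (useful for checking a round trip)."""
    return bytes.fromhex(text)


def write_records(records, out=sys.stdout):
    """Write each record as one hex-encoded line (hypothetical helper)."""
    for record in records:
        out.write(hex_encode_record(record) + "\n")


if __name__ == "__main__":
    # Made-up sample records; the second one contains a raw newline byte.
    sample = [b"\x08\x96\x01", b"line1\nline2"]
    write_records(sample)
```

The cost is that the staged data is twice the size of the raw bytes, which is part of why it feels hackish, but the rows stay safe to pass through line-oriented tools.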

On Sat, Dec 1, 2012 at 10:50 AM, Connell, Chuck <[EMAIL PROTECTED]> wrote:

>  I am trying to use BINARY columns and believe I have the perfect
> use-case for it, but I am missing something. Has anyone used this for true
> binary data (which may contain newlines)?
>
>
>  Here is the background... I have some files that each contain just one
> logical field, which is a binary object. (The files are Google Protobuf
> format.) I want to put these binary files into a larger file, where each
> protobuf is a logical record. Then I want to define a Hive table that
> stores each protobuf as one row, with the entire protobuf object in one
> BINARY column. Then I will use a custom UDF to select/query the binary
> object.
>
>
>  This is about as simple as can be for putting binary data into Hive.
>
>
>  What file format should I use to package the binary rows? What should
> the Hive table definition be? Which SerDe option should I use
> (LazySimpleBinary?)? I cannot use TEXTFILE, since the binary may contain
> newlines. Many of my attempts have choked on the newlines.
>
>
>  Thank you,
>
> Chuck Connell
>
> Nuance
>
> Burlington, MA
>
>