I tried recreating your exact dump with a more recent Hive (built
about 3 weeks ago). In addition to the base64-decoded version of the
binary data, I get some extraneous characters in every line of the
select * output (consistently the same extra characters).
For example, an od -c of the first line of this table gives:
N U L L \t a b c \t 357 277 275 M \n
I would have expected the base64-decode of "001" to be just "M".
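The three extra bytes in the dump (octal 357 277 275) are the UTF-8 encoding of U+FFFD, the Unicode replacement character. As a rough sanity check, the Python sketch below shows which bytes come out of "001" and what they look like after a UTF-8 text conversion. It is not Hive code: it assumes the field is padded to "001=" and decoded leniently, and the helper name is made up for illustration.

```python
import base64


def decode_then_render_as_text(field: str) -> bytes:
    """Hypothetical helper: base64-decode a field, then push the raw bytes
    through a UTF-8 text conversion, replacing anything that is not valid UTF-8."""
    # Assumption: the field is padded with "=" to a multiple of 4 characters.
    padded = field + "=" * (-len(field) % 4)
    raw = base64.b64decode(padded)
    # Invalid byte sequences become U+FFFD, which is EF BF BD in UTF-8.
    return raw.decode("utf-8", errors="replace").encode("utf-8")


raw = base64.b64decode("001=")
shown = decode_then_render_as_text("001")

print(raw)                      # raw decoded bytes
print([oct(b) for b in shown])  # bytes after the text conversion, in octal
```

Under these assumptions the decode yields two bytes, 0xD3 followed by "M". The first byte is not valid UTF-8 on its own, so a text conversion would turn it into the replacement character. That would be consistent with the od -c output above, though it does not show where in Hive such a conversion happens.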
Saving this to another equivalent table with a CTAS (create table as
select) yields an encoding similar to the original file for the last
two lines, and an extra "=" at the end of each of the earlier lines.
That encoding, in turn, seems stable if I CTAS from that table to
another. All 3 tables yield the same output when I do select *. I get
the same output from select * even when I CTAS to an RCFile.
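One plausible reading of the extra "=" is ordinary base64 padding being added on re-encode. The Python sketch below is not Hive code. It assumes the binary column is base64-decoded on read and re-encoded with standard padding on write, and round_trip is a hypothetical helper name.

```python
import base64


def round_trip(field: str) -> str:
    """Hypothetical helper: decode a base64 field and encode it again."""
    # Assumption: short fields are padded with "=" before decoding.
    padded = field + "=" * (-len(field) % 4)
    return base64.b64encode(base64.b64decode(padded)).decode("ascii")


# Sample values taken from the bitfld column of the quoted data.
for field in ["001", "010", "01100010", "01100001"]:
    once = round_trip(field)
    twice = round_trip(once)
    print(field, "->", once, "->", twice)
```

Under these assumptions the 3-character fields come back as 4 characters ending in "=", the 8-character fields on the last two lines come back unchanged, and a second round trip changes nothing further. That matches the stable re-encoding described above.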
The problem might be with the LazySimpleSerDe binary decode, but if
so, the encode is affected as well. Or the problem might be with
how binary data is output by select *. Either way, this merits
creating a JIRA to address it.
On Wed, Sep 4, 2013 at 2:35 AM, Arun Vasu <[EMAIL PROTECTED]> wrote:
> I am using Hive 10. When I create an external table with column type as
> Binary, the query result on the table is showing some junk values for the
> column with binary datatype.
> Please find below the query I have used to create the table:
> CREATE EXTERNAL TABLE BOOL1(NB BOOLEAN,email STRING, bitfld BINARY)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '^'
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> LOCATION '/user/hivetables/testbinary';
> The query I have used is : select * from bool1
> The sample data in the hdfs file is:
> 0^[EMAIL PROTECTED]^001
> 1^[EMAIL PROTECTED]^010
> ^[EMAIL PROTECTED]^011
> ^[EMAIL PROTECTED]^100
> t^[EMAIL PROTECTED]^101
> f^[EMAIL PROTECTED]^110
> true^[EMAIL PROTECTED]^111
> false^[EMAIL PROTECTED]^001
> 123^ ^01100010
> 12344^ ^01100001
> Please share your inputs if it is possible.