Hive user mailing list


Re: SerDe question
There are a couple of problems. First of all, input.regex needs to be
"(\\w+)". Please note the case: \w (lowercase) matches a word character,
while \W (uppercase) matches a non-word character.
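Case aside, the pattern also needs a capturing group. As a rough illustration (not Hive's actual code), the Python sketch below imitates how the contrib RegexSerDe turns a line into a row: the regex has to match the entire line, column i is taken from capturing group i, and a line that does not match becomes a row of NULLs. `regex_serde_row` is a hypothetical helper, and `re.fullmatch` stands in for Java's `Matcher.matches()`; the two agree for simple ASCII patterns like these. The HiveQL string "\\W" reaches the regex engine as \W, written r"\W" in Python.

```python
import re
from typing import List, Optional


def regex_serde_row(line: str, regex: str, num_columns: int) -> List[Optional[str]]:
    """Roughly imitate how RegexSerDe turns one line of text into one row.

    Hypothetical helper for illustration only; Hive's real SerDe is Java code.
    """
    match = re.fullmatch(regex, line)  # the regex must match the whole line
    if match is None:
        # A line that does not match yields NULL in every column.
        return [None] * num_columns
    # Column c comes from capturing group c + 1; a missing group gives NULL.
    return [match.group(c + 1) if c < match.re.groups else None
            for c in range(num_columns)]


print(regex_serde_row("Call me Ishmael.", r"\W", 1))     # [None]
print(regex_serde_row(",", r"\W", 1))                    # [None]  (matches, but no group)
print(regex_serde_row("Call", r"(\w+)", 1))              # ['Call']
print(regex_serde_row("Call me Ishmael.", r"(\w+)", 1))  # [None]
```

With "\\W" every line comes back as NULL, which is the symptom in the question. "(\\w+)" fixes that only for lines that consist of a single word.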
The bigger problem, though, is that with this SerDe (and most others) you
can only expect one row per line of input, so multiple words on the same
line cannot generate multiple rows. The best option is probably to
preprocess the text file into a new file with each word on its own line,
and then load that file into Hive.
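The preprocessing step suggested above could look like the following minimal Python sketch. It assumes that a word is a run of \w characters (the same notion as the "(\\w+)" pattern) and that the book fits in memory; the function names and file paths are hypothetical.

```python
import re

WORD = re.compile(r"\w+")  # same notion of "word" as the (\w+) pattern


def one_word_per_line(text: str) -> str:
    """Return the words of `text`, each followed by a newline."""
    return "".join(word + "\n" for word in WORD.findall(text))


def split_file(in_path: str, out_path: str) -> None:
    """Write the words of the file at in_path to out_path, one per line.

    Reads the whole input into memory, which is fine for a single book.
    """
    with open(in_path, encoding="utf-8") as src:
        text = src.read()
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write(one_word_per_line(text))
```

For example, `split_file("moby-dick.txt", "moby-dick-words.txt")` writes one word per line; after putting that file into HDFS (for instance with `hadoop fs -put`), it can be loaded with LOAD DATA INPATH as before. Since each line then holds a single word, a table with one string column and the default text SerDe would also read it without RegexSerDe. Note that \w+ splits contractions such as "don't" into two words; adjust the pattern if that matters.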

Hope that helps,
Vijay

On Tue, Sep 27, 2011 at 6:45 PM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> Hi, Hive experts,
>
> Could you see what I am doing wrong? As a simple test of breaking a text
> into words and putting these words into a table, I am doing this:
>
> CREATE EXTERNAL TABLE books1
> (
>   words string
> )
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
> WITH SERDEPROPERTIES ("input.regex" = "\\W")
> STORED AS TextFile;
>
> LOAD DATA INPATH '/test-data/ch1/moby-dick.txt'  OVERWRITE INTO TABLE
> books1;
>
> This SerDe works in Java code, but in Hive I am getting all nulls in the
> books1 table.
>
> Thank you,
> Mark
>