Mark Kerzner 2011-09-28, 01:45
Hi, Hive experts,
Could you see what I am doing wrong? As a simple test of breaking a text into words and putting these words into a table, I am doing this:
CREATE EXTERNAL TABLE books1
(
  words string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "\\W")
STORED AS TextFile;

LOAD DATA INPATH '/test-data/ch1/moby-dick.txt' OVERWRITE INTO TABLE books1;
This SerDe works in Java code, but in Hive I am getting all nulls in the books1 table.
Thank you, Mark
There are a couple of problems. First, input.regex needs to be "(\\w+)". Note the lowercase w, and the parentheses: RegexSerDe fills each column from a capturing group, and your regex has none. The bigger problem, though, is that with this SerDe (and most others) you can only expect one row per line of input, so multiple words on a line cannot generate multiple rows. The best option is probably to parse the text file, generate a new file with each word on a separate line, and then load that file into Hive.
Hope that helps, Vijay
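The preprocessing step suggested above could look roughly like the following. This is only a sketch under assumptions, not something from the thread: the function name write_words and the file names are made up for illustration, and re.ASCII is used so that \w behaves like Java's default \w (letters, digits and underscore only).

```python
import os
import re
import tempfile

# \w+ with re.ASCII matches runs of [A-Za-z0-9_], like Java's default \w.
WORD = re.compile(r"\w+", re.ASCII)


def write_words(src, dst):
    """Copy every word found in src to dst, one word per line.

    Returns the number of words written. (Hypothetical helper, not part
    of Hive or of the original thread.)
    """
    count = 0
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            # findall returns every non-overlapping match on the line,
            # so one input line can produce many output lines.
            for word in WORD.findall(line):
                fout.write(word + "\n")
                count += 1
    return count


if __name__ == "__main__":
    # Small self-contained demo using temporary files.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "sample.txt")
    dst = os.path.join(workdir, "sample-words.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("Call me Ishmael. Some years ago\n")
    print(write_words(src, dst), "words written to", dst)
```

Since the output has exactly one word per line, it should load into an ordinary single-column text table with LOAD DATA, with no RegexSerDe needed.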
Mark Kerzner 2011-09-28, 04:26
Thank you, Vijay.
I was beginning to understand things that way myself, and you made it perfectly clear.
Sincerely, Mark
On Tue, Sep 27, 2011 at 11:18 PM, Vijay <[EMAIL PROTECTED]> wrote:
> There are a couple of problems. First of all, input.regex needs to be
> "(\\w+)". Please note the case.
> The bigger problem though, is that, with this (and most) serdes, you
> can only expect one row per line of input. So multiple words within
> the text cannot generate multiple rows. The best option is to probably
> parse the text file and generate a different file with each word on a
> separate line and then load it into hive.
>
> Hope that helps,
> Vijay