Re: how to handle variable format data of text file?
Ramki Palle 2013-03-18, 15:26
One approach is to define ldata as a map field, since it contains
variably formatted data, and write a UDF to extract whatever information you
need.
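Short of writing a custom UDF, the built-in split() function may already be enough to pick ldata apart per gid. A minimal sketch, assuming ldata holds the whole tail of each line (for example via the regex row format suggested in the quoted reply below). The column aliases and the token positions are only illustrative, read off the first sample row, where the tail would be "92041 2300 0 0 7 21|47|20|33|11":

```sql
-- Hypothetical per-gid parsing for gid 1009 rows only.
-- split() takes a regex, so the pipe has to be escaped.
select userid,
       split(ldata, ' ')[1]               as second_token,
       split(split(ldata, ' ')[5], '\\|') as pipe_values
from handresult
where gid = 1009;
```

Each gid would get its own query (or view) along these lines, with positions chosen to match that gid's layout.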
On Mon, Mar 18, 2013 at 1:23 AM, Zhiwen Sun <[EMAIL PROTECTED]> wrote:
> As you defined in the create table HQL, fields are delimited by a blank space, so
> everything after the last declared column is dropped.
> If you want to keep the rest of the data at the end of the line, I suggest you use the
> org.apache.hadoop.hive.contrib.serde2.RegexSerDe row format instead of the
> default delimited format.
> Zhiwen Sun
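A rough sketch of what that table definition could look like, not tested against the real files. It assumes the hive-contrib jar has already been added to the session. Every column is declared string because, as far as I know, the contrib RegexSerDe only handles string columns, so gid would have to be cast at query time. The regex is my own assumption about the layout: five space-separated tokens, then everything else captured into ldata.

```sql
-- One capture group per column, in column order.
-- The last group (.*) keeps the rest of the line, spaces included.
create external table handresult (
  hdate  string,
  htime  string,
  thid   string,
  gid    string,
  userid string,
  ldata  string
)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties (
  "input.regex" = "([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) (.*)"
)
stored as textfile;
```

Lines that do not match the pattern (for instance, a line with nothing after userid) would come back as all-NULL rows, so the regex may need loosening for such cases.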
> On Mon, Mar 11, 2013 at 12:04 PM, 周梦想 <[EMAIL PROTECTED]> wrote:
>> I have files like this:
>> 03/11/13 10:59:52 00000ec0 1009 180538126 92041 2300 0 0 7 21|47|20|33|11
>> 03/11/13 10:59:52 00000744 1010 178343610 92042 350 1 0 -1 NULL NULL 22 45
>> The fields are separated by a blank space:
>> date time threadid gid userid [variably formatted data, grouped into fields
>> separated by spaces]
>> I'd like to create a table like:
>> hive> create external table handresult (hdate string,htime string, thid
>> string, gid int, userid string,ldata string) row format delimited fields
>> terminated by " ";
>> but the above table only holds part of the data:
>> select * from handresult;
>> 03/11/13 10:59:52 00000ec0 1009 180538126 92041
>> 03/11/13 10:59:52 00000744 1010 178343610 92042
>> I can't get the remaining data, such as "2300 0 0 7 21|47|20|33|11 0:2775 ".
>> ldata may vary in length and format (values separated by " ", or an
>> array), and we will parse it differently depending on gid.
>> How can I do this?
>> Andy Zhou