Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive, mail # user - how to handle variable format data of text file?


+
周梦想 2013-03-11, 04:04
+
Zhiwen Sun 2013-03-18, 08:23
Copy link to this message
-
Re: how to handle variable format data of text file?
Ramki Palle 2013-03-18, 15:26
One way you can try is to make your ldata as a map field as it contains
variable formatted data and write a UDF to get whatever information you
need get.

Regards,
Ramki.

On Mon, Mar 18, 2013 at 1:23 AM, Zhiwen Sun <[EMAIL PROTECTED]> wrote:

> As u defined in create table hql: fields delimited by blank space. So, the
> other data is omitted
>
> if you wanna contain rest data at the end of line. I suggest you use
> org.apache.hadoop.hive.contrib.serde2.RegexSerDe row format instead of
> default delimited format.
>
>
> Zhiwen Sun
>
>
>
> On Mon, Mar 11, 2013 at 12:04 PM, 周梦想 <[EMAIL PROTECTED]> wrote:
>
>> I have files like this:
>> 03/11/13 10:59:52 00000ec0 1009 180538126 92041 2300 0 0 7 21|47|20|33|11
>> 0:2775
>> 03/11/13 10:59:52 00000744 1010 178343610 92042 350 1 0 -1 NULL NULL 22 45
>> the format is separated by blank space:
>> date time threadid gid userid [variable formated data grouped by fields
>> separated by space ]
>>
>> I'd like to create a table like:
>>
>> hive> create external table handresult (hdate string,htime string, thid
>> string, gid int, userid string,ldata string) row format delimited fields
>> terminated by  " ";
>> OK
>>
>> but the above table will only have a part of the data.
>> select * from handresult;
>> 03/11/13 10:59:52 00000ec0 1009 180538126 92041
>> 03/11/13 10:59:52 00000744 1010 178343610 92042
>>
>> the remain data  like "2300 0 0 7 21|47|20|33|11 0:2775 "  I can't get.
>>
>> while ldata may be variance length and format separated by " " or an
>> array, the ldata we will parse diferent  by each gid.
>>
>> how do this?
>>
>> Thanks,
>> Andy Zhou
>>
>
>