Hive >> mail # user >> how to handle variable format data of text file?


Re: how to handle variable format data of text file?
One approach is to declare ldata as a map field, since it contains
variably formatted data, and write a UDF to extract whatever information
you need.
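
A minimal sketch of the parsing logic such a UDF might wrap, written as plain Java so it runs without Hive on the classpath. The class name LdataParser, the positional keys "f0", "f1", ... and the field names "first" and "second" are hypothetical; the thread does not give the real per-gid layouts, so the caller passes in whatever names apply to a given gid. The Hive UDF wrapper itself is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LdataParser {
    /**
     * Splits ldata on whitespace and returns the tokens as an ordered map.
     * names holds the field names for one gid (hypothetical; supplied by the
     * caller). Tokens without a name get a positional key such as "f4".
     */
    static Map<String, String> parse(String ldata, String[] names) {
        Map<String, String> out = new LinkedHashMap<>();
        String trimmed = ldata.trim();
        if (trimmed.isEmpty()) {
            return out; // nothing after the fixed fields
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            String key = (names != null && i < names.length) ? names[i] : "f" + i;
            out.put(key, tokens[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample remainder taken from the original question; names are made up.
        String[] namesForGid1009 = {"first", "second"};
        Map<String, String> m = parse("2300 0 0 7 21|47|20|33|11 0:2775", namesForGid1009);
        System.out.println(m);
    }
}
```

Values such as 21|47|20|33|11 stay as single strings here; splitting them on "|" would be a further per-gid step.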

Regards,
Ramki.

On Mon, Mar 18, 2013 at 1:23 AM, Zhiwen Sun <[EMAIL PROTECTED]> wrote:

> As you defined in the create table HQL, fields are delimited by a blank
> space, so the remaining data is omitted.
>
> If you want to keep the rest of the data at the end of the line, I suggest
> you use the org.apache.hadoop.hive.contrib.serde2.RegexSerDe row format
> instead of the default delimited format.
>
>
> Zhiwen Sun
>
>
>
> On Mon, Mar 11, 2013 at 12:04 PM, 周梦想 <[EMAIL PROTECTED]> wrote:
>
>> I have files like this:
>> 03/11/13 10:59:52 00000ec0 1009 180538126 92041 2300 0 0 7 21|47|20|33|11 0:2775
>> 03/11/13 10:59:52 00000744 1010 178343610 92042 350 1 0 -1 NULL NULL 22 45
>> The fields are separated by blank spaces:
>> date time threadid gid userid [variably formatted data, grouped into fields
>> separated by spaces]
>>
>> I'd like to create a table like:
>>
>> hive> create external table handresult (hdate string,htime string, thid
>> string, gid int, userid string,ldata string) row format delimited fields
>> terminated by  " ";
>> OK
>>
>> But the above table only holds part of the data:
>> select * from handresult;
>> 03/11/13 10:59:52 00000ec0 1009 180538126 92041
>> 03/11/13 10:59:52 00000744 1010 178343610 92042
>>
>> The remaining data, like "2300 0 0 7 21|47|20|33|11 0:2775", I can't get.
>>
>> ldata may vary in length and format. It may be separated by " " or hold
>> an array, and we will parse ldata differently for each gid.
>>
>> How can I do this?
>>
>> Thanks,
>> Andy Zhou
>>
>
>
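
The RegexSerDe suggestion quoted above depends on a regular expression whose capturing groups map to the table columns in order. RegexSerDe uses Java regular expressions, so a candidate pattern can be checked in plain Java before it goes into the "input.regex" SerDe property. The sketch below assumes the layout described in the question (five space-separated fixed fields, then the rest of the line); the class name HandResultRegex is hypothetical, and the pattern is an illustration rather than a tested Hive configuration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HandResultRegex {
    // date, time, threadid, gid, userid, then everything else as one group.
    static final Pattern LINE =
        Pattern.compile("^(\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (.*)$");

    /** Returns the six captured fields, or null if the line does not match. */
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 0; i < out.length; i++) {
            out[i] = m.group(i + 1); // groups are 1-based
        }
        return out;
    }

    public static void main(String[] args) {
        // First sample line from the original question.
        String[] f = split(
            "03/11/13 10:59:52 00000ec0 1009 180538126 92041 2300 0 0 7 21|47|20|33|11 0:2775");
        System.out.println("gid   = " + f[3]);
        System.out.println("ldata = " + f[5]);
    }
}
```

With this pattern, a line that has only the five fixed fields and nothing after them does not match, so the last group would need to be made optional if such lines occur.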