
Hive user mailing list: user define data format


Re: user define data format
Hi Richard,
What Bejoy said is correct. However, another way to get around it would be to pre-process your data so that the text between <doc> and </doc> contains no newlines. Then you should be able to treat that data as a string and parse it out relatively easily.
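Here is a minimal sketch of that preprocessing step in Python. It assumes that `\0x01` in Richard's sample means the Ctrl-A byte (`\x01`) at the end of each line, and that `<doc>` and `</doc>` each sit on a line of their own. The names `flatten_docs` and `flatten_file` are hypothetical. The field lines of each block are joined with the same `\x01` byte, so the key=value pairs can still be told apart.

```python
SEP = "\x01"  # assumed: "\0x01" in the sample means the Ctrl-A byte


def flatten_docs(lines, sep=SEP):
    """Yield one string per <doc>...</doc> block, joining its field lines with sep."""
    fields = None  # None means we are currently outside a <doc> block
    for raw in lines:
        # Drop the line ending and the trailing Ctrl-A marker.
        line = raw.rstrip("\r\n").rstrip(sep)
        if line == "<doc>":
            fields = []
        elif line == "</doc>":
            if fields is not None:
                yield sep.join(fields)
            fields = None
        elif fields is not None and line:
            # Only keep non-empty lines that appear inside a <doc> block.
            fields.append(line)


def flatten_file(src_path, dst_path, encoding="utf-8"):
    """Rewrite src_path so that every <doc> block becomes a single line in dst_path.

    The UTF-8 default is an assumption about the input files.
    """
    with open(src_path, encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for record in flatten_docs(src):
            dst.write(record + "\n")
```

After this step, each output line holds one complete record, so Hive's default newline record delimiter splits the data correctly. Hive's default field delimiter is also Ctrl-A (`\001`), however. A table with a single `STRING` column and the default row format would therefore likely keep only the text up to the first `\x01`. Declaring a different field delimiter (for example a tab, assuming the values contain none) keeps the whole line in one column.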

Mark
----- Original Message -----
From: "Bejoy Ks" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Monday, May 21, 2012 7:22:58 AM
Subject: Re: user define data format

Hi Richard
In Hive the default record delimiter is the newline character. In your sample data set, a single row/record is spread across multiple lines. AFAIK, the only possible option here is to write a custom SerDe for your data.
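The per-record parsing a custom SerDe would do can be modelled in a few lines of Python. A real Hive SerDe is a Java class, so this only sketches the logic and is not Hive's SerDe interface. The sketch assumes the same Ctrl-A reading of `\0x01` and uses a hypothetical column list built from the sample field names. One caveat applies: in Hive, the input format decides where a record begins and ends, and the SerDe only parses the records it is handed. With the default text input format, a multi-line `<doc>` block would also need a custom input format (or preprocessing) to reach the SerDe in one piece.

```python
# Hypothetical column list for a table such as
#   CREATE TABLE docs (field1 STRING, field2 STRING)
# (names taken from the sample; the real schema is not given in the thread).
COLUMNS = ["field1", "field2"]
SEP = "\x01"  # assumed: "\0x01" in the sample means the Ctrl-A byte


def deserialize(record_lines, columns=COLUMNS, sep=SEP):
    """Map the lines of one <doc> block to a row of values in column order.

    Missing fields become None, and keys that are not declared columns are ignored.
    """
    values = {}
    for raw in record_lines:
        line = raw.rstrip("\r\n").rstrip(sep)
        if not line or line in ("<doc>", "</doc>"):
            continue  # skip the record markers and blank lines
        key, eq, value = line.partition("=")
        if eq:  # only lines of the form key=value carry a field
            values[key] = value
    return [values.get(col) for col in columns]
```

Splitting on the first `=` only means that values containing `=` survive intact. Returning `None` for absent keys matches how Hive shows a missing column value as NULL.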
Regards
Bejoy KS

From: Richard <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Monday, May 21, 2012 3:14 PM
Subject: user define data format

Hi, I want to use Hive on some data in the following format:
<doc>\0x01
field1=val1\0x01
field2=val2\0x01
...
</doc>\0x01

The lines between <doc> and </doc> form one record. How should I define the table?

thanks.
Richard