Hive, mail # user - user define data format


Re: user define data format
Mark Grover 2012-05-22, 16:10
Hi Richard,
What Bejoy said is correct. However, another way around it would be to pre-process your data so that everything between <doc> and </doc> sits on a single line, with no embedded newlines. You should then be able to load each record as a single string and parse the fields out relatively easily.
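
A minimal sketch of that pre-processing step, in Python, might look like the following. It assumes that "\0x01" in the sample stands for the Ctrl-A byte (\x01) at the end of each line, and that <doc> and </doc> each appear on a line of their own. The function names flatten_docs and parse_record are made up for illustration and are not part of Hive.

```python
from typing import Dict, Iterable, Iterator, List

# Assumption: "\0x01" in the sample data denotes the Ctrl-A byte.
SEP = "\x01"


def flatten_docs(lines: Iterable[str]) -> Iterator[str]:
    """Yield one newline-free string per <doc> ... </doc> block."""
    fields: List[str] = []
    in_doc = False
    for raw in lines:
        # Drop the line ending, then the trailing Ctrl-A terminator.
        line = raw.rstrip("\r\n").rstrip(SEP)
        if line == "<doc>":
            in_doc, fields = True, []
        elif line == "</doc>":
            if in_doc:
                # Join the fields with Ctrl-A so the record fits on one line.
                yield SEP.join(fields)
            in_doc = False
        elif in_doc and line:
            fields.append(line)


def parse_record(record: str) -> Dict[str, str]:
    """Split a flattened record back into a field-name -> value mapping."""
    return dict(f.split("=", 1) for f in record.split(SEP) if "=" in f)
```

Writing each yielded string followed by a newline gives a file with one record per line, which matches Hive's default record delimiter, so the data could be loaded into a table with a single STRING column and parsed from there.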

Mark
----- Original Message -----
From: "Bejoy Ks" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Monday, May 21, 2012 7:22:58 AM
Subject: Re: user define data format

Hi Richard
In Hive, the default record delimiter is the newline character. In your sample data set, a single row/record is spread across multiple lines. AFAIK, the only possible option here is to write a custom SerDe for your data.
Regards
Bejoy KS

From: Richard <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Monday, May 21, 2012 3:14 PM
Subject: user define data format

Hi, I want to use Hive on some data in the following format:
<doc>\0x01
field1=val1\0x01
field2=val2\0x01
...
</doc>\0x01

The lines between <doc> and </doc> form a single record. How should I define the table?

thanks.
Richard