Re: user define data format
Mark Grover 2012-05-22, 16:10
What Bejoy said is correct. However, another way around it would be to pre-process your data so that the text between <doc> and </doc> does not contain any newlines. Each record then sits on a single line, so you should be able to load it as a string and parse it out relatively easily.
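A minimal sketch of that pre-processing step, in Python. The sample data is not shown in this thread, so the sketch assumes that <doc> and </doc> each appear on a line of their own; the function name flatten_docs and the space separator are hypothetical choices, not anything Hive requires.

```python
def flatten_docs(lines, sep=" "):
    """Yield one string per <doc>...</doc> block, joining the inner lines with sep.

    Assumes <doc> and </doc> each sit on their own line (an assumption,
    since the original sample data is not available).
    """
    buf = None  # None means we are currently outside a <doc> block
    for line in lines:
        text = line.strip()
        if text == "<doc>":
            buf = []                  # start collecting a new record
        elif text == "</doc>":
            if buf is not None:
                yield sep.join(buf)   # emit the record as a single line
            buf = None
        elif buf is not None:
            buf.append(text)          # line inside the current record
        # lines outside any <doc> block are ignored


# Hypothetical sample input, for illustration only.
sample = ["<doc>", "first line", "second line", "</doc>"]
for record in flatten_docs(sample):
    print(record)
```

Writing one record per output line means Hive's default newline record delimiter applies again. Choose a separator that does not occur in the data so the original lines can still be split apart later.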
----- Original Message -----
From: "Bejoy Ks" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Monday, May 21, 2012 7:22:58 AM
Subject: Re: user define data format
In Hive, the default record delimiter is the newline character. In your sample data set, a single row/record is spread across multiple lines. AFAIK, the only possible option here is to write a custom SerDe for your data.
From: Richard <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Monday, May 21, 2012 3:14 PM
Subject: user define data format
Hi, I want to use Hive on some data in the following format:
The lines between <doc> and </doc> form a single record. How should I define the table?