-
user define data format
Richard 2012-05-21, 09:44
Hi, I want to use Hive on some data in the following format:
<doc>\0x01
field1=val1\0x01
field2=val2\0x01
...
</doc>\0x01
The lines between <doc> and </doc> are a record. How should I define the table?
Thanks.
Richard
-
Re: user define data format
Bejoy Ks 2012-05-21, 11:22
Hi Richard
In Hive, the default record delimiter is the newline character. In your sample data set, a single row/record is spread across multiple lines. AFAIK, the only possible option here is to write a custom SerDe for your data.
Regards
Bejoy KS
-
Re: user define data format
Mark Grover 2012-05-22, 16:10
Hi Richard,
What Bejoy said is correct. However, another way to get around it would be to pre-process your data so that the text between <doc> and </doc> does not contain any newlines. Then you should be able to treat that data as a string and parse it out relatively easily.
Mark
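To illustrate the "treat it as a string and parse it out" step, here is a minimal Python sketch. It assumes the record has already been flattened onto one line and that "\0x01" in the original post denotes the single control character 0x01; the name parse_record is hypothetical and not part of Hive.

```python
# Assumption: "\0x01" in the post means the single control character 0x01.
DELIM = "\x01"


def parse_record(line, delim=DELIM):
    """Split one flattened <doc>...</doc> line into a dict of field -> value."""
    fields = {}
    for token in line.split(delim):
        token = token.strip()
        # Skip empty pieces and the enclosing tags themselves.
        if not token or token in ("<doc>", "</doc>"):
            continue
        # Split on the first "=" only, so values may contain "=".
        key, _, value = token.partition("=")
        fields[key] = value
    return fields
```

For example, calling parse_record on a line made of <doc>, field1=val1, field2=val2 and </doc>, each followed by 0x01, gives a dict with the keys field1 and field2.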
-
Re: user define data format
Edward Capriolo 2012-05-22, 16:52
A crafty trick would be to use streaming and only emit data once you see the end tag as a pre-processing step.
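A rough sketch of such a streaming pre-processing step in Python is shown below. It buffers lines and emits one output line only when the end tag is seen. The assumptions here are that "\0x01" in the original post means the control character 0x01 at the end of each line, and that a tab is an acceptable separator in the output; the function names are made up for illustration.

```python
import sys


def collapse_records(lines, start="<doc>", end="</doc>", delim="\x01"):
    """Yield one tab-joined line per complete <doc>...</doc> block."""
    buffer = []
    inside = False
    for raw in lines:
        # Drop the line ending, then the trailing 0x01 (assumed delimiter).
        line = raw.strip().rstrip(delim).strip()
        if line == start:
            buffer, inside = [], True
        elif line == end:
            # Emit only once the end tag is seen.
            if inside:
                yield "\t".join(buffer)
            buffer, inside = [], False
        elif inside and line:
            buffer.append(line)


def main():
    # Read records from stdin and write one line per record to stdout.
    for record in collapse_records(sys.stdin):
        sys.stdout.write(record + "\n")
```

Saved as a script that calls main(), this reads the raw file on stdin and writes one record per line, which Hive can then read with its default newline record delimiter. A block that never reaches its end tag is dropped rather than emitted.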