populating xml data in hive
iwannaplay games 2012-11-20, 07:59
Hi All,
I have a CSV file (fields separated by |) where the data looks like this:

id data date
1 apple 24-nov-2011
2 mango 26-nov-2011
3 <?xml version="1.0" encoding="utf-8"?>
<a>fruits</a> 28-nov-2011
4 papaya 30-nov-2011

Since the data field for id=3 contains a newline, Hive reads only the first line as that row and treats the second line as a separate row. I want the full XML value to be stored in the data column of the Hive table.
It seems Hive doesn't support lines terminated by '|'.
How should XML data like this be handled in Hive?
Thanks & Regards Prabhjot
Re: populating xml data in hive
Nitin Pawar 2012-11-20, 08:33
Hive currently supports only the newline character as a record separator. If your column values contain newlines, you will need to preprocess the data and remove the newlines from those values.
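As an illustration of the preprocessing suggested above, here is a minimal Python sketch, not a definitive implementation. It assumes the file really has three |-separated fields per record and that | never appears inside the XML itself; the function name `join_broken_records` and its parameters are hypothetical. A physical line is buffered until the buffer holds enough delimiters to form a complete record, so only the broken records are ever joined.

```python
import io


def join_broken_records(lines, delimiter="|", expected_fields=3):
    """Yield one logical record per item, merging physical lines that were
    split by a newline inside a field.

    Assumption: a complete record contains at least expected_fields - 1
    delimiters, and the delimiter never occurs inside a field value.
    """
    buffer = ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Replace the embedded newline with a space when joining.
        buffer = line if not buffer else buffer + " " + line
        if buffer.count(delimiter) >= expected_fields - 1:
            yield buffer
            buffer = ""
    if buffer:
        # Trailing incomplete record: emit it rather than silently drop data.
        yield buffer


# Sample input modelled on the data in the question (pipes are assumed).
sample = io.StringIO(
    "1|apple|24-nov-2011\n"
    "2|mango|26-nov-2011\n"
    '3|<?xml version="1.0" encoding="utf-8"?>\n'
    "<a>fruits</a>|28-nov-2011\n"
    "4|papaya|30-nov-2011\n"
)
records = list(join_broken_records(sample))
```

Records that are already complete pass through after a single delimiter count, so the extra cost falls mainly on the records that contain XML.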
Re: populating xml data in hive
iwannaplay games 2012-11-20, 08:59
How can I preprocess the data when there are millions of records, of which only a few thousand contain XML data?
Re: populating xml data in hive
Nitin Pawar 2012-11-20, 09:03
You can simply write a MapReduce job that does this for you. Its output will then be readily available to the Hive table.
Re: populating xml data in hive
Bejoy KS 2012-11-20, 09:03
You can use custom MapReduce code. Just check the record type and, if it is XML, preprocess the record to remove the newlines.
Regards Bejoy KS
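The "check the record type" idea can be sketched in Python as a line filter. This is only an illustration under stated assumptions: every real record is assumed to start with a numeric id followed by |, and no line inside the XML is assumed to start that way. The names `RECORD_START`, `merge_continuations`, and `main` are hypothetical. Any line that does not look like the start of a record is treated as a continuation of the previous one.

```python
import re
import sys

# Assumption: a new record always begins with a numeric id and the delimiter.
RECORD_START = re.compile(r"^\d+\|")


def merge_continuations(lines):
    """Yield logical records, appending continuation lines to the record
    that precedes them."""
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if RECORD_START.match(line):
            # A new record starts, so the previous one is complete.
            if current is not None:
                yield current
            current = line
        elif current is not None:
            # Continuation of a multi-line field: join with a space.
            current += " " + line
        else:
            # Text before the first record (for example a header row).
            yield line
    if current is not None:
        yield current


def main():
    """Read raw lines from stdin and write one record per line to stdout."""
    for record in merge_continuations(sys.stdin):
        sys.stdout.write(record + "\n")
```

Calling `main()` turns this into a stdin-to-stdout filter that could be run over the file before loading it into Hive. If the same logic is moved into the map step of a MapReduce job, note that a multi-line record may straddle an input split boundary, which this sketch does not handle.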