You can do this with custom MapReduce code: check each record's type and, if it contains XML, preprocess it to strip the embedded newlines.
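A minimal sketch of that preprocessing step, written as plain Python rather than a full MapReduce job. It assumes (this is not confirmed in the thread) that every real record starts with a numeric id followed by the '|' delimiter, so any other line is a continuation of the previous record's XML field. The names join_multiline_records and clean_file are made up for illustration. It streams line by line, so the millions of non-XML rows pass through untouched and only the few multi-line rows get folded.

```python
import re
from typing import Iterable, Iterator

# Assumption: a new record always begins with a numeric id and the '|' delimiter.
RECORD_START = re.compile(r"^\d+\|")


def join_multiline_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one physical line per logical record, folding continuation lines."""
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if RECORD_START.match(line):
            # A new record starts here: emit the one collected so far.
            if current is not None:
                yield current
            current = line
        elif current is not None:
            # Continuation of a multi-line field (e.g. XML): join with a space.
            part = line.strip()
            if part:
                current += " " + part
        else:
            # Lines before the first record (e.g. a header) pass through unchanged.
            yield line
    if current is not None:
        yield current


def clean_file(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path, writing one record per line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for record in join_multiline_records(src):
            dst.write(record + "\n")
```

Calling clean_file("raw.txt", "clean.txt") (hypothetical file names) before loading the data would give Hive one record per line. One caveat: an XML continuation line that itself begins with digits followed by '|' would be mistaken for a new record, so the start pattern may need tightening for real data.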
From: iwannaplay games <[EMAIL PROTECTED]>
Date: Tue, 20 Nov 2012 14:29:18
To: <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Re: populating xml data in hive
How do I preprocess the data when there are millions of records, of
which only a few thousand contain XML data?
On 11/20/12, Nitin Pawar <[EMAIL PROTECTED]> wrote:
> Hive currently supports only newline as the record separator. If you have
> newlines in column values, you will need to preprocess your data and
> remove the newlines from the column values.
> On Nov 20, 2012 1:30 PM, "iwannaplay games" <[EMAIL PROTECTED]>
>> Hi All,
>> I have a CSV file (separated by |) where the data looks like:
>> id data
>> 1 apple
>> 2 mango
>> 3 <?xml version="1.0" encoding="utf-8"?>
>> 4 papaya
>> Since id=3 has a newline in the data field, Hive takes only the first
>> line and treats the second line as a different row. I want the full XML
>> field to be loaded into the data column of the Hive table.
>> It seems Hive doesn't support lines terminated by '|'.
>> How should I handle XML data in Hive?
>> Thanks & Regards