Hive >> mail # user >> populating xml data in hive


iwannaplay games 2012-11-20, 07:59
Nitin Pawar 2012-11-20, 08:33
iwannaplay games 2012-11-20, 08:59
Re: populating xml data in hive
You can simply write a MapReduce job to do the preprocessing for you.
Its output will then be readily available to the Hive table.
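
A minimal sketch of that preprocessing step, written in Python rather than as an actual MapReduce job. It assumes every record has exactly three |-separated fields (id, data, date, as in the sample quoted below) and that the XML itself never contains a | character. The function name join_multiline_records and its parameters are hypothetical, chosen only for illustration. Physical lines are buffered until enough delimiters have been seen to complete a record, and the embedded newlines are replaced by a space.

```python
from typing import Iterable, Iterator


def join_multiline_records(
    lines: Iterable[str],
    num_fields: int = 3,            # assumption: id, data, date
    delimiter: str = "|",
    newline_replacement: str = " ",
) -> Iterator[str]:
    """Merge physical lines into one logical record per output line.

    A record is treated as complete once it contains num_fields - 1
    delimiters. This only works if field values never contain the
    delimiter themselves.
    """
    needed = num_fields - 1
    buffer = []
    seen = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not buffer and line == "":
            continue  # skip empty lines between records
        buffer.append(line)
        seen += line.count(delimiter)
        if seen >= needed:
            # Record complete: emit it on a single line.
            yield newline_replacement.join(buffer)
            buffer, seen = [], 0
    if buffer:
        # Incomplete trailing record: pass it through unchanged in content.
        yield newline_replacement.join(buffer)


# Example input mirroring the sample data quoted below.
sample = [
    "1|apple|24-nov-2011\n",
    "2|mango|26-nov-2011\n",
    '3|<?xml version="1.0" encoding="utf-8"?>\n',
    "<a>fruits</a>|28-nov-2011\n",
    "4|papaya|30-nov-2011\n",
]

for record in join_multiline_records(sample):
    print(record)
```

Because only the few multi-line records need buffering, a single pass over the file is enough. The same logic could run as the map step of a job, but a record that straddles an input split boundary would need extra handling that this sketch does not attempt.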
On Nov 20, 2012 2:29 PM, "iwannaplay games" <[EMAIL PROTECTED]>
wrote:

> How do I preprocess the data when there are millions of records, of
> which only a few thousand contain XML data?
>
>
> On 11/20/12, Nitin Pawar <[EMAIL PROTECTED]> wrote:
> > Hive currently supports only newline as the record separator. If you have
> > newlines in column values, then you will need to preprocess your data
> > and remove the newlines from the column values.
> > On Nov 20, 2012 1:30 PM, "iwannaplay games" <[EMAIL PROTECTED]>
> > wrote:
> >
> >> Hi All,
> >>
> >> I have a CSV file (separated by |) where the data looks like this:
> >>
> >> id   data                                      date
> >> 1    apple                                     24-nov-2011
> >> 2    mango                                     26-nov-2011
> >> 3    <?xml version="1.0" encoding="utf-8"?>
> >>      <a>fruits</a>                             28-nov-2011
> >> 4    papaya                                    30-nov-2011
> >>
> >>
> >> Since id=3 has a newline in the data field, Hive takes only the first
> >> line and treats the second line as a different row. I want the full XML
> >> field to be stored in the data column of the Hive table.
> >>
> >> It seems Hive doesn't support lines terminated by '|'.
> >>
> >> How can I handle XML data in Hive?
> >>
> >> Thanks & Regards
> >> Prabhjot
> >>
> >
>
Bejoy KS 2012-11-20, 09:03