Pig >> mail # user >> Converting xml to csv


Re: Converting xml to csv
Use org.apache.pig.piggybank.storage.XMLLoader to load the records, and then
extract the fields using REGEX_EXTRACT_ALL.
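
On the question of generalizing without per-source regex rules, here is a minimal sketch of the idea in plain Python (not Pig code). It assumes every record is a flat element whose children contain only text, as in the src1.txt and src2.txt samples quoted below. The function name xml_records_to_csv and the wrapping <records> root are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_records_to_csv(text):
    """Convert a run of sibling XML records into CSV text.

    Assumption: each record is flat (children hold only text) and the
    input carries no XML declaration.
    """
    # The sample files have no single root element, so wrap them in a
    # hypothetical <records> root to make the input well-formed.
    root = ET.fromstring("<records>" + text + "</records>")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for record in root:
        # One CSV row per record, one column per child, in document order.
        writer.writerow([(child.text or "").strip() for child in record])
    return out.getvalue()


# The two samples from the original question.
src1 = "<foo>\n<bar>1</bar>\n</foo>\n<foo>\n<bar>2</bar>\n</foo>\n"
src2 = "<aux>\n<foobar>1</foobar>\n<fushbar>foo</fushbar>\n</aux>\n"

print(xml_records_to_csv(src1), end="")
print(xml_records_to_csv(src2), end="")
```

The same function handles both sources because it never names a tag; nested or attribute-heavy XML would need more than this.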
On Thu, Sep 12, 2013 at 11:18 AM, jamal sasha <[EMAIL PROTECTED]> wrote:

> Umm.. yes.. but how do I generalize it?
> What I am looking for is something like the JSON parsers we have in, say, Java:
> if I give it a valid JSON string, I can parse it and then access it
> as a hashmap.
> But with the XML loader, I still have to specify regex rules??
>
> Actually, is it possible to just flatten the xml..
> so for example
> convert
> <aux>
> <foobar>1</foobar>
> <fushbar>foo</fushbar>
> </aux>
> to
> <aux><foobar>1</foobar><fushbar>foo</fushbar></aux>
> ???
>
>
>
>
> On Wed, Sep 11, 2013 at 10:32 PM, Jagat Singh <[EMAIL PROTECTED]>
> wrote:
>
> > Use piggybank xmlloader
> >  On 12/09/2013 10:14 AM, "jamal sasha" <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >   So I have different xml data sources...For example:
> > >
> > > src1.txt
> > >
> > > <foo>
> > > <bar>1</bar>
> > > </foo>
> > > <foo>
> > > <bar>2</bar>
> > > </foo>
> > > .. and so on
> > >
> > >
> > > and another data source
> > >
> > > src2.txt
> > >
> > > <aux>
> > > <foobar>1</foobar>
> > > <fushbar>foo</fushbar>
> > > </aux>
> > >
> > > ... and so on
> > >
> > >
> > > So basically different XML (valid formats)
> > >
> > > Rather than writing different pig scripts.. is there a way to write 1
> > > script and then convert all these xml data into csv?
> > > Thanks
> > >
> >
>

--
*Thanks & Regards,*
*S. Ajay Kumar
+91-9966159106*