Pig >> mail # user >> Converting xml to csv


Re: Converting xml to csv
Hi,
 I am trying to parse the following XML:
 <employee>
    <employee_id>1234</employee_id>
    <email>[EMAIL PROTECTED]</email>
    <name>(first_name_1234,middle_initial_1234,last_name_1234)</name>
    <projects>{(project_1234_1),(project_1234_2),(project_1234_3)}</projects>
    <skills>[programming:SQL,rdbms:Oracle]</skills>
  </employee>

And my script is

a = LOAD 'sample.xml' USING
org.apache.pig.piggybank.storage.XMLLoader('employee') as (x:chararray);
B = foreach a generate REGEX_EXTRACT(x,'<employee>(.*)</employee>',1);
dump B;
Now B comes out as an empty tuple.
I am not sure what I am missing.
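
One possible cause, offered as a guess rather than a confirmed diagnosis: Pig's REGEX_EXTRACT is built on java.util.regex, where '.' does not match a newline unless the DOTALL flag is set. If the record returned by XMLLoader keeps its line breaks (an assumption here), then '<employee>(.*)</employee>' cannot span the lines and nothing is extracted. Python's re module has the same default, so the effect can be sketched in a few lines. The sample record below is a shortened, made-up version of the one above.

```python
import re

# Shortened sample record; assumed to keep its line breaks,
# as a multi-line <employee> element would.
record = "<employee>\n    <employee_id>1234</employee_id>\n  </employee>"

pattern = r"<employee>(.*)</employee>"

# Default behaviour: '.' stops at a newline, so the multi-line record fails to match.
no_flag = re.search(pattern, record)

# With the inline (?s) flag (DOTALL), '.' also matches newlines.
with_flag = re.search(r"(?s)" + pattern, record)

print(no_flag)                      # None
print(with_flag.group(1).strip())   # <employee_id>1234</employee_id>
```

If that is what is happening, prefixing the pattern in the Pig script with the same inline flag, i.e. '(?s)<employee>(.*)</employee>', would be the thing to try. Java's regex syntax does support (?s), but whether it fixes this script has not been verified here.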
On Wed, Sep 11, 2013 at 11:35 PM, ajay kumar <[EMAIL PROTECTED]> wrote:

> Use org.apache.pig.piggybank.storage.XMLLoader and then extract the fields using
> REGEX_EXTRACT_ALL.
>
>
> On Thu, Sep 12, 2013 at 11:18 AM, jamal sasha <[EMAIL PROTECTED]>
> wrote:
>
> > Umm.. yes.. but how do I generalize it?
> > What I am looking for is something like a JSON parser in, say, Java:
> > if I give it a valid JSON string, I can parse it and then access it
> > as a hashmap.
> > But with the XML loader, do I still have to specify regex rules?
> >
> > Actually, is it possible to just flatten the xml..
> > so for example
> > convert
> > <aux>
> > <foobar>1</foobar>
> > <fushbar>foo</fushbar>
> > </aux>
> > to
> > <aux><foobar>1</foobar><fushbar>foo</fushbar></aux>
> > ???
> >
> >
> >
> >
> > On Wed, Sep 11, 2013 at 10:32 PM, Jagat Singh <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Use the Piggybank XMLLoader.
> > >  On 12/09/2013 10:14 AM, "jamal sasha" <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi,
> > > >   So I have different xml data sources...For example:
> > > >
> > > > src1.txt
> > > >
> > > > <foo>
> > > > <bar>1</bar>
> > > > </foo>
> > > > <foo>
> > > > <bar>2</bar>
> > > > </foo>
> > > > .. and so on
> > > >
> > > >
> > > > and another data
> > > >
> > > > src2.txt
> > > >
> > > > <aux>
> > > > <foobar>1</foobar>
> > > > <fushbar>foo</fushbar>
> > > > </aux>
> > > >
> > > > ... and so on
> > > >
> > > >
> > > > So basically different XML (valid formats)
> > > >
> > > > Rather than writing different pig scripts.. is there a way to write 1
> > > > script and then convert all these xml data into csv?
> > > > Thanks
> > > >
> > >
> >
>
>
>
> --
> *Thanks & Regards,*
> *S. Ajay Kumar
> +91-9966159106*
>
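
On the original question in this thread (one script for differently shaped XML sources), here is a minimal sketch of a generic approach outside Pig, using only the Python standard library. It is not something proposed in the thread. The function name xml_records_to_csv and the synthetic <root> wrapper are made up for illustration. The sketch assumes each record is a flat element whose children hold plain text, and that the input has no XML declaration, since the wrapper would break on one.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_records_to_csv(xml_text, record_tag):
    """Turn every <record_tag> element into one CSV row; child tags become columns."""
    # The sources hold several top-level records, so wrap them in a synthetic root
    # to get a single well-formed document.
    root = ET.fromstring("<root>" + xml_text + "</root>")
    rows = [
        {child.tag: (child.text or "").strip() for child in rec}
        for rec in root.iter(record_tag)
    ]
    # Columns are ordered by first appearance across all records.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# The two sample sources from the original message.
src1 = "<foo>\n<bar>1</bar>\n</foo>\n<foo>\n<bar>2</bar>\n</foo>"
src2 = "<aux>\n<foobar>1</foobar>\n<fushbar>foo</fushbar>\n</aux>"

print(xml_records_to_csv(src1, "foo"))
print(xml_records_to_csv(src2, "aux"))
```

The only per-source input is the record tag, which mirrors the argument XMLLoader already takes. Nested structures such as the <projects> bag in the employee example would still come out as a single text column.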