Pig, mail # user - Converting xml to csv


Re: Converting xml to csv
jamal sasha 2013-09-12, 22:24
Hi,
 I am trying to parse the following XML:
 <employee>
    <employee_id>1234</employee_id>
    <email>[EMAIL PROTECTED]</email>
    <name>(first_name_1234,middle_initial_1234,last_name_1234)</name>
    <projects>{(project_1234_1),(project_1234_2),(project_1234_3)}</projects>
    <skills>[programming:SQL,rdbms:Oracle]</skills>
 </employee>

And my script is:

a = LOAD 'sample.xml' USING
    org.apache.pig.piggybank.storage.XMLLoader('employee') as (x:chararray);
B = foreach a generate REGEX_EXTRACT(x,'<employee>(.*)</employee>',1);
dump B;

Now B is an empty tuple. I am not sure what I am missing.
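
One possible cause, offered as an assumption since the thread does not confirm it: XMLLoader hands each <employee> element over as a single multi-line string, and Pig's REGEX_EXTRACT is built on Java regular expressions, where `.` does not match line terminators unless DOTALL mode is switched on (for example with a leading `(?s)`). Python's `re` module treats `.` the same way, so the standalone sketch below uses it as a stand-in to show the difference. The `record` string and the `extract_body` helper are illustrative names, not part of Pig or piggybank.

```python
import re

# A shortened record, shaped like what the loader would return:
# one string that spans several lines.
record = """<employee>
    <employee_id>1234</employee_id>
    <skills>[programming:SQL,rdbms:Oracle]</skills>
  </employee>"""


def extract_body(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Without DOTALL, '.' stops at the first newline, so the closing tag is never reached.
without_dotall = extract_body(r'<employee>(.*)</employee>', record)

# With the inline (?s) flag, '.' also matches newlines and the body is captured.
with_dotall = extract_body(r'(?s)<employee>(.*)</employee>', record)

print(without_dotall)
print(with_dotall is not None)
```

If the same applies in Pig, changing the pattern to '(?s)<employee>(.*)</employee>' would be the thing to try first.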
On Wed, Sep 11, 2013 at 11:35 PM, ajay kumar <[EMAIL PROTECTED]> wrote:

> Use org.apache.pig.piggybank.storage.XMLLoader and then extract the fields
> using REGEX_EXTRACT_ALL.
>
>
> On Thu, Sep 12, 2013 at 11:18 AM, jamal sasha <[EMAIL PROTECTED]>
> wrote:
>
> > Umm, yes, but how do I generalize it?
> > What I am looking for is something like a JSON parser in, say, Java:
> > if I give it a valid JSON string, I can parse it and then access it
> > as a hashmap.
> > But with the XML loader, I still have to specify regex rules?
> >
> > Actually, is it possible to just flatten the xml..
> > so for example
> > convert
> > <aux>
> > <foobar>1</foobar>
> > <fushbar>foo</fushbar>
> > </aux>
> > to
> > <aux><foobar>1</foobar><fushbar>foo</fushbar></aux>
> > ???
> >
> >
> >
> >
> > On Wed, Sep 11, 2013 at 10:32 PM, Jagat Singh <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Use the piggybank XMLLoader.
> > >  On 12/09/2013 10:14 AM, "jamal sasha" <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi,
> > > > >   So I have different XML data sources. For example:
> > > >
> > > > src1.txt
> > > >
> > > > <foo>
> > > > <bar>1</bar>
> > > > </foo>
> > > > <foo>
> > > > <bar>2</bar>
> > > > </foo>
> > > > .. and so on
> > > >
> > > >
> > > > and another data
> > > >
> > > > src2.txt
> > > >
> > > > <aux>
> > > > <foobar>1</foobar>
> > > > <fushbar>foo</fushbar>
> > > > </aux>
> > > >
> > > > ... and so on
> > > >
> > > >
> > > > > So basically different XML (valid formats)
> > > >
> > > > Rather than writing different pig scripts.. is there a way to write 1
> > > > script and then convert all these xml data into csv?
> > > > Thanks
> > > >
> > >
> >
>
>
>
> --
> *Thanks & Regards,*
> *S. Ajay Kumar
> +91-9966159106*
>
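
For the original question (one script for differently shaped XML sources), here is a minimal sketch in standalone Python rather than Pig. It assumes flat records, meaning each record element has only simple child elements, as in the src1.txt and src2.txt samples. The function name `xml_records_to_csv` and the synthetic `_root` wrapper tag are made up for illustration. Only the record tag changes between sources; the column names are discovered from the data.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_records_to_csv(xml_text, record_tag):
    """Turn every <record_tag> element into one CSV row, one column per child tag."""
    # The sample files hold several top-level elements, so wrap them in a
    # synthetic root (hypothetical name) to parse the text as one document.
    root = ET.fromstring("<_root>" + xml_text + "</_root>")
    records = root.findall(record_tag)

    # Collect column names in first-seen order across all records.
    columns = []
    for record in records:
        for child in record:
            if child.tag not in columns:
                columns.append(child.tag)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for record in records:
        # A child missing from a record becomes an empty cell.
        writer.writerow([(record.findtext(col) or "").strip() for col in columns])
    return out.getvalue()


src1 = "<foo>\n<bar>1</bar>\n</foo>\n<foo>\n<bar>2</bar>\n</foo>"
src2 = "<aux>\n<foobar>1</foobar>\n<fushbar>foo</fushbar>\n</aux>"

print(xml_records_to_csv(src1, "foo"))
print(xml_records_to_csv(src2, "aux"))
```

Nested children, attributes, and input that starts with an XML declaration are not handled here and would need extra work.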