Pig >> mail # user >> Converting xml to csv


jamal sasha 2013-09-12, 00:14
inelu nagamallikarjuna 2013-09-12, 02:38
jamal sasha 2013-09-12, 05:10
Jagat Singh 2013-09-12, 05:32
jamal sasha 2013-09-12, 05:47
ajay kumar 2013-09-12, 06:35
jamal sasha 2013-09-12, 22:24
ajay kumar 2013-09-13, 06:21
william.dowling@... 2013-09-13, 13:32
ajay kumar 2013-09-16, 05:10
william.dowling@... 2013-09-16, 13:51
Re: Converting xml to csv
Yeah, thank you.

Now I'm also stuck. If possible, can you share the solution?
On Mon, Sep 16, 2013 at 7:21 PM, <[EMAIL PROTECTED]> wrote:

> Your example had newlines in the <employee> element. The regular
> expression .* does not match newlines. One way to remove newlines is
> REPLACE(x,'[\\n]',''). If the text ranges you are interested in do not
> contain newlines, for example if you are interested in <employee_id> but do
> not care about its relation to other elements inside the same <employee>
> element, then you do not need to do this.
>
> William F Dowling
> Senior Technologist
> Thomson Reuters
>
>
> -----Original Message-----
> From: ajay kumar [mailto:[EMAIL PROTECTED]]
> Sent: Monday, September 16, 2013 1:11 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Converting xml to csv
>
> Sorry if I am wrong.
>
> Why do we need to use REPLACE? I mean, what is the advantage?
>
>
> On Fri, Sep 13, 2013 at 7:02 PM, <[EMAIL PROTECTED]>
> wrote:
>
> > Ajay's suggestion will work for elements like <employee_id> in your
> > example, that occur all on one line. If you want to get the whole
> > <employee> element, and that spans more than one line, you will not be
> > able to get it with matching (.*) since that will not match a newline
> > character.
> >
> > You can remove newline characters using
> > B = foreach A generate REPLACE(x,'[\\n]','');
> >
> >
> > William F Dowling
> > Senior Technologist
> > Thomson Reuters
> >
> >
> > -----Original Message-----
> > From: ajay kumar [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, September 13, 2013 2:21 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Converting xml to csv
> >
> > try this ...
> >
> > register /usr/lib/pig/piggybank.jar;
> > A = load '/home/sudeep/Desktop/test1' using
> >     org.apache.pig.piggybank.storage.XMLLoader('employee_id') as (x:chararray);
> > B = foreach A generate
> >     REGEX_EXTRACT(x,'<employee_id>(.*)</employee_id>',1);
> >
> >
> > On Fri, Sep 13, 2013 at 3:54 AM, jamal sasha <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi,
> > >  I am trying to parse the following XML:
> > >
> > >
> > >  <employee>
> > >     <employee_id>1234</employee_id>
> > >     <email>[EMAIL PROTECTED]</email>
> > >     <name>(first_name_1234,middle_initial_1234,last_name_1234)</name>
> > >     <projects>{(project_1234_1),(project_1234_2),(project_1234_3)}</projects>
> > >     <skills>[programming:SQL,rdbms:Oracle]</skills>
> > >   </employee>
> > >
> > > And my script is
> > >
> > > a = LOAD 'sample.xml' USING
> > >     org.apache.pig.piggybank.storage.XMLLoader('employee') as (x:chararray);
> > > B = foreach a generate REGEX_EXTRACT(x,'<employee>(.*)</employee>',1);
> > > dump B;
> > > Now B is an empty tuple here.
> > > Not sure what I am missing?
> > >
> > >
> > >
> > >
> > > > On Wed, Sep 11, 2013 at 11:35 PM, ajay kumar <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Use org.apache.pig.piggybank.storage.XMLLoader and then extract them
> > > > > using REGEX_EXTRACT_ALL.
> > > >
> > > >
> > > > > On Thu, Sep 12, 2013 at 11:18 AM, jamal sasha <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Umm, yes, but how do I generalize it?
> > > > > > What I am looking for is something like a JSON parser in, say, Java:
> > > > > > if I give it a valid JSON string, I can parse it and then access it
> > > > > > as a hashmap.
> > > > > > But with the XML loader, do I still have to specify regex rules?
> > > > >
> > > > > > Actually, is it possible to just flatten the XML?
> > > > > > For example, convert
> > > > > > <aux>
> > > > > > <foobar>1</foobar>
> > > > > > <fushbar>foo</fushbar>
> > > > > > </aux>
> > > > > > to
> > > > > > <aux><foobar>1</foobar><fushbar>foo</fushbar></aux>
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > On Wed, Sep 11, 2013 at 10:32 PM, Jagat Singh <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Use piggybank xmlloader
> > > > > >  On 12/09/2013 10:14 AM, "jamal sasha" <[EMAIL PROTECTED]>

*Thanks & Regards,*
*S. Ajay Kumar
+91-9966159106*
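
Editor's note: the advice quoted in this thread (load each <employee> element with XMLLoader, strip newlines with REPLACE, then apply REGEX_EXTRACT) can be combined into one script. What follows is only a minimal sketch, not a solution posted by anyone in the thread. The piggybank path, 'sample.xml', and the element names come from the messages above. The relation names (raw, flat, cols) and the output directory 'employees_csv' are hypothetical, chosen for illustration.

```pig
REGISTER /usr/lib/pig/piggybank.jar;

-- Each record is one whole <employee>...</employee> element,
-- including the newlines between its child elements.
raw = LOAD 'sample.xml'
      USING org.apache.pig.piggybank.storage.XMLLoader('employee')
      AS (x:chararray);

-- Remove newlines so that '.' in a pattern can span the whole element
-- (by default '.' does not match a newline character).
flat = FOREACH raw GENERATE REPLACE(x, '[\\n]', '') AS x;

-- One REGEX_EXTRACT per column; '.*?' stops at the first closing tag.
cols = FOREACH flat GENERATE
    REGEX_EXTRACT(x, '<employee_id>(.*?)</employee_id>', 1) AS employee_id,
    REGEX_EXTRACT(x, '<email>(.*?)</email>', 1) AS email;

-- 'employees_csv' is a hypothetical output directory.
STORE cols INTO 'employees_csv' USING PigStorage(',');
```

Fields such as <name> and <projects> in the sample contain commas themselves, so writing them with PigStorage(',') would produce ambiguous rows. A different delimiter, or cleaning those values first, would be needed. Since Pig's regex functions are built on Java regular expressions, prefixing a pattern with the inline flag (?s) should be an alternative to the REPLACE step, though that is not something the thread confirms.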
william.dowling@... 2013-09-17, 15:19