RE: Processing fixed length records with Pig
I'm a newbie, so fair warning.

Try loading each record into a single-element tuple, so each tuple is just the text of one line. Then stream that relation through a UDF that reads and parses the data into standard \t or ',' separated fields. That should be no more than a couple of lines of Python or Perl. I am doing something quite similar with XML, using XMLLoader from piggybank to slurp in one XML document at a time; then my UDF pulls out what I need from the XML and writes one ','-separated line per record.
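
Untested sketch of what I mean (the script name, field widths, and output schema are placeholders -- adjust to your record layout):

    -- load each line as a single chararray
    raw = LOAD 'input.txt' USING TextLoader() AS (line:chararray);

    -- ship the parser script to the cluster and stream each line through it
    DEFINE parser `parse_fixed.py` SHIP('parse_fixed.py');
    parsed = STREAM raw THROUGH parser AS (f1:chararray, f2:chararray, f3:chararray);

And parse_fixed.py, which slices each line at the fixed offsets, trims the padding, and emits tab-separated fields (Pig's default streaming deserializer expects tabs):

    #!/usr/bin/env python
    import sys

    WIDTHS = [10, 5, 20]  # placeholder field widths -- set to your layout

    for line in sys.stdin:
        line = line.rstrip('\n')
        fields, pos = [], 0
        for w in WIDTHS:
            fields.append(line[pos:pos + w].strip())  # slice field, trim pad spaces
            pos += w
        sys.stdout.write('\t'.join(fields) + '\n')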

HTH,

Will

William F Dowling
Sr Technical Specialist, Software Engineering
Thomson Reuters
+1 215 823 3853
-----Original Message-----
From: Shantian Purkad [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, April 06, 2011 2:16 PM
To: [EMAIL PROTECTED]
Subject: Re: Processing fixed length records with Pig

Any ideas on this?

________________________________
From: Shantian Purkad <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Mon, April 4, 2011 11:19:14 PM
Subject: Processing fixed length records with Pig
Hi,

I have a file whose records have fixed-length fields (with spaces appended to
pad each field to its full length).

How can I load these records using Pig, specifying the field lengths and
automatically trimming the extra spaces?
Thanks and Regards,
Shantian