RE: Processing fixed length records with Pig
I'm a newbie, so fair warning.

Try loading each record into a single-element tuple, so each tuple is just the text of one line. Then stream that relation through a UDF that reads and parses the data into standard \t- or ','-separated fields. That should be no more than a couple of lines of Python or Perl. I am doing something quite similar with XML, using XMLLoader from piggybank to slurp in one XML document at a time; my UDF then pulls out what I need from the XML and writes one ','-separated line per record.
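For illustration only, here is a minimal, untested sketch of that streaming approach. The script name, file names, and field widths (10, 5, and 20 characters) are all made up; substitute your own record layout:

# parse_fixed.py -- hypothetical streaming script for Pig's STREAM operator.
# Reads one fixed-width line per record from stdin, slices it into fields,
# strips the space padding, and emits one tab-separated line per record.
import sys

WIDTHS = [10, 5, 20]  # assumed field lengths; adjust to your layout

for line in sys.stdin:
    line = line.rstrip('\n')
    fields, pos = [], 0
    for w in WIDTHS:
        fields.append(line[pos:pos + w].strip())  # auto-trim padding
        pos += w
    print('\t'.join(fields))

And the Pig side, loading each line whole with the built-in TextLoader and streaming it through the script:

-- ship the script to the cluster and stream each line through it
DEFINE parse_fixed `parse_fixed.py` SHIP('parse_fixed.py');
raw    = LOAD 'fixed.dat' USING TextLoader() AS (line:chararray);
parsed = STREAM raw THROUGH parse_fixed
         AS (f1:chararray, f2:chararray, f3:chararray);

After the STREAM, 'parsed' is an ordinary tab-delimited relation you can work with as usual.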

HTH,

Will

William F Dowling
Sr Technical Specialist, Software Engineering
Thomson Reuters
+1 215 823 3853
-----Original Message-----
From: Shantian Purkad [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, April 06, 2011 2:16 PM
To: [EMAIL PROTECTED]
Subject: Re: Processing fixed length records with Pig

Any ideas on this?

________________________________
From: Shantian Purkad <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Mon, April 4, 2011 11:19:14 PM
Subject: Processing fixed length records with Pig
Hi,

I have a file whose records have fixed-length fields, with spaces appended to pad each field to its full length.

How can I load these records in Pig, specifying the field lengths and automatically trimming the extra spaces?

Thanks and Regards,
Shantian