Re: Help splitting a line into multiple lines
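STRSPLIT returns a single tuple, not a bag, so FLATTEN on its result gives you
extra columns rather than extra rows. Turn the pieces into a bag of tuples and
apply FLATTEN to the bag; that emits one output row per element of the bag.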
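A minimal sketch of the bag-then-FLATTEN approach, reusing the LOAD and FILTER
from your script. It assumes your Pig release includes the STRSPLITTOBAG
built-in (please check the built-in functions list for your version). The
aliases valuesOnly, vals, rows and row are illustrative names only.

```pig
-- Assumption: this Pig version provides STRSPLITTOBAG (returns a bag).
mysqldump = LOAD '/Users/tim/Desktop/rollover/dump.txt' USING TextLoader AS (line:chararray);

-- Keep only the INSERT statements
insertLines = FILTER mysqldump BY (line matches 'INSERT INTO.*');

-- Strip everything up to "VALUES (" and the trailing ");" so that only
-- 'a','b'),('c','d'),('e','f' is left in the captured group
valuesOnly = FOREACH insertLines GENERATE
    REGEX_EXTRACT(line, '.*VALUES \\((.*)\\);', 1) AS vals;

-- STRSPLITTOBAG yields a bag with one single-field tuple per piece;
-- FLATTEN on a bag emits one output row per tuple in that bag
rows = FOREACH valuesOnly GENERATE FLATTEN(STRSPLITTOBAG(vals, '\\),\\(')) AS row;

DUMP rows;
```

Each output row then holds one chararray such as 'a','b'. Like splitting on
'),(' in your original script, this breaks if a quoted value itself contains
the characters ),( so treat it as a quick fix for well-behaved dumps.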

Thanks
On Wed, Dec 18, 2013 at 12:20 AM, Tim Robertson <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I am new to Pig and am struggling to split a long text line into multiple
> lines.
> I have input from a legacy mysqldump that looks like:
>
> LOCK TABLES `t` WRITE;
> /*!40000 ALTER TABLE `t` DISABLE KEYS */;
> INSERT INTO `t` VALUES ('a','b'),('c','d'),('e','f');
> /*!40000 ALTER TABLE `t` ENABLE KEYS */;
> UNLOCK TABLES;
> /*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;
>
> and I am trying to turn that into something like:
>
> 'a','b'
> 'c','d'
> 'e','f'
>
> So far I have come up with the following:
>
> -- Load in the raw data that is the actual mysqldump output
> mysqldump = LOAD '/Users/tim/Desktop/rollover/dump.txt' USING TextLoader as
> (line:chararray);
>
> -- Find only those lines starting with the insert statement we care about
> insertLines = FILTER mysqldump BY (line matches 'INSERT INTO.*');
>
> -- split them by the ),(
> splits = FOREACH insertLines GENERATE STRSPLIT(line,'\\),\\(');
>
> Can anyone please help me with the last bit so I can turn those into a line
> per split, instead of a tuple per split?
>
> Sorry that my terminology is probably wrong... it's my first day on Pig.
>
> Thanks,
> Tim
>