Re: Help splitting a line into multiple lines
Ruslan Al-Fakikh 2013-12-24, 12:46
I guess you are getting a bag of tuples here.
Try to apply FLATTEN on the bag.
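One caveat, as far as I can tell: STRSPLIT returns a tuple rather than a bag, so FLATTEN on its result would spread the pieces across columns of a single row instead of giving one row per piece. FLATTEN only produces new rows when it is applied to a bag. Below is a rough sketch of one way to get a bag first. It reuses the relation names from the original post and assumes that the TOKENIZE in your Pig version accepts a delimiter argument and that the '|' character never occurs in the data. The names bodies, marked, rows, body and row are made up for illustration.

```pig
-- Sketch only: the LOAD schema (line:chararray) is an assumption based on how "line" is used later.
mysqldump = LOAD '/Users/tim/Desktop/rollover/dump.txt' USING TextLoader() AS (line:chararray);

-- Keep only the INSERT statements.
insertLines = FILTER mysqldump BY (line matches 'INSERT INTO.*');

-- Capture what sits between the first "(" after VALUES and the closing ");".
bodies = FOREACH insertLines GENERATE
    REGEX_EXTRACT(line, 'INSERT INTO .*? VALUES \\((.*)\\);', 1) AS body;

-- Replace every "),(" separator with a single marker character.
-- Assumption: '|' does not appear anywhere in the dumped values.
marked = FOREACH bodies GENERATE REPLACE(body, '\\),\\(', '|') AS body;

-- TOKENIZE returns a bag, so FLATTEN now emits one row per value group.
rows = FOREACH marked GENERATE FLATTEN(TOKENIZE(body, '|')) AS row;

DUMP rows;
```

For the sample INSERT line in the question this should give one row for each parenthesised group. If your Pig release ships a STRSPLITTOBAG built-in, FLATTEN(STRSPLITTOBAG(body, '\\),\\(')) would be a shorter route, but check that it exists in your version before relying on it.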
On Wed, Dec 18, 2013 at 12:20 AM, Tim Robertson wrote:
> Hi all,
> I am new to Pig and am struggling to split up a long text line into multiple lines.
> I have an input format from a legacy mysqldump like:
> LOCK TABLES `t` WRITE;
> /*!40000 ALTER TABLE `t` DISABLE KEYS */;
> INSERT INTO `t` VALUES ('a','b'),('c','d'),('e','f');
> /*!40000 ALTER TABLE `t` ENABLE KEYS */;
> UNLOCK TABLES;
> /*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;
> and I am trying to turn that into something like:
> So far I have come up with the following:
> -- Load in the raw data that is the actual mysqldump output
> mysqldump = LOAD '/Users/tim/Desktop/rollover/dump.txt' USING TextLoader AS (line:chararray);
> -- Find only those lines starting with the insert statement we care about
> insertLines = FILTER mysqldump BY (line matches 'INSERT INTO.*');
> -- split them by the ),(
> splits = FOREACH insertLines GENERATE STRSPLIT(line,'\\),\\(');
> Can anyone please help me with the last bit so I can turn those into a line
> per split, instead of a tuple per split?
> Sorry that my terminology is probably wrong... it's my first day on Pig.