Help splitting a line into multiple lines
Hi all,

I am new to Pig and am struggling to split a long text line into multiple
lines.
I have an input format from a legacy mysqldump like:

LOCK TABLES `t` WRITE;
/*!40000 ALTER TABLE `t` DISABLE KEYS */;
INSERT INTO `t` VALUES ('a','b'),('c','d'),('e','f');
/*!40000 ALTER TABLE `t` ENABLE KEYS */;
UNLOCK TABLES;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;

and I am trying to turn that into something like:

'a','b'
'c','d'
'e','f'

So far I have come up with the following:

-- Load in the raw data that is the actual mysqldump output
mysqldump = LOAD '/Users/tim/Desktop/rollover/dump.txt' USING TextLoader() AS (line:chararray);

-- Find only those lines starting with the insert statement we care about
insertLines = FILTER mysqldump BY (line matches 'INSERT INTO.*');

-- split them by the ),(
splits = FOREACH insertLines GENERATE STRSPLIT(line,'\\),\\(');

Can anyone please help me with the last bit, so that each split ends up on
its own line (row) instead of all the splits sitting in a single tuple?
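
To make the goal concrete, here is a rough, untested sketch of the shape I am after. It assumes a Pig version that ships the built-in STRSPLITTOBAG (I believe this is only in newer releases, so treat that as an assumption), and the relation names bodies and valueRows are just placeholders I made up. The idea is to first cut the line down to the text inside the outer parentheses, split that into a bag rather than a tuple, and then FLATTEN the bag so each element becomes a row:

```pig
-- Sketch only: assumes STRSPLITTOBAG exists in the Pig version in use
mysqldump = LOAD '/Users/tim/Desktop/rollover/dump.txt' USING TextLoader() AS (line:chararray);
insertLines = FILTER mysqldump BY (line matches 'INSERT INTO.*');

-- Keep only the text between "VALUES (" and the closing ");"
bodies = FOREACH insertLines GENERATE REGEX_EXTRACT(line, '.*VALUES \\((.*)\\);.*', 1) AS body;

-- Split on ),( into a bag, then FLATTEN the bag so each element becomes its own row
valueRows = FOREACH bodies GENERATE FLATTEN(STRSPLITTOBAG(body, '\\),\\(')) AS value_text;

DUMP valueRows;
```

As far as I understand it, FLATTEN on a tuple only spreads the fields across columns of the same row, whereas FLATTEN on a bag produces one row per element, which is why the sketch splits into a bag instead of using STRSPLIT.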

Sorry that my terminology is probably wrong... it's my first day on Pig.

Thanks,
Tim