Pig >> mail # user >> Help splitting a line into multiple lines


Help splitting a line into multiple lines
Hi all,

I am new to Pig and am struggling to split a long text line into multiple
lines.
My input comes from a legacy mysqldump and looks like this:

LOCK TABLES `t` WRITE;
/*!40000 ALTER TABLE `t` DISABLE KEYS */;
INSERT INTO `t` VALUES ('a','b'),('c','d'),('e','f');
/*!40000 ALTER TABLE `t` ENABLE KEYS */;
UNLOCK TABLES;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;

and I am trying to turn that into something like:

'a','b'
'c','d'
'e','f'

So far I have come up with the following:

-- Load in the raw data that is the actual mysqldump output
mysqldump = LOAD '/Users/tim/Desktop/rollover/dump.txt' USING TextLoader() AS (line:chararray);

-- Find only those lines starting with the insert statement we care about
insertLines = FILTER mysqldump BY (line matches 'INSERT INTO.*');

-- split them by the ),(
splits = FOREACH insertLines GENERATE STRSPLIT(line,'\\),\\(');

Can anyone please help me with the last bit, so that each split becomes its
own output line instead of all the splits ending up as fields of one tuple?

Sorry that my terminology is probably wrong... it's my first day on Pig.

Thanks,
Tim
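
One possible way to finish the script above, offered as a minimal sketch rather than a definitive answer. It assumes a Pig release that ships the STRSPLITTOBAG builtin (older releases do not have it, and would need a custom UDF instead). The aliases valueLists and pairs and the field names body and pair are made up for illustration. It also assumes that the sequence ),( never occurs inside a quoted value.

```pig
-- Continues from the insertLines alias defined in the script above.
-- Keep only the text between "VALUES (" and the final ");" of each line.
valueLists = FOREACH insertLines GENERATE
    REGEX_EXTRACT(line, 'INSERT INTO [^ ]+ VALUES \\((.*)\\);\\s*', 1) AS body:chararray;

-- STRSPLITTOBAG (assumed to be available) returns a bag holding one tuple
-- per piece; FLATTEN then emits each tuple of that bag as its own record.
pairs = FOREACH valueLists GENERATE
    FLATTEN(STRSPLITTOBAG(body, '\\),\\(', 0)) AS pair:chararray;

DUMP pairs;
```

The key difference from STRSPLIT is the return type: STRSPLIT gives back a single tuple, and FLATTEN on a tuple only widens the row with more fields, whereas FLATTEN on a bag produces one output record per element.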