Pig >> mail # user >> parsing data


Hi,

I would like to parse data from the following format:
keren^A1^A2^Aqt^A3^A4^A5.0
to the following format:

{"user": "keren", "action": 1, "timespent": 2, "query_term": "qt", "ip_addr": 3, "timestamp": 4, "estimated_revenue": 5.0}

[I also happen to have a map and a bag of maps but for the sake of
simplicity I didn't add them to the example]
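
To make the intended mapping concrete, here is a minimal Python sketch of the
conversion. It is not a Pig solution. It assumes that ^A stands for the
Ctrl-A character (0x01), that the fields always appear in the order shown
above, and that the types match the example; the names FIELDS, parse_line
and to_json are made up for illustration.

```python
import json

# Field names and types come from the target format in the message.
# Assumption: the order matches the order of the ^A-separated values.
FIELDS = [
    ("user", str),
    ("action", int),
    ("timespent", int),
    ("query_term", str),
    ("ip_addr", int),
    ("timestamp", int),
    ("estimated_revenue", float),
]


def parse_line(line, sep="\x01"):
    """Split one record on Ctrl-A and cast each value to its assumed type."""
    values = line.rstrip("\n").split(sep)
    if len(values) != len(FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(FIELDS), len(values))
        )
    return {name: cast(value) for (name, cast), value in zip(FIELDS, values)}


def to_json(line, sep="\x01"):
    """Return the record as a JSON object string."""
    return json.dumps(parse_line(line, sep))


if __name__ == "__main__":
    sample = "\x01".join(["keren", "1", "2", "qt", "3", "4", "5.0"])
    print(to_json(sample))
```

The map and the bag of maps are left out here as well, since their encoding
is not shown in the example.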

Should I use PigPerformanceLoader or a Pig script for the above parsing?
Modifying PigPerformanceLoader seems like low-hanging fruit, though I
might have to make several changes. Modifying a Pig script seems like a more
elegant solution, but I am not sure how to do it.

Thanks,
Keren

--
Keren Ouaknine
www.kereno.com