Pig >> mail # user >> Storing dates in LzoJsonStorage


Storing dates in LzoJsonStorage
Hi,

I'm working on a Pig script (and some associated UDFs) to do a little
cleaning on some data, stored as JSON objects, that is used by other
scripts further down the pipeline. The JSON objects contain dates, such as:

        "end_time" : ISODate("2012-11-07T00:29:58.728Z"),

which I've noticed I can read as end_time#'$date' to get the date as a
long holding milliseconds since the epoch.

In order not to cause problems with the scripts down the pipe, I'd like
to be able to store the dates back into JSON objects using
LzoJsonStorage, so that the existing scripts can just be repointed at
the cleaned data, and otherwise continue working without any rewrites.

The last line in the Pig script doing this cleaning is a UDF call that
generates the map expected by LzoJsonStorage, so if there's some
trickery that can be done with Java objects which don't correspond to
Pig data types (e.g. Date), I'd be up for giving that a go.

Can this be done? If so, is there a Right Way to do it?
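
To make the question concrete, here is a minimal sketch of one possibility,
not a confirmed solution. Since the date reads back as end_time#'$date', the
UDF could wrap the long in a one-entry map keyed by "$date" before handing the
record to the storage function. The class and method names are hypothetical,
the code assumes Java 8 or later for java.time, and it is an untested
assumption that LzoJsonStorage would write a nested map out as a nested JSON
object.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Pig or of LzoJsonStorage.
class DateFieldSketch {

    // Wrap epoch milliseconds in a one-entry map keyed by "$date", mirroring
    // the shape that is read back as end_time#'$date'.
    static Map<String, Object> wrapDate(long epochMillis) {
        Map<String, Object> wrapped = new HashMap<>();
        wrapped.put("$date", epochMillis);
        return wrapped;
    }

    // Alternative: render the same instant as an ISO-8601 string in UTC.
    static String toIsoString(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        long millis = Instant.parse("2012-11-07T00:29:58.728Z").toEpochMilli();

        // Build the record map the way the final UDF call might.
        Map<String, Object> record = new HashMap<>();
        record.put("end_time", wrapDate(millis));

        System.out.println(record);
        System.out.println(toIsoString(millis));
    }
}
```

Whether the downstream scripts would then see the same end_time#'$date' value
depends on how the storage function serializes the inner map, which is exactly
the part I'm unsure about.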

Thanks,
Kris

--
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3