Hello,

     I am working on a Crunch pipeline whose output will be read by
subsequent Hive jobs.  I want to partition that output by the timezone
contained in the data records.  What is the best way to support this in
Crunch?

     From the googling I did, one approach looks to be writing the data out
as a PTable keyed by the timezone and then using AvroPathPerKeyTarget.
However, from what I can tell, that target only works when writing Avro
output.  Is there similar functionality available for Parquet output?
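
     Roughly what I had in mind is sketched below.  This is just a sketch:
"Event" and getTimezone() are stand-ins for my actual Avro-generated record
type, and I haven't run this end to end.

import org.apache.crunch.MapFn;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.From;
import org.apache.crunch.io.avro.AvroPathPerKeyTarget;
import org.apache.crunch.types.avro.Avros;

public class PartitionByTimezone {
  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(PartitionByTimezone.class);

    // "Event" is a placeholder for my Avro-generated record type.
    PCollection<Event> events = pipeline.read(
        From.avroFile(args[0], Avros.specifics(Event.class)));

    // Key each record by its timezone so the target can fan out one
    // directory per key.
    PTable<String, Event> byTimezone = events.by(new MapFn<Event, String>() {
      @Override
      public String map(Event e) {
        return e.getTimezone().toString();
      }
    }, Avros.strings());

    // Writes the values for each key under <output>/<timezone>/, but as
    // Avro files only, as far as I can tell.
    pipeline.write(byTimezone, new AvroPathPerKeyTarget(args[1]));
    pipeline.done();
  }
}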

     Alternatively, is there a better way to do this?  I imagine I could
filter the collection once per timezone, but that doesn't seem like an
efficient way to bucket the data.
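
     For reference, the per-timezone filtering I was imagining would look
something like this (again, "Event" is a placeholder, and I'm assuming
AvroParquetFileTarget is the right target class for Parquet output):

import org.apache.crunch.FilterFn;
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.io.parquet.AvroParquetFileTarget;
import org.apache.hadoop.fs.Path;

public class FilterPerTimezone {
  // Writes one Parquet directory per timezone.  It needs the full set of
  // timezones known up front, and it applies a separate filter per
  // timezone, which is the part that feels inefficient.
  static void writePerTimezone(Pipeline pipeline, PCollection<Event> events,
                               Iterable<String> timezones, String outputRoot) {
    for (final String tz : timezones) {
      PCollection<Event> matching = events.filter(new FilterFn<Event>() {
        @Override
        public boolean accept(Event e) {
          return tz.equals(e.getTimezone().toString());
        }
      });
      pipeline.write(matching, new AvroParquetFileTarget(new Path(outputRoot, tz)));
    }
  }
}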

Thanks,
     Dave