You might also want to look at Gobblin, which uses Helix in a very similar way and is actually used to read data from HDFS, apply transformations, and load the results into a remote store.

