Bigtop, mail # dev - Bigtop: generating fake data - 2014-02-15, 14:20
Bigtop: generating fake data
Hi Bigtop. Are we interested in maintaining our own infrastructure for generating fake data, rather than relying on downloaded external data sources for smoke tests? I think fake data is great for testing.

In bigpetstore I'm generating fake data, and I've written a lot of code to do this in the custom input formats. But I just found:

http://codearte.github.io/jfairy/

which is a Groovy tool for doing the same thing.

I wonder whether generating fake data for testing big data should be a first-class part of Bigtop. Would others use such a utility, or just me?

It might be another useful artifact for the community, especially for bigpetstore, but also for testing a variety of other machine-learning-related projects.

I think it's bad to rely on external websites for our tests. In time, maybe we could move over to our own internally curated or generated data sets, and a data generation tool like the one above moves us in that direction.
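
To make the idea concrete, here is a minimal sketch of what a seeded fake-data generator might look like. It is in Python purely for brevity and is not the bigpetstore code or the jfairy API; the function name, the record layout, and the value lists are all made up for illustration. The point it tries to show is that a fixed seed gives the same data on every run, so a smoke test needs no download and stays reproducible.

```python
import random

# Hypothetical field values, invented for this example only.
PRODUCTS = ["dog-food", "cat-litter", "fish-tank", "bird-seed"]
STATES = ["AZ", "CA", "CO", "NY", "TX"]


def fake_transactions(seed, count):
    """Yield `count` fake transaction records as CSV lines.

    The same `seed` always produces the same sequence of records.
    """
    # Use a private generator so the global random state is left untouched.
    rng = random.Random(seed)
    for i in range(count):
        state = rng.choice(STATES)
        product = rng.choice(PRODUCTS)
        # Price in whole cents between 1.00 and 99.99, formatted as dollars.
        price = rng.randint(100, 9999) / 100
        yield f"{i},{state},{product},{price:.2f}"


if __name__ == "__main__":
    for line in fake_transactions(seed=42, count=5):
        print(line)
```

A test could call fake_transactions with a fixed seed and whatever record count it needs, instead of fetching a data set from an external site.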
 