Bigtop >> mail # dev >> Bigtop: generating fake data


Bigtop: generating fake data
Hi Bigtop. Are we interested in maintaining our own infrastructure for generating fake data, rather than downloading external data sources for our smoke tests? Fake data is great for testing, I think.

In BigPetStore I'm generating fake data, and I've written a lot of code to do this in the custom input formats, but I just found:

http://codearte.github.io/jfairy/

which is a Groovy tool for doing the same.

I wonder whether generating fake data for testing big data should be a first-class part of Bigtop? Would others use such a utility, or just me?

It might be another useful artifact for the community, especially for BigPetStore, but also for testing a variety of other machine-learning-related projects.

I think it's bad to rely on external websites for our tests. Maybe in time we could move over to our own internally curated/generated data sets, and a data generation tool like the one above moves us in that direction.
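
To make the idea concrete, here is a rough sketch of what a seeded generator could look like. It is written in Python only for brevity; it is not the BigPetStore code and not the jfairy API. The function name `fake_transactions`, the field names, and the tiny vocabularies are all made up for illustration. The point is that a fixed seed gives the same records on every run, so a smoke test needs no download.

```python
import random

# Hypothetical vocabularies (assumptions); a real tool would ship larger lists.
FIRST_NAMES = ["alice", "bob", "carol", "dave"]
PRODUCTS = ["dog-food", "cat-litter", "fish-tank", "bird-seed"]
STATES = ["AZ", "CA", "CT", "NY"]


def fake_transactions(count, seed=0):
    """Yield `count` fake pet store transactions as dicts.

    The same (count, seed) pair always yields the same records, so a
    test can assert on the output without fetching external data.
    """
    rng = random.Random(seed)  # private RNG, leaves global random state alone
    for tx_id in range(count):
        yield {
            "id": tx_id,
            "customer": rng.choice(FIRST_NAMES),
            "state": rng.choice(STATES),
            "product": rng.choice(PRODUCTS),
            "price": round(rng.uniform(1.0, 100.0), 2),
        }


if __name__ == "__main__":
    for record in fake_transactions(3, seed=42):
        print(record)
```

Whether something like this lives in Bigtop itself or we wrap an existing library is exactly the question above; the sketch only shows that reproducibility comes from the seed, not from a hosted data set.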
 