Hive, mail # user - how to feed sample of data to each mapper - 2014-02-27, 06:03
how to feed sample of data to each mapper
Assume there is one large data set of about 100 GB on HDFS. How can I
control the input so that each mapper receives around 10 GB, and that 10 GB
is randomly sampled from the full 100 GB data set? Is there any Mahout
sample code that does this?
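
A minimal sketch of the behavior being asked for, in plain Python rather than Hive or Mahout code. It assumes independent Bernoulli sampling per mapper: each simulated mapper keeps every record with probability 0.10, using its own seed. The function name `sample_for_mapper`, the record ids, and the mapper count are hypothetical and only illustrate the idea.

```python
import random

def sample_for_mapper(records, fraction, seed):
    """Keep each record independently with probability `fraction`."""
    rng = random.Random(seed)  # a separate seed per mapper gives a different sample
    return [r for r in records if rng.random() < fraction]

# Hypothetical stand-in for the 100 GB data set: 100,000 record ids.
records = range(100_000)

# Each simulated mapper draws its own roughly 10% sample of the full data set.
samples = {mapper_id: sample_for_mapper(records, 0.10, seed=mapper_id)
           for mapper_id in range(3)}

for mapper_id, sample in samples.items():
    print(mapper_id, len(sample))
```

The sample sizes are only approximately 10% of the input, since each record is kept or dropped independently. The sketch also ignores that, in a real job, every mapper would need to read the whole data set in order to sample from all of it.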

Any comments will be appreciated.

