Hive >> mail # user >> how to feed sample of data to each mapper


How to feed a sample of data to each mapper
Assume there is one large data set of 100 GB on HDFS. How can I ensure
that each mapper receives about 10 GB of data, and that those 10 GB are
randomly sampled from the full 100 GB data set? Is there any Mahout sample
code that does this?

Any comments will be appreciated.

Regards,