Assume there is one large data set with size 100G on hdfs, how can I
control that every data sent into each mapper is around 10G and the 10G is
random sampled from the 100G data set? Do we have any mahout sample code
Any comments will be appreciated.