Total count of RandomSampleLoader is unpredictable
Prasanth J 2012-07-27, 01:04
Hello everyone,

I am using RandomSampleLoader to load 1000 tuples per mapper. I have 11 mappers for a small dataset and 109 mappers for a large dataset, so I expect 11000 tuples from the small dataset and 109000 tuples from the large one.

The actual number of tuples I get is always higher than expected: 15000 for the small dataset and 145000 (sometimes 150000) for the large dataset.

Is this a bug, or is it expected behavior? If every mapper uses reservoir sampling, why is the total number of samples higher than the number of mappers times 1000?

Thanks,
Prasanth
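For reference, the expectation behind the question can be sketched as below. This is a minimal Python illustration of textbook reservoir sampling (Algorithm R), not Pig's actual RandomSampleLoader code; the function names and the split sizes are hypothetical and chosen only for the example. It assumes one reservoir of size k per mapper.

```python
import random


def reservoir_sample(stream, k, rng=random):
    """Return up to k items drawn uniformly at random from an iterable (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def total_samples(split_sizes, k):
    """Sum the sample counts when each split (one per mapper) is sampled independently."""
    return sum(len(reservoir_sample(range(n), k)) for n in split_sizes)


# Hypothetical input: 11 splits of 5000 tuples each, 1000 samples per mapper.
small_total = total_samples([5000] * 11, 1000)
```

Under this model a mapper never emits more than k tuples, so 11 mappers with k = 1000 give 11000 in total. A higher observed total would therefore suggest that more sampling instances ran than the mapper count assumed, or that the loader behaves differently from this sketch; the sketch alone cannot tell which.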
Jie Li 2012-07-27, 22:25