Re: Limit vs Sample
Prasanth J 2013-02-28, 10:08
Sorry, I confused it with RandomSampleLoader, which uses reservoir sampling.
SAMPLE is rewritten to a FILTER with a less-than expression, with the sampling fraction (e.g. 0.1) as the predicate value.
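
To make the distinction concrete, here is a rough Python sketch (not Pig's actual code) of the three behaviours discussed in this thread: taking the first N records, filtering each record against a random draw, and reservoir sampling. The function names, the seed, and the 1000-record input are made up for illustration only.

```python
import random
from itertools import islice


def limit(records, n):
    """LIMIT-style: return the first n records in arrival order."""
    return list(islice(records, n))


def sample_filter(records, fraction, rng):
    """SAMPLE-style, as a filter: keep a record when a uniform draw
    in [0, 1) is less than the fraction. Every record is examined."""
    return [r for r in records if rng.random() < fraction]


def reservoir_sample(records, k, rng):
    """Reservoir sampling (Algorithm R): exactly k records in one pass."""
    reservoir = []
    for i, r in enumerate(records):
        if i < k:
            reservoir.append(r)      # fill the reservoir first
        else:
            j = rng.randint(0, i)    # inclusive on both ends
            if j < k:
                reservoir[j] = r     # replace a random slot
    return reservoir


rng = random.Random(42)              # hypothetical fixed seed
data = range(1000)                   # hypothetical input

first_ten = limit(data, 10)
filtered = sample_filter(data, 0.1, rng)
reservoir = reservoir_sample(data, 100, rng)
```

In this sketch both the filter and the reservoir version walk the whole input. The filter returns roughly 10% of the records (the exact count varies from run to run), while the reservoir always returns exactly k. Only the LIMIT-style function can stop early.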
On Feb 28, 2013, at 5:01 AM, Gianmarco De Francisci Morales <[EMAIL PROTECTED]> wrote:
> LIMIT takes the first X records, so there are no statistical guarantees.
> SAMPLE takes X% of the records from the whole bag (uniformly), so you have
> statistical guarantees.
> No, SAMPLE does not use reservoir sampling.
> On Wed, Feb 27, 2013 at 12:23 AM, Prasanth J <[EMAIL PROTECTED]> wrote:
>> AFAIK, the SAMPLE operator internally uses reservoir sampling, so it reads
>> the entire data set to randomly generate the 10% sample.
>> -- Prasanth
>> On Feb 26, 2013, at 6:19 PM, Panshul Whisper <[EMAIL PROTECTED]> wrote:
>>> Can somebody please explain the difference between LIMIT and SAMPLE?
>>> With SAMPLE set to 0.1, does it read the entire input file, or does it
>>> read randomly only until 10% of the data has been collected?
>>> Thank you for any help.
>>> Ouch Whisper