LIMIT takes the first X records, so there are no statistical guarantees.
SAMPLE takes roughly X% of the records from the whole bag (uniformly), so you have
a uniform random sample rather than just the head of the input.
No, SAMPLE does not use reservoir sampling.
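
To make the difference concrete, here is a rough Python sketch. It is not Pig's code. It assumes that SAMPLE behaves like a per-record coin flip (as far as I know Pig rewrites SAMPLE into a filter on a random number, but treat that as my assumption). The function names limit and bernoulli_sample are made up for the illustration.

```python
import random
from itertools import islice


def limit(records, n):
    """Mimics LIMIT: return the first n records and stop reading."""
    return list(islice(records, n))


def bernoulli_sample(records, fraction, seed=None):
    """Mimics SAMPLE as a per-record coin flip (an assumption, see above).

    Every record is read once; each is kept independently with
    probability `fraction`, so the output size is only about
    fraction * len(records), not exactly that.
    """
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]


data = range(100_000)

first = limit(data, 10)                          # only the head of the input
sampled = bernoulli_sample(data, 0.1, seed=42)   # spread over the whole input

print(first)
print(len(sampled), min(sampled), max(sampled))
```

So under this model SAMPLE 0.1 still scans the whole input, and the number of records you get back varies a little from run to run, whereas LIMIT can stop as soon as it has enough records.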
On Wed, Feb 27, 2013 at 12:23 AM, Prasanth J <[EMAIL PROTECTED]> wrote:
> AFAIK, the SAMPLE operator internally uses reservoir sampling, so it reads
> the entire data set to randomly generate the 10% sample.
> -- Prasanth
> On Feb 26, 2013, at 6:19 PM, Panshul Whisper <[EMAIL PROTECTED]> wrote:
> > Hello,
> > Can somebody please explain to me the difference between the LIMIT and
> > SAMPLE statements?
> > Does SAMPLE read the entire input file when the value is set to 0.1, or
> > does it read randomly only until 10% of the data has been collected?
> > Thanking You for any help.
> > --
> > Regards,
> > Ouch Whisper
> > 010101010101