Panshul Whisper 2013-02-26, 23:19
Hello,
Can somebody please explain the difference between the LIMIT and SAMPLE statements? With SAMPLE set to 0.1, does it read the entire input file, or does it read randomly only until 10% of the data has been collected?
Thanking You for any help.
-- Regards, Ouch Whisper 010101010101
Prasanth J 2013-02-26, 23:23
AFAIK, the SAMPLE operator internally uses reservoir sampling, so it reads the entire data set to randomly generate the 10% sample.
Thanks -- Prasanth
Gianmarco De Francisci Mo... 2013-02-28, 10:01
Hi, LIMIT takes the first X records, so there are no statistical guarantees. SAMPLE takes X% of the records from the whole bag (uniformly), so you have statistical guarantees. No, SAMPLE does not use reservoir sampling.
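To make the distinction concrete, here is a small Python simulation of the two behaviours. It is only a sketch of the semantics, not Pig code: the helper names `limit` and `sample` are made up for illustration, and the input is a sorted list so the bias of taking the head is easy to see.

```python
import random
from itertools import islice


def limit(records, n):
    """Hypothetical stand-in for LIMIT: keep the first n records seen."""
    return list(islice(records, n))


def sample(records, fraction, rng):
    """Hypothetical stand-in for SAMPLE: keep each record independently
    with probability `fraction`, wherever it sits in the input."""
    return [rec for rec in records if rng.random() < fraction]


records = list(range(1000))      # sorted input: 0, 1, ..., 999
rng = random.Random(42)          # fixed seed so the run is repeatable

head = limit(records, 100)       # always 0..99, i.e. only the start of the data
picked = sample(records, 0.1, rng)  # drawn from across the whole range

print("LIMIT  min/max:", head[0], head[-1])
print("SAMPLE size:", len(picked))
```

On sorted or otherwise ordered data, the head returned by the LIMIT-style helper says nothing about the rest of the input, while every record has the same chance of ending up in the SAMPLE-style result.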
Cheers,
-- Gianmarco
Prasanth J 2013-02-28, 10:08
Sorry, I was confusing it with RandomSampleLoader, which uses reservoir sampling. SAMPLE is rewritten to a FILTER with a less-than expression that uses the sampling percentage as the predicate value.
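A rough Python sketch of the two strategies, for comparison. This is an illustration under assumptions, not Pig's actual implementation: the function names are hypothetical, the filter version models something like "keep the record if a random number is less than 0.1", and the reservoir version is the textbook Algorithm R rather than whatever RandomSampleLoader does internally.

```python
import random


def counting(records, counter):
    """Wrap an iterable and count how many records are actually read."""
    for rec in records:
        counter["reads"] += 1
        yield rec


def filter_sample(records, fraction, rng):
    """Filter-style sampling: test every record against a random draw.
    The result size is only approximately fraction * len(input)."""
    return [rec for rec in records if rng.random() < fraction]


def reservoir_sample(records, k, rng):
    """Reservoir sampling (Algorithm R): returns exactly k records
    when the input has at least k, each with equal probability."""
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir


rng = random.Random(7)
counter = {"reads": 0}

filtered = filter_sample(counting(range(10_000), counter), 0.1, rng)
reservoir = reservoir_sample(range(10_000), 1_000, rng)

print("records read by filter sampling:", counter["reads"])
print("filter sample size:", len(filtered))
print("reservoir sample size:", len(reservoir))
```

In this sketch the filter has to look at every input record, because each one gets its own random test, and the number of records it keeps varies from run to run. The reservoir version also scans the whole input but always returns exactly the requested count.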
Thanks -- Prasanth
Panshul Whisper 2013-02-28, 10:41
Thank you for the replies. I got the point now. :)
Regards, Panshul