Pig >> mail # user >> Limit vs Sample


Re: Limit vs Sample
Thank you for the replies. I got the point now. :)

Regards,
Panshul
On Thu, Feb 28, 2013 at 11:08 AM, Prasanth J <[EMAIL PROTECTED]> wrote:

> Sorry, I was confused with RandomSampleLoader which uses reservoir
> sampling.
> SAMPLE is rewritten to a filter with a less-than expression that uses the
> sampling percentage as the predicate value.
>
> Thanks
> -- Prasanth
>
> On Feb 28, 2013, at 5:01 AM, Gianmarco De Francisci Morales <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> > LIMIT takes the first X records, so there are no statistical guarantees.
> > SAMPLE takes X% of the records from the whole bag (uniformly), so you have
> > statistical guarantees.
> > No, SAMPLE does not use reservoir sampling.
> >
> > Cheers,
> >
> > --
> > Gianmarco
> >
> >
> > On Wed, Feb 27, 2013 at 12:23 AM, Prasanth J <[EMAIL PROTECTED]> wrote:
> >
> >> AFAIK, the SAMPLE operator internally uses reservoir sampling, so it reads
> >> the entire data set to randomly generate the 10% sample.
> >>
> >> Thanks
> >> -- Prasanth
> >>
> >> On Feb 26, 2013, at 6:19 PM, Panshul Whisper <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >>> Hello,
> >>>
> >>> Can somebody please explain to me the difference between the LIMIT and
> >>> SAMPLE statements?
> >>> Does SAMPLE read the entire input file when the value is set to 0.1, or
> >>> does it read randomly only until 10% of the data has been collected?
> >>>
> >>> Thanking You for any help.
> >>>
> >>> --
> >>> Regards,
> >>> Ouch Whisper
> >>> 010101010101
> >>
> >>
>
>
--
Regards,
Ouch Whisper
010101010101
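
For anyone finding this thread later, the distinction discussed above can be sketched in plain Python. This is only a model of the semantics described in the replies, not Pig's actual code: the function names `limit`, `sample`, and `reservoir_sample` are made up for illustration, and `sample` assumes the "filter + less-than" rewrite means each record is kept when a random draw falls below the sampling fraction. Under that assumption, SAMPLE has to look at every record, and the output is only approximately 10% of the input, while reservoir sampling (what RandomSampleLoader was said to use) returns an exact count.

```python
import random
from itertools import islice


def limit(records, n):
    """Model of LIMIT: take the first n records and stop reading."""
    return list(islice(records, n))


def sample(records, fraction, rng=random):
    """Model of SAMPLE as a filter: keep a record when a random draw
    in [0, 1) is less than `fraction`. Every record is examined."""
    return [r for r in records if rng.random() < fraction]


def reservoir_sample(records, k, rng=random):
    """Reservoir sampling (Algorithm R): one pass, at most k records."""
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs agree
    data = range(1000)
    print("limit:", limit(data, 5))
    print("sample size:", len(sample(data, 0.1, rng)))
    print("reservoir size:", len(reservoir_sample(data, 100, rng)))
```

In this model `limit` never touches anything past the first n records, which is why it gives no statistical guarantees about the rest of the input, whereas both sampling functions pass over the whole input once.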