Pig >> mail # user >> Limit vs Sample


Panshul Whisper 2013-02-26, 23:19
Prasanth J 2013-02-26, 23:23
Gianmarco De Francisci Mo... 2013-02-28, 10:01
Prasanth J 2013-02-28, 10:08
Re: Limit vs Sample
Thank you for the replies. I got the point now. :)

Regards,
Panshul
On Thu, Feb 28, 2013 at 11:08 AM, Prasanth J <[EMAIL PROTECTED]> wrote:

> Sorry, I was confusing it with RandomSampleLoader, which uses reservoir
> sampling.
> SAMPLE is rewritten to a filter with a less-than expression, using the
> sampling percentage as the predicate value.
>
> Thanks
> -- Prasanth
>
> On Feb 28, 2013, at 5:01 AM, Gianmarco De Francisci Morales <
> [EMAIL PROTECTED]> wrote:
>
> > Hi,
> > LIMIT takes the first X records, so there are no statistical guarantees.
> > SAMPLE takes X% of the records from the whole bag (uniformly), so you have
> > statistical guarantees.
> > No, SAMPLE does not use reservoir sampling.
> >
> > Cheers,
> >
> > --
> > Gianmarco
> >
> >
> > On Wed, Feb 27, 2013 at 12:23 AM, Prasanth J <[EMAIL PROTECTED]> wrote:
> >
> >> AFAIK, the SAMPLE operator internally uses reservoir sampling, so it reads
> >> the entire data set to randomly generate the 10% sample.
> >>
> >> Thanks
> >> -- Prasanth
> >>
> >> On Feb 26, 2013, at 6:19 PM, Panshul Whisper <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >>> Hello,
> >>>
> >>> Can somebody please explain the difference between the LIMIT and SAMPLE
> >>> statements?
> >>> Does SAMPLE read the entire input file when the value is set to 0.1, or
> >>> does it read randomly only until 10% of the data has been collected?
> >>>
> >>> Thank you for any help.
> >>>
> >>> --
> >>> Regards,
> >>> Ouch Whisper
> >>> 010101010101
> >>
> >>
>
>
--
Regards,
Ouch Whisper
010101010101
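
For anyone finding this thread later, here is a rough Python sketch of the three behaviours discussed above: taking the first N records (LIMIT), keeping each record when a random draw falls below the sampling fraction (SAMPLE as a filter, as Prasanth describes), and reservoir sampling (the technique attributed above to RandomSampleLoader). This is not Pig's code. The function names limit, sample, reservoir_sample and counting are made up for illustration, and the input is assumed to be a plain Python iterable standing in for the records of a relation.

```python
import random
from itertools import islice


def limit(records, n):
    """LIMIT-like: return the first n records and stop reading."""
    return list(islice(records, n))


def sample(records, fraction, rng):
    """SAMPLE-like filter: keep a record when a random draw is below fraction.

    Every record is read, and the output size is only approximately
    fraction * (number of records).
    """
    return [r for r in records if rng.random() < fraction]


def reservoir_sample(records, k, rng):
    """Reservoir sampling (Algorithm R): exactly k records, chosen uniformly.

    Every record is read once; only k records are held in memory.
    """
    reservoir = []
    for i, r in enumerate(records):
        if i < k:
            reservoir.append(r)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = r
    return reservoir


def counting(records, counter):
    """Hypothetical helper: count how many records are actually read."""
    for r in records:
        counter[0] += 1
        yield r


if __name__ == "__main__":
    rng = random.Random(42)
    for name, run in [
        ("limit", lambda recs: limit(recs, 10)),
        ("sample", lambda recs: sample(recs, 0.1, rng)),
        ("reservoir", lambda recs: reservoir_sample(recs, 10, rng)),
    ]:
        read = [0]
        out = run(counting(range(1000), read))
        print(name, "read", read[0], "records, kept", len(out))
```

In this sketch limit stops after reading n records, while sample and reservoir_sample both read the whole input. The difference between the last two is that the filter gives roughly 10% of the records with no fixed output size, whereas the reservoir always gives exactly k.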