Pig user mailing list: Limit vs Sample


Panshul Whisper 2013-02-26, 23:19
Re: Limit vs Sample
AFAIK, the SAMPLE operator uses reservoir sampling internally, so it reads the entire input in order to randomly select roughly 10% of the records.
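
To make the difference concrete, here is a rough Python sketch (not Pig's actual implementation). It assumes a limit can stop pulling input once it has enough records, and it models sampling as an independent coin flip per record rather than reservoir sampling; either sampling scheme has to look at every record. The helper names (`counting_reader`, `limit`, `sample`) are made up for illustration.

```python
import random
from itertools import islice


def counting_reader(records, counter):
    """Yield records one by one, counting how many were actually read."""
    for rec in records:
        counter["read"] += 1
        yield rec


def limit(records, n):
    """LIMIT-like: stop pulling input once n records have been emitted."""
    return list(islice(records, n))


def sample(records, fraction, rng):
    """SAMPLE-like (assumed model): keep each record with probability `fraction`."""
    return [rec for rec in records if rng.random() < fraction]


data = range(1000)

limit_stats = {"read": 0}
first_ten = limit(counting_reader(data, limit_stats), 10)

sample_stats = {"read": 0}
picked = sample(counting_reader(data, sample_stats), 0.1, random.Random(42))

print(limit_stats["read"])   # 10: the limit stopped early
print(sample_stats["read"])  # 1000: sampling visited every record
print(len(picked))           # around 100, not exactly 10% in general
```

In this model the sample size is only approximately 10% of the input, which matches how the Pig documentation describes SAMPLE as probabilistic. Whether a real Pig job can stop early for LIMIT depends on the rest of the script and the execution plan.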

Thanks
-- Prasanth

On Feb 26, 2013, at 6:19 PM, Panshul Whisper <[EMAIL PROTECTED]> wrote:

> Hello,
>
> Can somebody please explain the difference between the Limit and Sample
> statements?
> With Sample set to 0.1, does it read the entire input file, or does it
> read randomly only until 10% of the data has been collected?
>
> Thank you for any help.
>
> --
> Regards,
> Ouch Whisper
> 010101010101

Gianmarco De Francisci Mo... 2013-02-28, 10:01
Prasanth J 2013-02-28, 10:08
Panshul Whisper 2013-02-28, 10:41