Pig >> mail # user >> Limit vs Sample
Panshul Whisper 2013-02-26, 23:19
Re: Limit vs Sample
AFAIK, the SAMPLE operator uses reservoir sampling internally, so it reads the entire input in order to randomly select roughly 10% of the data.

Thanks
-- Prasanth

On Feb 26, 2013, at 6:19 PM, Panshul Whisper <[EMAIL PROTECTED]> wrote:

> Hello,
>
> Can somebody please explain the difference between the LIMIT and SAMPLE
> statements?
> With SAMPLE set to 0.1, does Pig read the entire input file, or does it
> read randomly only until 10% of the data has been collected?
>
> Thank you for any help.
>
> --
> Regards,
> Panshul Whisper
> 010101010101
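
To illustrate the point made in the reply, here is a minimal Python sketch of the general difference between taking the first N records and drawing a uniform random sample in one pass. It is not Pig's implementation: the helper names (`counting_reader`, `limit`, `reservoir_sample`) are hypothetical, the reservoir algorithm is the textbook "Algorithm R", and a fixed sample size of 100 stands in for "10% of 1000 records" (Pig's SAMPLE takes a fraction and, per its documentation, does not guarantee an exact row count).

```python
import random
from itertools import islice


def counting_reader(records, counter):
    """Yield records one by one, counting how many were actually read."""
    for rec in records:
        counter["read"] += 1
        yield rec


def limit(records, n):
    """Return the first n records; the rest of the input is never pulled."""
    return list(islice(records, n))


def reservoir_sample(records, k, rng):
    """Textbook reservoir sampling (Algorithm R).

    Produces a uniform random sample of k records, but only after
    every input record has been seen once.
    """
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(rec)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir


data = range(1000)  # stand-in for the input file

limit_counter = {"read": 0}
first_100 = limit(counting_reader(data, limit_counter), 100)

sample_counter = {"read": 0}
sampled = reservoir_sample(
    counting_reader(data, sample_counter), 100, random.Random(42)
)

print("limit read:", limit_counter["read"])
print("sample read:", sample_counter["read"])
```

In this sketch the limit-style read stops as soon as it has enough records, while the sampler has to touch every record, because any later record must still have a chance of being selected.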

Gianmarco De Francisci Mo... 2013-02-28, 10:01
Prasanth J 2013-02-28, 10:08
Panshul Whisper 2013-02-28, 10:41