zuohua zhang 2012-08-14, 22:18
Raihan Jamal 2012-08-14, 22:23
Roberto Sanabria 2012-08-14, 22:31
Re: how to do random sampling in hive?
Bejoy KS 2012-08-15, 16:04
For more accurate sampling, bucket your table on the columns you want to sample by. Then use the TABLESAMPLE clause in your queries to get the required sample size.
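As a rough sketch of the above, assuming a hypothetical table t1 with columns id and val (the names t1_bucketed, id, val and the bucket count of 32 are illustrative, not from this thread), bucketing and then sampling might look like this:

```sql
-- Hypothetical bucketed copy of t1; column names and bucket count are assumptions.
CREATE TABLE t1_bucketed (id INT, val STRING)
CLUSTERED BY (id) INTO 32 BUCKETS;

-- On Hive versions before 2.0, make inserts honor the bucket definition.
SET hive.enforce.bucketing = true;

INSERT OVERWRITE TABLE t1_bucketed
SELECT id, val FROM t1;

-- Read one of the 32 buckets: roughly 1/32 of the rows if id hashes evenly.
SELECT * FROM t1_bucketed TABLESAMPLE(BUCKET 1 OUT OF 32 ON id) s;
```

Because the table is clustered on id, this sample should only need to read one bucket instead of scanning the whole table. On a table that is not bucketed, TABLESAMPLE(BUCKET 1 OUT OF 32 ON rand()) can be used instead, at the cost of a full scan.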
Sent from handheld, please excuse typos.
From: Roberto Sanabria <[EMAIL PROTECTED]>
Date: Tue, 14 Aug 2012 15:31:14
To: <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Re: how to do random sampling in hive?
select * from table_name order by rand() limit 5;
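Note that ORDER BY in Hive imposes a total order, which generally means a single reducer sorts the whole table. A commonly used variant, sketched here under the assumption that the table is large enough for a 0.1% prefilter to leave well over 5 rows (the 0.001 threshold is illustrative and should be tuned to the table size), filters first and spreads the sort across reducers:

```sql
-- Keep roughly 0.1% of rows, shuffle them across reducers, then take 5.
-- The 0.001 threshold is an assumption; pick it so the prefilter
-- comfortably exceeds the LIMIT.
SELECT * FROM table_name
WHERE rand() <= 0.001
DISTRIBUTE BY rand()
SORT BY rand()
LIMIT 5;
```

For a repeatable sample, rand() also accepts an integer seed, for example rand(42).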
On Tue, Aug 14, 2012 at 3:23 PM, Raihan Jamal <[EMAIL PROTECTED]> wrote:
> I think you can use LIMIT here.
> Limit indicates the number of rows to be returned. The rows returned are
> chosen at random. The following query returns 5 rows from t1 at random.
> SELECT * FROM t1 LIMIT 5
> *Raihan Jamal*
> On Tue, Aug 14, 2012 at 3:18 PM, zuohua zhang <[EMAIL PROTECTED]> wrote:
>> I would like to extract a uniform random sample from a Hive table. How
>> should I write the query?