Re: Tablesample doubling
Never mind, I see in the docs that the row count is applied PER SPLIT.
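
For anyone hitting the same thing, here is a minimal sketch of one way to cap the total, reusing the script from the original message. It assumes the doubling happened because sparse_features was read as two input splits (the thread does not confirm the split count), and it adds a LIMIT clause that was not in the original script:

```sql
-- TABLESAMPLE(N ROWS) takes up to N rows from each input split,
-- so the total depends on how many splits the table is read as.
-- Assumption: two splits here, which would explain 2 * 50000 rows.
DROP TABLE IF EXISTS sparse_features_small;

CREATE TABLE sparse_features_small
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
AS
SELECT
        *
FROM
        sparse_features
TABLESAMPLE(50000 ROWS)
-- LIMIT caps the combined result regardless of the split count.
LIMIT 50000;

-- Check the resulting row count.
SELECT COUNT(*) FROM sparse_features_small;
```

The LIMIT only bounds the total; which rows survive still depends on the order in which the splits are read, so this is not a uniform random sample.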

-b
On Mon, Jul 29, 2013 at 9:52 PM, j.barrett Strausser <
[EMAIL PROTECTED]> wrote:

> SELECT COUNT(*) FROM sparse_features_small;
>
> And I receive back:
>
> Total MapReduce CPU Time Spent: 3 seconds 330 msec
> OK
> 100000
>
> Rather than the expected 50000
>
> I am running hive 11.2
>
>
>
>
> On Mon, Jul 29, 2013 at 9:51 PM, j.barrett Strausser <
> [EMAIL PROTECTED]> wrote:
>
>> Hello All,
>>
>> Why does TABLESAMPLE(N ROWS) produce output with 2*N rows?
>>
>>
>> I have the following script:
>>
>> DROP TABLE IF EXISTS sparse_features_small;
>>
>> CREATE TABLE sparse_features_small ROW FORMAT DELIMITED FIELDS TERMINATED
>> BY ',' LINES TERMINATED BY '\n' as
>>
>> SELECT
>>         *
>> FROM
>>         sparse_features
>> TABLESAMPLE(50000 ROWS)
>>
>>
>> After I execute this by sourcing the file, I can then execute:
>>
>> --
>>
>>
>> https://github.com/bearrito
>> @deepbearrito
>>

--
https://github.com/bearrito
@deepbearrito