Hive >> mail # user >> Tablesample doubling


Re: Tablesample doubling
Never mind. I see in the docs that the row count is applied PER SPLIT.
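
To illustrate why a per-split row count can double the result, here is a minimal Python sketch of the arithmetic. It is not Hive code: the function name `sample_rows_per_split` and the split sizes are hypothetical, and the assumption that the table was read as two input splits is only inferred from the 100000 rows reported in the quoted message.

```python
def sample_rows_per_split(split_sizes, n_rows):
    """Total rows returned when up to n_rows are taken from each input split."""
    # A split smaller than n_rows can only contribute the rows it has.
    return sum(min(size, n_rows) for size in split_sizes)


# Hypothetical layout: two input splits, each holding more than 50000 rows.
# The thread does not state the real split count or sizes.
split_sizes = [80000, 75000]

total = sample_rows_per_split(split_sizes, 50000)
print(total)  # 100000
```

Under this model the total scales with the number of splits, so the requested count acts as an upper bound per split rather than for the whole query.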

-b
On Mon, Jul 29, 2013 at 9:52 PM, j.barrett Strausser <
[EMAIL PROTECTED]> wrote:

> SELECT COUNT(*) FROM sparse_features_small;
>
> And I receive back :
>
> Total MapReduce CPU Time Spent: 3 seconds 330 msec
> OK
> 100000
>
> Rather than the expected 50000
>
> I am running Hive 11.2
>
>
>
>
> On Mon, Jul 29, 2013 at 9:51 PM, j.barrett Strausser <
> [EMAIL PROTECTED]> wrote:
>
>> Hello All,
>>
>> Why does TABLESAMPLE(N ROWS) produce output with 2*N rows?
>>
>>
>> I have the following script:
>>
>> DROP TABLE IF EXISTS sparse_features_small;
>>
>> CREATE TABLE sparse_features_small ROW FORMAT DELIMITED FIELDS TERMINATED
>> BY ',' LINES TERMINATED BY '\n' as
>>
>> SELECT
>>         *
>> FROM
>>         sparse_features
>> TABLESAMPLE(50000 ROWS);
>>
>>
>> After I execute this by sourcing the file, I can then execute:
>>
>> --
>>
>>
>> https://github.com/bearrito
>> @deepbearrito
>>
>
>
>
> --
>
>
> https://github.com/bearrito
> @deepbearrito
>

--
https://github.com/bearrito
@deepbearrito