Hive, mail # user - Tablesample doubling


Tablesample doubling
j.barrett Strausser 2013-07-30, 01:51
Hello All,

Why does TABLESAMPLE(N ROWS) produce output with 2*N rows?
I have the following script:

DROP TABLE IF EXISTS sparse_features_small;

CREATE TABLE sparse_features_small
ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\n'
AS
SELECT
        *
FROM
        sparse_features
TABLESAMPLE(50000 ROWS);
After I execute this by sourcing the file, I can then execute:

--
https://github.com/bearrito
@deepbearrito