NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Hive >> mail # user >> Tablesample doubling


Re: Tablesample doubling
+1 for documentation. Sometimes it surprises you. :)
On Mon, Jul 29, 2013 at 7:11 PM, j.barrett Strausser <
[EMAIL PROTECTED]> wrote:

> Never mind, I see in the docs: it is rows PER SPLIT.
>
> -b
>
>
> On Mon, Jul 29, 2013 at 9:52 PM, j.barrett Strausser <
> [EMAIL PROTECTED]> wrote:
>
>> SELECT COUNT(*) FROM sparse_features_small;
>>
>> And I get back:
>>
>> Total MapReduce CPU Time Spent: 3 seconds 330 msec
>> OK
>> 100000
>>
>> That is 100000 rather than the expected 50000.
>>
>> I am running Hive 11.2.
>>
>> On Mon, Jul 29, 2013 at 9:51 PM, j.barrett Strausser <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Hello All,
>>>
>>> Why does TABLESAMPLE(N ROWS) produce output with 2*N rows?
>>>
>>>
>>> I have the following script:
>>>
>>> DROP TABLE IF EXISTS sparse_features_small;
>>>
>>> CREATE TABLE sparse_features_small ROW FORMAT DELIMITED FIELDS
>>> TERMINATED BY ',' LINES TERMINATED BY '\n' AS
>>>
>>> SELECT
>>>         *
>>> FROM
>>>         sparse_features
>>> TABLESAMPLE(50000 ROWS);
>>>
>>>
>>> After I execute this by sourcing the file, I can then execute:
>>>
>>> --
>>>
>>>
>>> https://github.com/bearrito
>>> @deepbearrito
>>>
>>
>
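The Hive sampling documentation mentioned in this thread states that the row count in `TABLESAMPLE(N ROWS)` is applied to each input split, not to the query as a whole. The observed 100000 rows are therefore consistent with the query reading two splits that each contributed 50000 rows (2 × 50000 = 100000). Below is a minimal sketch of one way to cap the total, assuming the goal is at most 50000 rows overall. It keeps the per-split sample and adds a `LIMIT`, which bounds the final output regardless of how many splits there are. The table names are the ones from the script above, and this variant has not been run against the Hive 11.2 setup described here. Neither form is a random sample: the row-count sample takes the first rows of each split, and `LIMIT` keeps whichever rows arrive first.

```sql
-- Workaround sketch (assumption, not from the thread): TABLESAMPLE(50000 ROWS)
-- applies per input split, so LIMIT is added to cap the total rows written.
DROP TABLE IF EXISTS sparse_features_small;

CREATE TABLE sparse_features_small
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
AS
SELECT *
FROM sparse_features TABLESAMPLE(50000 ROWS)
LIMIT 50000;

-- Expected to return at most 50000, however many splits were read.
SELECT COUNT(*) FROM sparse_features_small;
```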