Hive, mail # user - Override COUNT() function


Re: Override COUNT() function
Navis류승우 2013-07-03, 01:56
As you expected, there is no documentation on it (as with the other optimizers).

The Javadoc of the class might be helpful, but it does not seem detailed enough.
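
For a concrete picture of the kind of query MetadataOnlyOptimizer (named in the quoted message below) targets, here is a minimal sketch. It assumes a hypothetical table `page_views` partitioned by a column `ds`. The `hive.optimize.metadataonly` setting is assumed to be the switch for this optimizer, so verify it against your Hive version.

```sql
-- Hypothetical table, partitioned by ds.
SET hive.optimize.metadataonly=true;

-- Only the partition column is referenced, so the distinct values can be
-- derived from partition metadata rather than from the data files.
SELECT ds FROM page_views GROUP BY ds;

-- EXPLAIN prints the plan Hive chooses, which is one way to check whether
-- the optimizer was applied.
EXPLAIN SELECT ds FROM page_views GROUP BY ds;
```
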

2013/7/2 Peter Marron <[EMAIL PROTECTED]>:
> Thanks Navis,
>
> This is a very interesting class which I feel pretty sure I would never have found on my own.
> Are there any descriptions, motivations, documentation or examples anywhere?
> I suspect that there's nothing other than the source itself, but I had to ask.
>
> Regards,
>
> Z
> -----Original Message-----
> From: Navis류승우 [mailto:[EMAIL PROTECTED]]
> Sent: 02 July 2013 08:50
> To: [EMAIL PROTECTED]
> Subject: Re: Override COUNT() function
>
> MetadataOnlyOptimizer changes a GROUP BY (GBY) on partition columns into a simple TableScan over a one-line dummy input.
>
> I think similar things can be done with stats.
>
> 2013/6/28 Peter Marron <[EMAIL PROTECTED]>:
>> Hi,
>>
>> I feel sure that someone has asked for this before, but here goes…
>>
>> In the case where I have the query
>>
>>                 SELECT COUNT(*) FROM table;
>>
>> there are many cases where I can determine the count immediately.
>>
>> (For example, if I have run something like:
>>
>> ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2],
>> ...)] COMPUTE STATISTICS [noscan];
>>
>> then there seems to be a table property “numRows” which holds a count
>> of the number of rows.)
>>
>> Now I know that the result of COUNT can’t always be determined easily.
>> If the query is more complicated, like
>>
>>                 SELECT COUNT(*) FROM table GROUP BY column;
>>
>> then obviously a simple scalar count is of no real use. But is there
>> some way to intercept the simple case and avoid running a table scan?
>>
>>
>>
>> One problem that I see is that the COUNT function is a UDAF, and I am
>> assuming that the presence of any aggregate function like this is enough
>> to force the query planner to require a Map/Reduce job. Is there any way
>> to make the function look like a simple UDF for some queries? Or
>> just for some tables? I guess that I’d be prepared to sacrifice the full
>> generality of the normal COUNT function for one which
>> only functions correctly for the simple query on my tables.
>>
>>
>>
>> So is it possible to have a different COUNT function only on certain tables?
>>
>> Regards,
>>
>> Z
>
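
As a sketch of the statistics-based idea discussed in this thread: the statements below gather table statistics and then read the stored row count back from the table properties instead of scanning the data. The table name `page_views` is hypothetical and assumed to be unpartitioned. The `hive.compute.query.using.stats` setting is an assumption about Hive releases later than this thread, so check your version before relying on it.

```sql
-- Hypothetical, unpartitioned table; substitute your own.
-- Gather basic statistics (a full scan populates properties such as numRows).
ANALYZE TABLE page_views COMPUTE STATISTICS;

-- Read the stored row count from the table properties.
SHOW TBLPROPERTIES page_views("numRows");

-- Alternatively, list all table parameters, including numRows.
DESCRIBE FORMATTED page_views;

-- Assumption: in later Hive releases this setting lets simple queries such
-- as COUNT(*) be answered from metastore statistics when they are available.
SET hive.compute.query.using.stats=true;
SELECT COUNT(*) FROM page_views;
```

The stored value is only as fresh as the last ANALYZE run, so it can lag behind the data if rows were loaded afterwards.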