Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> hive query doesn't seem to limit itself to partitions based on the WHERE clause


Copy link to this message
-
Re: RE: hive query doesn't seem to limit itself to partitions based on the WHERE clause
Hi Namit,

Hourly_fact is partitioned on dt and hr.

Marc
On Oct 3, 2010 10:00 PM, "Namit Jain" <[EMAIL PROTECTED]> wrote:
> What is your table hourly_fact partitioned on ?
>
> ________________________________________
> From: Marc Limotte [[EMAIL PROTECTED]]
> Sent: Friday, October 01, 2010 2:10 PM
> To: [EMAIL PROTECTED]
> Subject: hive query doesn't seem to limit itself to partitions based on
the WHERE clause
>
> Hi,
>
> From looking at the hive log output, it seems that my job is accessing
many more partitions than it needs to? For example, my query is something
like:
>
> INSERT OVERWRITE TABLE daily_fact
> PARTITION (dt='2010-09-29')
> SELECT
> 20100929 as stamp,
> tagtype,
> country,
> sum(num_requests) AS num_requests
> FROM
> hourly_fact HF
> WHERE
> (HF.dt = '2010-09-29' AND HF.hr > '07' )
> OR (HF.dt = '2010-09-30' AND HF.hr <= '07' )
> GROUP BY
> 20100929, tagtype, country
>
> Based on the WHERE clause, I would expect it to look only at partitions in
the date range 2010-09-29 08:00:00 through 2010-09-30 07:00:00. But, the log
contains entries like:
>
> 10/10/01 19:13:09 INFO exec.ExecDriver: Adding input file
hdfs://ny-prod-hc01:9000/home/hadoop/ala/out/hourly/dt=2010-08-15/hr=10
>
> And many other hours outside my WHERE constraint. I assume this means that
it's processing those directories. The answer still comes out right, but I'm
concerned about the performance.
>
> Would appreciate some help understanding what this means and how to fix
it.
>
> Thanks,
> Marc
>
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB