Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive, mail # user - hive view filter pushdown quesiton


Copy link to this message
-
hive view filter pushdown quesiton
Stephen Sprague 2013-05-02, 20:37
So I have an interesting situation where a select from a view - amazingly i
might add - does the right partition elimination in hive v0.80 but in hive
v0.10 it does not.  instead it seems to perform like a mere mortal would.

Let me explain:

1. view definition:
         create view v1 as select cast(year*10000 + month*100 + day as int)
as date_key, * from big_table_with_partitions;

2. big_table_with_partitions is partitioned by (year int, month int, day
int, hour int)

3. the query.  select * from v1 where date_key=20130429 limit 10;

on hive 0.80 running explain we have this:

{code}
|       Path -> Alias:
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=00[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=01[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=02[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=03[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=04[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=05[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=06[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=07[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=08[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=09[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=10[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=11[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=12[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=13[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=14[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=15[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=16[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=17[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=18[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=19[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=20[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=21[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=22[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=23[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|       Path -> Partition:
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=00
|           Partition
|             base file name: hour=00
|             input format: org.apache.hadoop.mapred.SequenceFileInputFormat
|             output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
|             partition values:
|               day 29
|               hour 00
|               month 04
|               year 2013
{code}

so hive somehow was able to parse the "date_key=20130429" and figure out
that was partition (year=2013/month=04/day=29).

in hive version 0.10 it does not and lets just say i have several years of
data and after about 30 min