Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> hive view filter pushdown quesiton


Copy link to this message
-
hive view filter pushdown quesiton
So I have an interesting situation where a select from a view - amazingly i
might add - does the right partition elimination in hive v0.80 but in hive
v0.10 it does not.  instead it seems to perform like a mere mortal would.

Let me explain:

1. view definition:
         create view v1 as select cast(year*10000 + month*100 + day as int)
as date_key, * from big_table_with_partitions;

2. big_table_with_partitions is partitioned by (year int, month int, day
int, hour int)

3. the query.  select * from v1 where date_key=20130429 limit 10;

on hive 0.80 running explain we have this:

{code}
|       Path -> Alias:
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=00[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=01[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=02[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=03[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=04[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=05[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=06[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=07[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=08[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=09[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=10[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=11[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=12[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=13[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=14[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=15[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=16[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=17[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=18[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=19[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=20[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=21[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=22[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=23[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
|       Path -> Partition:
|         hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=00
|           Partition
|             base file name: hour=00
|             input format: org.apache.hadoop.mapred.SequenceFileInputFormat
|             output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
|             partition values:
|               day 29
|               hour 00
|               month 04
|               year 2013
{code}

so hive somehow was able to parse the "date_key=20130429" and figure out
that was partition (year=2013/month=04/day=29).

in hive version 0.10 it does not and lets just say i have several years of
data and after about 30 min
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB