Hive >> mail # dev >> how to force hive to use index for predicates


how to force hive to use index for predicates
Hi,

I have a table pokes(foo int, bar string) on which I created a compact
index (on foo). I set hive.optimize.autoindex=true and then ran the query
select * from pokes where pokes.bar = 'John'. However, with logging
enabled, I do not see any phase in which the optimizer makes use of the
index, so I suspect the index was not used to execute the query.
If so, how can I force it to use the index?
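For reference, the setup described above could be reproduced roughly as below. This is a sketch: the index name, the DEFERRED REBUILD step, and the `hive.optimize.index.filter` settings are my assumptions, not taken from the original message.

```sql
-- Table and compact index as described (index name assumed)
CREATE TABLE pokes (foo INT, bar STRING);
CREATE INDEX pokes_foo_idx ON TABLE pokes (foo)
  AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
  WITH DEFERRED REBUILD;
-- With DEFERRED REBUILD, the index table is empty until rebuilt
ALTER INDEX pokes_foo_idx ON pokes REBUILD;

-- Settings that allow the optimizer to rewrite filters through the index
SET hive.optimize.index.filter = true;
SET hive.optimize.index.filter.compact.minsize = 0;

-- Note: the logged query filters on bar, but the index covers foo,
-- so the index cannot help this particular predicate
SELECT * FROM pokes WHERE pokes.bar = 'John';
```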

Thank you in advance for your reply,
Mahsa

Below is the complete log:

12/05/19 17:01:37 INFO ql.Driver: <PERFLOG method=Driver.run>
12/05/19 17:01:37 INFO ql.Driver: <PERFLOG method=compile>
12/05/19 17:01:37 INFO parse.ParseDriver: Parsing command: select * from pokes where pokes.bar = 'John'
12/05/19 17:01:37 INFO parse.ParseDriver: Parse Completed
12/05/19 17:01:37 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
12/05/19 17:01:37 INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic Analysis
12/05/19 17:01:37 INFO parse.SemanticAnalyzer: Get metadata for source tables
12/05/19 17:01:37 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=pokes
12/05/19 17:01:37 INFO hive.log: DDL: struct pokes { i32 foo, string bar}
12/05/19 17:01:37 INFO parse.SemanticAnalyzer: Get metadata for subqueries
12/05/19 17:01:37 INFO parse.SemanticAnalyzer: Get metadata for destination tables
12/05/19 17:01:37 INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
12/05/19 17:01:37 INFO hive.log: DDL: struct pokes { i32 foo, string bar}
12/05/19 17:01:37 INFO ppd.OpProcFactory: Processing for FS(3)
12/05/19 17:01:37 INFO ppd.OpProcFactory: Processing for SEL(2)
12/05/19 17:01:37 INFO ppd.OpProcFactory: Processing for FIL(1)
12/05/19 17:01:37 INFO ppd.OpProcFactory: Pushdown Predicates of FIL For Alias : pokes
12/05/19 17:01:37 INFO ppd.OpProcFactory: (bar = 'John')
12/05/19 17:01:37 INFO ppd.OpProcFactory: Processing for TS(0)
12/05/19 17:01:37 INFO ppd.OpProcFactory: Pushdown Predicates of TS For Alias : pokes
12/05/19 17:01:37 INFO ppd.OpProcFactory: (bar = 'John')
12/05/19 17:01:37 INFO hive.log: DDL: struct pokes { i32 foo, string bar}
12/05/19 17:01:37 INFO hive.log: DDL: struct pokes { i32 foo, string bar}
12/05/19 17:01:37 INFO hive.log: DDL: struct pokes { i32 foo, string bar}
12/05/19 17:01:37 INFO physical.MetadataOnlyOptimizer: Looking for table scans where optimization is applicable
12/05/19 17:01:37 INFO physical.MetadataOnlyOptimizer: Found 0 metadata only table scans
12/05/19 17:01:37 INFO parse.SemanticAnalyzer: Completed plan generation
12/05/19 17:01:37 INFO ql.Driver: Semantic Analysis Completed
12/05/19 17:01:37 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:foo, type:int, comment:null), FieldSchema(name:bar, type:string, comment:null)], properties:null)
12/05/19 17:01:37 INFO ql.Driver: </PERFLOG method=compile start=1337461297242 end=1337461297288 duration=46>
12/05/19 17:01:37 INFO ql.Driver: <PERFLOG method=Driver.execute>
12/05/19 17:01:37 INFO ql.Driver: Starting command: select * from pokes where pokes.bar = 'John'
Total MapReduce jobs = 1
12/05/19 17:01:37 INFO ql.Driver: Total MapReduce jobs = 1
Launching Job 1 out of 1
12/05/19 17:01:37 INFO ql.Driver: Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
12/05/19 17:01:37 INFO exec.Task: Number of reduce tasks is set to 0 since there's no reduce operator
12/05/19 17:01:37 INFO exec.ExecDriver: Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
12/05/19 17:01:37 INFO exec.ExecDriver: adding libjars: file:///usr/local/hive/build/dist/lib/hive-builtins-0.10.0-SNAPSHOT.jar
12/05/19 17:01:37 INFO exec.ExecDriver: Processing alias pokes
12/05/19 17:01:37 INFO exec.ExecDriver: Adding input file hdfs://localhost:54310/user/hive/warehouse/pokes
12/05/19 17:01:37 INFO exec.Utilities: Content Summary not cached for hdfs://localhost:54310/user/hive/warehouse/pokes
12/05/19 17:01:37 INFO exec.ExecDriver: Making Temp Directory: hdfs://localhost:54310/tmp/hive-hduser/hive_2012-05-19_17-01-37_242_5525578509473503196/-ext-10001
12/05/19 17:01:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/05/19 17:01:37 INFO io.CombineHiveInputFormat: CombineHiveInputSplit creating pool for hdfs://localhost:54310/user/hive/warehouse/pokes; using filter path hdfs://localhost:54310/user/hive/warehouse/pokes
12/05/19 17:01:37 INFO mapred.FileInputFormat: Total input paths to process
12/05/19 17:01:37 INFO io.CombineHiveInputFormat: number of splits 1
Starting Job = job_201205131854_0113, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201205131854_0113
12/05/19 17:01:37 INFO exec.Task: Starting Job = job_201205131854_0113, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201205131854_0113
Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201205131854_0113
12/05/19 17:01:37 INFO exec.Task: Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201205131854_0113
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
12/05/19 17:01:43 INFO exec.Task: Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-05-19 17:01:43,764 Stage-1 map = 0%,  reduce = 0%
12/05/19 17:01:43 INFO exec.Task: 2012-05-19 17:01:43,764 Stage-1 map = 0%,  reduce = 0%
2012-05-19 17:01:46,773 Stage-1 map = 100%,  reduce = 0%
12/05/19 17:01:46 INFO exec.Task: 2012-05-19 17:01:46,773 Stage-1 map = 100%,  reduce = 0%
2012-05-19 17:01:49,782 Stage-1 map = 100%,  reduce = 100%
12/05/19 17:01:49 INFO exec.Task: 2012-05-19 17:01:49,782 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201205131854_0113
12/05/19 17:01:49 INFO exec.Task: Ended Job = job_201205131854_0113
12/05/19 17:01:49 INFO exec.FileSinkOperator: Moving tmp dir: hdfs://localhost:54310/tmp/hive-hduser/hive_2012-05-19_17-01-37_242_5525578509473503196/_tmp.-ex
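One way to check whether the index is being consulted at all, rather than reading the log, is to inspect the compiled plan. A sketch, assuming a Hive CLI session against the same metastore (table and column names taken from the message; the expectation about the extra stage is an assumption about this Hive version's behavior):

```sql
-- List any indexes defined on the table
SHOW FORMATTED INDEX ON pokes;

-- If the optimizer rewrites the query through the index, EXPLAIN should
-- show an extra stage that scans the index table before the base table
EXPLAIN SELECT * FROM pokes WHERE pokes.foo = 42;
```

Note that the query in the log filters on bar while the index covers foo, so no plan for that query could use the index regardless of settings.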