Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - Index not getting used for the queries


Copy link to this message
-
RE: Index not getting used for the queries
Peter Marron 2014-02-03, 23:13
Hi,

Not sure if it is relevant to your problem but I'm just checking
that you know about
hive.optimize.index.filter.compact.minsize
it's set to 5Gbytes by default and if the estimated query size is
less than this then the index won't be used.
HTH.

Regards

Peter Marron
Senior Developer, Research & Development

Office: +44 (0) 118-940-7609  [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>
Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK
[cid:[EMAIL PROTECTED]BFC0]

[cid:[EMAIL PROTECTED]BFC0]<https://www.facebook.com/pages/Trillium-Software/109184815778307>

[cid:[EMAIL PROTECTED]BFC0]<https://twitter.com/TrilliumSW>

[cid:[EMAIL PROTECTED]BFC0]<http://www.linkedin.com/company/17710>
www.trilliumsoftware.com<http://www.trilliumsoftware.com/>

Be Certain About Your Data. Be Trillium Certain.

From: Thilina Gunarathne [mailto:[EMAIL PROTECTED]]
Sent: 03 February 2014 16:08
To: user
Subject: Index not getting used for the queries

Dear all,
I created a compact index for a table with several hundred million records as follows. The table is partitioned by the month. The index on A and B was created successfully, but I can't see it getting used in the queries. It would be great if one of you experts can shed some light on what  am I missing. I'm using hive 0.9.

set hive.exec.parallel=false;
CREATE INDEX idx_yyyy
    ON TABLE yyyy(a,b)
    AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    WITH DEFERRED REBUILD
    COMMENT 'Index for yyyy table. Indexing on A and B';
ALTER INDEX idx_yyyy on yyyy REBUILD;
hive> describe yyyy;
OK
a    bigint
...
b    bigint
....
month int
hive> show index on yyyy;
OK
idx_yyyy          yyyy              a, b                    default__yyyy_p_idx_yyyy__    compact                 Index for tm top50 table. Indexing on A and B
hive> explain select a,b from tm_top50_p where a=113231 and month=201308;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME yyyy))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a)) (TOK_SELEXPR (TOK_TABLE_OR_COL b))) (TOK_WHERE (and (= (TOK_TABLE_OR_COL a) 113231) (= (TOK_TABLE_OR_COL month) 201308)))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        yyyy
          TableScan
            alias: yyyy
            Filter Operator
              predicate:
                  expr: (a = 113231)
                  type: boolean
              Select Operator
                expressions:
                      expr: a
                      type: bigint
                      expr: b
                      type: bigint
                outputColumnNames: _col0, _col1
                File Output Operator
                  compressed: false
                  GlobalTableId: 0
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1

thanks a lot,
Thilina

--
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina
http://thilina.gunarathne.org