Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Index not getting used for the queries


Copy link to this message
-
Re: Index not getting used for the queries
Thanks Peter. It helped. That property combined with setting the property
'hive.optimize.index.filter' to 'true' got the index working.

thanks,
Thilina
On Mon, Feb 3, 2014 at 6:12 PM, Peter Marron <
[EMAIL PROTECTED]> wrote:

>  Hi,
>
>
>
> Not sure if it is relevant to your problem but I'm just checking
>
> that you know about
>
> hive.optimize.index.filter.compact.minsize
>
> it's set to 5Gbytes by default and if the estimated query size is
>
> less than this then the index won't be used.
>
> HTH.
>
>
>
> Regards
>
>
>
> *Peter Marron*
>
> Senior Developer, Research & Development
>
>
>
> Office: +44 *(0) 118-940-7609*  [EMAIL PROTECTED]
>
> Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK
>
>    <https://www.facebook.com/pages/Trillium-Software/109184815778307>
>
>  <https://twitter.com/TrilliumSW>
>
>  <http://www.linkedin.com/company/17710>
>
>
>
> *www.trilliumsoftware.com <http://www.trilliumsoftware.com/>*
>
> Be Certain About Your Data. Be Trillium Certain.
>
>
>
> *From:* Thilina Gunarathne [mailto:[EMAIL PROTECTED]]
> *Sent:* 03 February 2014 16:08
> *To:* user
> *Subject:* Index not getting used for the queries
>
>
>
> Dear all,
>
> I created a compact index for a table with several hundred million records
> as follows. The table is partitioned by the month. The index on A and B was
> created successfully, but I can't see it getting used in the queries. It
> would be great if one of you experts can shed some light on what  am I
> missing. I'm using hive 0.9.
>
> set hive.exec.parallel=false;
> CREATE INDEX idx_yyyy
>     ON TABLE yyyy(a,b)
>     AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
>     WITH DEFERRED REBUILD
>     COMMENT 'Index for yyyy table. Indexing on A and B';
> ALTER INDEX idx_yyyy on yyyy REBUILD;
>
>
> hive> describe yyyy;
> OK
> a    bigint
> ...
> b    bigint
> ....
>
> month int
>
>
>
> hive> show index on yyyy;
> OK
> idx_yyyy          yyyy              a, b
> default__yyyy_p_idx_yyyy__    compact                 Index for tm top50
> table. Indexing on A and B
>
>
> hive> explain select a,b from tm_top50_p where a=113231 and
> month=201308;
> OK
> ABSTRACT SYNTAX TREE:
>   (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME yyyy))) (TOK_INSERT
> (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR
> (TOK_TABLE_OR_COL a)) (TOK_SELEXPR (TOK_TABLE_OR_COL b))) (TOK_WHERE (and
> (= (TOK_TABLE_OR_COL a) 113231) (= (TOK_TABLE_OR_COL month) 201308)))))
>
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
>
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         yyyy
>           TableScan
>             alias: yyyy
>             Filter Operator
>               predicate:
>                   expr: (a = 113231)
>                   type: boolean
>               Select Operator
>                 expressions:
>                       expr: a
>                       type: bigint
>                       expr: b
>                       type: bigint
>                 outputColumnNames: _col0, _col1
>                 File Output Operator
>                   compressed: false
>                   GlobalTableId: 0
>                   table:
>                       input format:
> org.apache.hadoop.mapred.TextInputFormat
>                       output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>
>   thanks a lot,
> Thilina
>
>
> --
> https://www.cs.indiana.edu/~tgunarat/
> http://www.linkedin.com/in/thilina
>
> http://thilina.gunarathne.org
>

--
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina
http://thilina.gunarathne.org