|
Raghunath, Ranjith
2012-05-16, 20:06
Hezhiqiang
2012-05-17, 00:32
Carl Steinbach
2012-05-17, 00:46
Mark Grover
2012-05-17, 00:52
Raghunath, Ranjith
2012-05-17, 01:46
Mark Grover
2012-05-17, 02:27
Ranjith
2012-05-17, 02:42
|
-
Indexing in hiveRaghunath, Ranjith 2012-05-16, 20:06
I am currently using hive 0.7.1 and creating indexes based on columns in the where clause. However, when I run the explain plan I do not see the index being leveraged. The syntax that I am using to build the index is as follows:
CREATE INDEX x ON TABLE t(j) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD The query is as follows: Select a,count(*) from x where j='and' group by a Is there anything I am doing incorrectly? Thanks, Ranjith
-
RE: Indexing in hiveHezhiqiang 2012-05-17, 00:32
“hive.optimize.index.filter” is the conf automatically use indexes
If u set hive.optimize.index.groupby = true. It will set hive.optimize.index.filter=false. See your configurations. And you need to build index after create index. Best regards Ransom.
-
Re: Indexing in hiveCarl Steinbach 2012-05-17, 00:46
Hi Ranjith,
Hive 0.7 supports the ability to build indexes, but the query compiler in 0.7 doesn't know how to optimize queries with these indexes. Hive 0.8 was the first release to include some support for optimizing query plans with indexes, and that only applies to GROUP BY and WHERE clauses under certain specific circumstances. At this point index support in Hive is very much a work in progress. Thanks. Carl On Wed, May 16, 2012 at 1:06 PM, Raghunath, Ranjith < [EMAIL PROTECTED]> wrote: > I am currently using hive 0.7.1 and creating indexes based on columns in > the where clause. However, when I run the explain plan I do not see the > index being leveraged. The syntax that I am using to build the index is as > follows: > > CREATE INDEX x ON TABLE t(j) > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' > WITH DEFERRED REBUILD > > The query is as follows: > > Select a,count(*) from x where j=’and’ group by a > > Is there anything I am doing incorrectly? > > Thanks, > Ranjith > > > >
-
Re: Indexing in hiveMark Grover 2012-05-17, 00:52
Ransom,
>From this JIRA (https://issues.apache.org/jira/browse/HIVE-1644), it looks like automatic use of indexes using hive.optimize.index.filter was introduced in Hive 0.8. However, Ranjith seems to be using Hive 0.7.1 which doesn't support those properties. Ranjith, you need to set the appropriate priorities before calling your query to make use of indexes. Are you setting any properties? If so, what? I will try to dig up what those properties are, in the meanwhile. Mark ----- Original Message ----- From: "Hezhiqiang (Ransom)" <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Cc: "Zhaojun (Terry)" <[EMAIL PROTECTED]> Sent: Wednesday, May 16, 2012 8:32:55 PM Subject: RE: Indexing in hive “ hive.optimize.index.filter ” is the conf automatically use indexes If u set hive.optimize.index.groupby = true. It will set hive.optimize.index.filter =false. See your configurations. And you need to build index after create index. Best regards Ransom.
-
Re: Indexing in hiveRaghunath, Ranjith 2012-05-17, 01:46
Thanks Mark, Carl, and Ransom. I really appreciate the answers here. I am using Hive 0.7.1 and currently trying to create an index to help with performance associated to a particular where clause. I have not set any properties as mentioned below. I can try the options you listed below. Based on the feedback, I do have a few questions:
1. It seems like if you optimize for a groupby you cannot optimize for the where clause. I am reading this correctly? 2. How do you build the index? 3. Does the build process create a dataset that contains the index keys along with an offset value associated with the row? 4. And I am guessing that you need to use the columns in the same order as it is defined in the index? Thanks, Ranjith ----- Original Message ----- From: Mark Grover [mailto:[EMAIL PROTECTED]] Sent: Wednesday, May 16, 2012 07:52 PM To: [EMAIL PROTECTED] <[EMAIL PROTECTED]> Cc: Zhaojun (Terry) <[EMAIL PROTECTED]> Subject: Re: Indexing in hive Ransom, From this JIRA (https://issues.apache.org/jira/browse/HIVE-1644), it looks like automatic use of indexes using hive.optimize.index.filter was introduced in Hive 0.8. However, Ranjith seems to be using Hive 0.7.1 which doesn't support those properties. Ranjith, you need to set the appropriate priorities before calling your query to make use of indexes. Are you setting any properties? If so, what? I will try to dig up what those properties are, in the meanwhile. Mark ----- Original Message ----- From: "Hezhiqiang (Ransom)" <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Cc: "Zhaojun (Terry)" <[EMAIL PROTECTED]> Sent: Wednesday, May 16, 2012 8:32:55 PM Subject: RE: Indexing in hive “ hive.optimize.index.filter ” is the conf automatically use indexes If u set hive.optimize.index.groupby = true. It will set hive.optimize.index.filter =false. See your configurations. And you need to build index after create index. Best regards Ransom.
-
Re: Indexing in hiveMark Grover 2012-05-17, 02:27
Hi Ranjith,
Here are the steps for using an index in Hive 0.7.1. 1) Create the index CREATE INDEX x ON TABLE t(j) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD; 2) Build the index (since you specified the 'DEFERRED REBUILD' flag in the create index statement ALTER INDEX x ON t REBUILD; 3) Use the index If your original query is: select a, count(*) from t where j='and' group by a; and you wanted to use the index on column j, do something like: INSERT OVERWRITE DIRECTORY '/tmp/indexes/x' SELECT `_bucketname`, `_offsets` FROM default__t_x__ where j='and'; (The name default__t_x__ can be found in the output of step 2. Also, /tmp/indexes directory needs to exist in HDFS. You can substitute this to be any pre-existing directory in HDFS) SET hive.index.compact.file=/tmp/indexes/x; SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat; SELECT a, count(*) from t where j='and' group by a; Since the semantics of this usage make you specify the compact file, I have not been able to figure out a way to use multiple indexes in the same query. In this case we are using the index on j, the column in the where clause. I hope you now understand why indexing in Hive is a work in progress:-) Good luck! Mark Mark Grover, Business Intelligence Analyst OANDA Corporation www: oanda.com www: fxtrade.com ----- Original Message ----- From: "Ranjith Raghunath" <[EMAIL PROTECTED]> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> Sent: Wednesday, May 16, 2012 9:46:23 PM Subject: Re: Indexing in hive Thanks Mark, Carl, and Ransom. I really appreciate the answers here. I am using Hive 0.7.1 and currently trying to create an index to help with performance associated to a particular where clause. I have not set any properties as mentioned below. I can try the options you listed below. Based on the feedback, I do have a few questions: 1. It seems like if you optimize for a groupby you cannot optimize for the where clause. I am reading this correctly? 2. How do you build the index? 3. Does the build process create a dataset that contains the index keys along with an offset value associated with the row? 4. And I am guessing that you need to use the columns in the same order as it is defined in the index? Thanks, Ranjith ----- Original Message ----- From: Mark Grover [mailto:[EMAIL PROTECTED]] Sent: Wednesday, May 16, 2012 07:52 PM To: [EMAIL PROTECTED] <[EMAIL PROTECTED]> Cc: Zhaojun (Terry) <[EMAIL PROTECTED]> Subject: Re: Indexing in hive Ransom, >From this JIRA (https://issues.apache.org/jira/browse/HIVE-1644), it looks like automatic use of indexes using hive.optimize.index.filter was introduced in Hive 0.8. However, Ranjith seems to be using Hive 0.7.1 which doesn't support those properties. Ranjith, you need to set the appropriate priorities before calling your query to make use of indexes. Are you setting any properties? If so, what? I will try to dig up what those properties are, in the meanwhile. Mark ----- Original Message ----- From: "Hezhiqiang (Ransom)" <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Cc: "Zhaojun (Terry)" <[EMAIL PROTECTED]> Sent: Wednesday, May 16, 2012 8:32:55 PM Subject: RE: Indexing in hive “ hive.optimize.index.filter ” is the conf automatically use indexes If u set hive.optimize.index.groupby = true. It will set hive.optimize.index.filter =false. See your configurations. And you need to build index after create index. Best regards Ransom.
-
Re: Indexing in hiveRanjith 2012-05-17, 02:42
Thanks Mark. I going to give this a shot a get back with some results.
Thanks, Ranjith On May 16, 2012, at 9:27 PM, Mark Grover <[EMAIL PROTECTED]> wrote: > Hi Ranjith, > Here are the steps for using an index in Hive 0.7.1. > > 1) Create the index > CREATE INDEX x ON TABLE t(j) > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' > WITH DEFERRED REBUILD; > > 2) Build the index (since you specified the 'DEFERRED REBUILD' flag in the create index statement > > ALTER INDEX x ON t REBUILD; > > 3) Use the index > If your original query is: select a, count(*) from t where j='and' group by a; > and you wanted to use the index on column j, do something like: > > INSERT OVERWRITE DIRECTORY '/tmp/indexes/x' SELECT `_bucketname`, `_offsets` FROM default__t_x__ where j='and'; > (The name default__t_x__ can be found in the output of step 2. Also, /tmp/indexes directory needs to exist in HDFS. You can substitute this to be any pre-existing directory in HDFS) > SET hive.index.compact.file=/tmp/indexes/x; > SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat; > SELECT a, count(*) from t where j='and' group by a; > > Since the semantics of this usage make you specify the compact file, I have not been able to figure out a way to use multiple indexes in the same query. In this case we are using the index on j, the column in the where clause. > > I hope you now understand why indexing in Hive is a work in progress:-) > > Good luck! > Mark > > > Mark Grover, Business Intelligence Analyst > OANDA Corporation > > www: oanda.com www: fxtrade.com > > ----- Original Message ----- > From: "Ranjith Raghunath" <[EMAIL PROTECTED]> > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> > Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> > Sent: Wednesday, May 16, 2012 9:46:23 PM > Subject: Re: Indexing in hive > > Thanks Mark, Carl, and Ransom. I really appreciate the answers here. I am using Hive 0.7.1 and currently trying to create an index to help with performance associated to a particular where clause. I have not set any properties as mentioned below. I can try the options you listed below. Based on the feedback, I do have a few questions: > > 1. It seems like if you optimize for a groupby you cannot optimize for the where clause. I am reading this correctly? > > 2. How do you build the index? > > 3. Does the build process create a dataset that contains the index keys along with an offset value associated with the row? > > 4. And I am guessing that you need to use the columns in the same order as it is defined in the index? > > > Thanks, > Ranjith > > ----- Original Message ----- > From: Mark Grover [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, May 16, 2012 07:52 PM > To: [EMAIL PROTECTED] <[EMAIL PROTECTED]> > Cc: Zhaojun (Terry) <[EMAIL PROTECTED]> > Subject: Re: Indexing in hive > > Ransom, > From this JIRA (https://issues.apache.org/jira/browse/HIVE-1644), it looks like automatic use of indexes using hive.optimize.index.filter was introduced in Hive 0.8. However, Ranjith seems to be using Hive 0.7.1 which doesn't support those properties. > > Ranjith, you need to set the appropriate priorities before calling your query to make use of indexes. Are you setting any properties? If so, what? > I will try to dig up what those properties are, in the meanwhile. > > Mark > > ----- Original Message ----- > From: "Hezhiqiang (Ransom)" <[EMAIL PROTECTED]> > To: [EMAIL PROTECTED] > Cc: "Zhaojun (Terry)" <[EMAIL PROTECTED]> > Sent: Wednesday, May 16, 2012 8:32:55 PM > Subject: RE: Indexing in hive > > > > > “ hive.optimize.index.filter ” is the conf automatically use indexes > > If u set hive.optimize.index.groupby = true. > > It will set hive.optimize.index.filter =false. > > See your configurations. > > > > And you need to build index after create index. > > > > > Best regards > > Ransom. > |