Peter Marron 2013-05-15, 10:38
Owen O'Malley 2013-05-15, 17:35
Peter Marron 2013-05-16, 14:08
>>On Wed, May 15, 2013 at 3:38 AM, Peter Marron <[EMAIL PROTECTED]> wrote:
>I've started doing similar work for the ORC reader.
I guess that I’m glad that I’m not completely alone here.
>>Firstly although that page mentions InputFormat there doesn’t seem to be any way (that I can find)
>>to perform filter passing to InputFormats and so I gave up on that approach.
>There is. You just need to set hive.optimize.index.filter to true. See https://issues.apache.org/jira/browse/HIVE-4242.
This is a little confusing. When I look through the code for the use of this configuration
I see that it’s effectively used in two places.
Firstly it’s used on line 55 of file PhysicalOptimizer.java to add an “IndexWhereResolver”.
Secondly it’s used on line 766 of file OpProcFactory.java to set a filter expression.
But I don’t see any point where the predicate is passed to the InputFormat class.
I guess that you’re saying that there’s some way that the InputFormat can retrieve the
predicate once it’s been stored. But it’s not clear to me how I do that.
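To make the question concrete, here is a minimal sketch of what "retrieving the stored predicate" might look like from inside an InputFormat. It assumes the filter is published in the job configuration under the keys that I believe TableScanDesc.FILTER_TEXT_CONF_STR and TableScanDesc.FILTER_EXPR_CONF_STR refer to; the exact key strings should be checked against the Hive version in use. A plain Map stands in for JobConf so that the sketch compiles without Hive on the classpath, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch only: a plain Map stands in for Hadoop's JobConf.
class FilterLookupSketch {
    // Assumed key names, believed to match TableScanDesc.FILTER_TEXT_CONF_STR
    // and TableScanDesc.FILTER_EXPR_CONF_STR; verify against your Hive version.
    static final String FILTER_TEXT_KEY = "hive.io.filter.text";
    static final String FILTER_EXPR_KEY = "hive.io.filter.expr.serialized";

    // Returns the human-readable filter text if one was stored, else empty.
    static Optional<String> pushedFilterText(Map<String, String> conf) {
        String text = conf.get(FILTER_TEXT_KEY);
        if (text == null || text.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(text);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // No filter stored: the reader would have to return every row.
        System.out.println(pushedFilterText(conf).isPresent());

        // Filter stored (hypothetical value): the reader could use it to skip data.
        conf.put(FILTER_TEXT_KEY, "(id > 10)");
        System.out.println(pushedFilterText(conf).orElse("<none>"));
    }
}
```

In a real InputFormat the lookup would presumably happen in getRecordReader against the JobConf passed in, and the serialized form would need deserializing before it could be evaluated.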
>>That said, we really need to create a better interface that allows inputformats to negotiate what parts of the predicate they can process.
Ah, yes, sorry. I really want to be able to remove part of the predicate and subsume the filtering into the InputFormat class.
There’s little point in me going down this route if I can’t do that.
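As an illustration of the kind of negotiation meant here, the sketch below splits a conjunctive predicate into the part a reader claims it can evaluate and the residual part that Hive would still have to apply. It is similar in spirit to what HiveStoragePredicateHandler.decomposePredicate offers for storage handlers, but it is not Hive code: the Conjunct and Split types, the column names, and the rule "push a conjunct if its column is filterable" are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: hypothetical types, not part of Hive.
class PredicateSplitSketch {
    // One AND-ed term of the WHERE clause, tagged with the column it tests.
    static final class Conjunct {
        final String column;
        final String text;
        Conjunct(String column, String text) {
            this.column = column;
            this.text = text;
        }
    }

    // Result of the negotiation: what the reader takes, what Hive keeps.
    static final class Split {
        final List<Conjunct> pushed = new ArrayList<>();
        final List<Conjunct> residual = new ArrayList<>();
    }

    // A conjunct is pushed only if the reader can filter on its column.
    static Split split(List<Conjunct> conjuncts, Set<String> filterableColumns) {
        Split result = new Split();
        for (Conjunct c : conjuncts) {
            if (filterableColumns.contains(c.column)) {
                result.pushed.add(c);
            } else {
                result.residual.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Conjunct> where = Arrays.asList(
                new Conjunct("id", "id > 10"),
                new Conjunct("name", "name LIKE 'a%'"));
        Set<String> filterable = new HashSet<>(Arrays.asList("id"));

        Split s = split(where, filterable);
        System.out.println("pushed=" + s.pushed.size() + " residual=" + s.residual.size());
    }
}
```

If the residual list comes back empty, the whole predicate has been subsumed by the reader and the remaining query is in effect a plain select with no filter.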
Thanks for prodding me into looking at the code, because now I see a big problem.
To recap, what I really want is to be able to apply filtering in the case where I do a
select * from table;
query. This is the only query that I’m interested in because it seems to run without any
Map/Reduce overhead (either locally or in the cluster); it’s effectively just performing
some HDFS calls, and that’s what I desire.
What I really want to be able to do is to issue a query like this:
select * from table where <predicate>
where the predicate is stripped from the query and the filtering is done in the InputFormat,
so that Hive effectively sees the query
select * from table;
and runs it directly (no Map/Reduce) and I’m a happy bunny.
Now, as I say, I can’t see any way to effect this in the InputFormat directly.
If I use a storage handler then I am in “non-native table” territory and I
can’t LOAD my tables with data.
However I have just noticed that line 111 of file IndexWhereProcessor.java
seems to suggest that indexes are only ever used when the query is going
to run Map/Reduce. Is this so? So I seem to be in the position where I
can’t use InputFormat, StorageHandler or Indexes. What can I do?
Is there any way to filter the query without having to run Map/Reduce?
Any suggestions welcomed.
Trillium Software UK Limited
Tel : +44 (0) 118 940 7609
Fax : +44 (0) 118 940 7699
E: [EMAIL PROTECTED]
Peter Marron 2013-05-19, 22:11
Owen O'Malley 2013-05-20, 04:36