Drill >> mail # dev >> question about schema


Re: question about schema
Lisen

Ah, got what you mean by encoding multiple fields into the rowkey.
Well, that makes projection trickier, but it is still definitely possible to do with Filters.
As soon as I get something reasonable working I'll push it, and I welcome your help in dealing with that particular situation and any others you can come up with.
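
To make the composite rowkey case concrete, here is a minimal sketch in plain Java (no HBase dependency). The layout is an assumption made only for illustration: domainId, a 0x00 separator, uid, another 0x00 separator, then an 8-byte timestamp, with domainId and uid never containing the separator byte. The class and method names are hypothetical and are not part of Drill or HBase. It shows why a predicate on domainId can be turned into a start/stop key pair, while the uid portion has to be pulled out of each rowkey (which is where a Filter would come in).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class RowkeyRange {
    // Assumed separator byte; domainId and uid must not contain it.
    static final byte SEP = 0x00;

    // Assumed layout: domainId | SEP | uid | SEP | 8-byte big-endian timestamp.
    static byte[] encode(String domainId, String uid, long timestamp) {
        byte[] d = domainId.getBytes(StandardCharsets.UTF_8);
        byte[] u = uid.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(d.length + 1 + u.length + 1 + Long.BYTES)
                .put(d).put(SEP).put(u).put(SEP).putLong(timestamp)
                .array();
    }

    // Inclusive start key for "domainId = x": the prefix followed by the separator.
    static byte[] startKey(String domainId) {
        return withSuffix(domainId, SEP);
    }

    // Exclusive stop key: the prefix followed by the byte right after the separator.
    static byte[] stopKey(String domainId) {
        return withSuffix(domainId, (byte) (SEP + 1));
    }

    private static byte[] withSuffix(String domainId, byte suffix) {
        byte[] d = domainId.getBytes(StandardCharsets.UTF_8);
        byte[] out = Arrays.copyOf(d, d.length + 1);
        out[d.length] = suffix;
        return out;
    }

    // Unsigned lexicographic comparison, the ordering rowkeys are sorted by.
    static boolean inRange(byte[] key, byte[] start, byte[] stop) {
        return Arrays.compareUnsigned(key, start) >= 0
                && Arrays.compareUnsigned(key, stop) < 0;
    }

    // Extracts the uid portion: the bytes between the first two separators.
    static String uidOf(byte[] rowkey) {
        int first = indexOfSep(rowkey, 0);
        int second = indexOfSep(rowkey, first + 1);
        return new String(rowkey, first + 1, second - first - 1, StandardCharsets.UTF_8);
    }

    private static int indexOfSep(byte[] a, int from) {
        for (int i = from; i < a.length; i++) {
            if (a[i] == SEP) return i;
        }
        throw new IllegalArgumentException("separator not found");
    }

    public static void main(String[] args) {
        byte[] key = encode("a", "u1", 1366600000000L);
        System.out.println(inRange(key, startKey("a"), stopKey("a")));
        System.out.println(uidOf(key));
    }
}
```

Under this assumed layout, a predicate on domainId narrows the scan to one contiguous key range, whereas a predicate on uid alone gives no usable prefix, so every row in the table (or in the domainId range, if one is also given) still has to be visited and checked.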

With regard to pushdown: after a bit of discussion in the SE jira (I forget the number), the consensus seems to be that the SE advertises opaque OptimizerRules that the optimizer runs.
These can, for instance, push the project in Jacques' example inside the scan, or change the order of ops.
In general, I can see the case where a typical RDBMS would publish multiple rules (for agg, proj, select, even join) which, when run by the optimizer, would go through the ops directly above the scan and keep pushing as many as possible inside the scan, until there is either nothing left but the sink and the scan (and not even the sink if it goes into the same data source) or there's a multi-branch, multi-data-source op such as union or join.
All of these are inside the Scan physical op (and are SE agnostic up to this point).
So the physical plan portion to be executed by the SE is actually inside the scan op.
At least this is how I'm thinking about it right now…
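
As a rough illustration of that idea, here is a minimal sketch in plain Java. It is not Drill's API: the Op, Kind and pushDown names are hypothetical, and the OptimizerRule interface shape is an assumption (only the name comes from the jira discussion). The rule walks the plan bottom-up and folds any single-input op that sits directly above a scan into that scan, stopping at the sink or at a multi-input op such as join or union.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical plan model, for illustration only.
final class PushdownSketch {
    enum Kind { SCAN, PROJECT, SELECT, AGGREGATE, JOIN, UNION, SINK }

    static final class Op {
        final Kind kind;
        final List<Op> inputs = new ArrayList<>();
        // Ops absorbed into this scan, in execution order (only used when kind == SCAN).
        final List<Op> pushed = new ArrayList<>();

        Op(Kind kind, Op... inputs) {
            this.kind = kind;
            this.inputs.addAll(Arrays.asList(inputs));
        }
    }

    // Assumed shape of a rule advertised by a storage engine:
    // it only says which ops the engine is able to run itself.
    interface OptimizerRule {
        boolean canAbsorb(Op op);
    }

    // Example rule for an engine that handles projection, selection and aggregation.
    static OptimizerRule rdbmsLikeRule() {
        Set<Kind> supported = EnumSet.of(Kind.PROJECT, Kind.SELECT, Kind.AGGREGATE);
        return op -> supported.contains(op.kind);
    }

    // Rewrites the plan bottom-up and returns the (possibly new) root of the subtree.
    static Op pushDown(Op op, OptimizerRule rule) {
        for (int i = 0; i < op.inputs.size(); i++) {
            op.inputs.set(i, pushDown(op.inputs.get(i), rule));
        }
        boolean singleInputOverScan =
                op.inputs.size() == 1 && op.inputs.get(0).kind == Kind.SCAN;
        if (singleInputOverScan && rule.canAbsorb(op)) {
            Op scan = op.inputs.get(0);
            op.inputs.clear();      // the op now lives inside the scan
            scan.pushed.add(op);
            return scan;            // the scan takes the op's place in the plan
        }
        return op;                  // sink, join, union, or unsupported op: stop here
    }

    public static void main(String[] args) {
        Op scan = new Op(Kind.SCAN);
        Op plan = new Op(Kind.SINK, new Op(Kind.SELECT, new Op(Kind.PROJECT, scan)));
        Op root = pushDown(plan, rdbmsLikeRule());
        System.out.println(root.kind + " over scan with " + scan.pushed.size() + " pushed ops");
    }
}
```

The list kept inside the scan is the portion of the physical plan the SE would execute; everything left above it stays in Drill.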

Best
David


On Apr 21, 2013, at 11:29 PM, Lisen Mu <[EMAIL PROTECTED]> wrote:

> David,
>
> Suppose we have planned to use domainId+uid+timestamp as my HTable rowkey.
>
> I wish to retrieve uid portion from my rowkey, like:
>
>  SELECT distinct(uid) from `my_table` where xxx
>
> Or, I wish I could do:
>
>  a) SELECT xxx from `my_table` where domainId='a'
>  b) SELECT xxx from `my_table` where uid='[EMAIL PROTECTED]'
>
> And HBase SE would determine the best startKey and endKey according to
> rowkey definition info, so a) and b) would get different performance.
>
>> about selection/Filter & aggregation:
>
> I have so many questions that I feel it would be better to wait for your
> HBase SE first... However:
>
> How to push down aggregation and selection into scan pop?
>
> @Jacques, It seems to me that your idea is to use a scan pop node to
> describe what SE would do in a query, right?
>
> Would scan pop become a little too complicated if scan pop stays SE
> independent, since MySQL & Mongo need more from scan pop?
>
> Previously I thought you would provide something like
>
>  RecordReader getReader(PhysicalPlan subPlan)
>
> The SE advertises its abilities back to Drill; Drill pushes part of the
> physical plan to the SE and lets the SE figure out how to deal with the
> sub-DAG, as long as the SE can provide a correct RecordBatch.
>
>
>
>
>
> On Mon, Apr 22, 2013 at 12:06 PM, David Alves <[EMAIL PROTECTED]> wrote:
>
>> Hi Lisen
>>
>>        Phoenix has been a good source of inspiration.
>>        Had it not been for license issues (non-standard license) and the
>> fact it is designed to run locally I would have used it directly instead of
>> coding my own.
>>        Not completely sure what you mean wrt "map fields in the query
>> into portion of rowkey in HBase" but here's what I'm doing with regard to
>> the operations that are pushed to HBase:
>>
>>        Projection comes from setting the interesting CF's and CQ's in the
>> Scan prior to starting it (where those come from in drill was the reason
>> for my previous email).
>>        Selection comes from setting Filters that are created directly
>> from expressions in Drill and are submitted with the scan.
>>        Partial Aggregation (which I'm not doing right now but will do
>> soon ) will come from co-processors.
>>        Joins: I'm investigating a couple of options for pushing some of
>> the work to HBase.
>>
>>        All the remaining operations will happen within drill itself.
>>
>> Best
>> David
>>
>> On Apr 21, 2013, at 10:45 PM, Lisen Mu <[EMAIL PROTECTED]> wrote:
>>
>>> David,
>>>
>>> Another case about schema: how to map fields in the query into portion of
>>> rowkey in HBase? Like phoenix does.
>>> http://files.meetup.com/1350427/IntelPhoenixHBaseMeetup.ppt