Drill, mail # dev - question about schema


Re: question about schema
David Alves 2013-04-22, 04:06
Hi Lisen

Phoenix has been a good source of inspiration.
Had it not been for license issues (a non-standard license) and the fact that it is designed to run locally, I would have used it directly instead of coding my own.
I'm not completely sure what you mean by "map fields in the query into portion of rowkey in HBase", but here's what I'm doing with regard to the operations that are pushed to HBase:

Projection comes from setting the relevant column families (CFs) and column qualifiers (CQs) on the Scan before starting it (where those come from in Drill was the reason for my previous email).
Selection comes from setting Filters that are created directly from expressions in Drill and are submitted with the Scan.
Partial aggregation (which I'm not doing right now but will do soon) will come from coprocessors.
Joins: I'm investigating a couple of options for pushing some of the work to HBase.

All the remaining operations will happen within Drill itself.
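
On the rowkey question quoted below, one possible reading is that a predicate on the leading logical parts of a composite rowkey can be turned into a start/stop row range for the scan, rather than a filter. The following is only a minimal sketch of that idea under stated assumptions: the class name `RowkeyPrefixRange`, the 0x00 separator, and the UTF-8 string encoding of each part are all hypothetical, and this is neither what Phoenix does nor what the Drill storage engine currently implements. It uses only the JDK, so the resulting byte arrays would still have to be handed to an HBase Scan as its start and stop rows.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: maps the leading logical parts of a composite
// rowkey to a [startRow, stopRow) byte range for a range scan.
class RowkeyPrefixRange {
    // Assumed separator between logical parts; a real schema may use
    // fixed-width fields or a different delimiter instead.
    static final byte SEP = 0x00;

    // Encodes the given leading parts, each followed by the separator.
    // The result can serve as the inclusive start row of the range.
    static byte[] prefix(String... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String part : parts) {
            byte[] bytes = part.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
            out.write(SEP);
        }
        return out.toByteArray();
    }

    // Returns the smallest key that sorts after every key starting with
    // the prefix: drop trailing 0xFF bytes, then increment the last byte.
    // An empty array means "no upper bound" (HBase reads an empty stop
    // row as "scan to the end of the table").
    static byte[] stopRow(byte[] prefix) {
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++;
        return stop;
    }
}
```

For a hypothetical rowkey laid out as customer, then month, then event id, a query constraining customer and month would use `prefix("acme", "2013-04")` as the start row and `stopRow(...)` of that prefix as the exclusive stop row. Predicates on parts that are not a leading prefix of the key would still need a Filter or evaluation within Drill.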

Best
David

On Apr 21, 2013, at 10:45 PM, Lisen Mu <[EMAIL PROTECTED]> wrote:

> David,
>
> Another case about schema: how to map fields in the query into portion of
> rowkey in HBase? Like phoenix does.
> http://files.meetup.com/1350427/IntelPhoenixHBaseMeetup.ppt
>
> I think it might be common in HBase schema design that several logical
> parts form rowkey in a particular order for the most frequent access
> pattern.
>
>
>
>
> On Sun, Apr 21, 2013 at 1:45 PM, David Alves <[EMAIL PROTECTED]> wrote:
>
>> had a "duh" moment, realizing that, of course, I don't need a
>> ProjectFilter as I can set the relevant cq's and cf's on HBase's Scan.
>> the question of how to get the names of the columns the query is asking
>> for, or even "*" if that is the case, still stands though…
>>
>> -david
>>
>> On Apr 20, 2013, at 10:39 PM, David Alves <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Jacques
>>>
>>>      I'm implementing a ProjectFilter for HBase and I got to the point
>> where I need to pass to HBase the fields that are required (even if it's
>> simply "all" as in *).
>>>      How to know which fields to scan in the SE and their expected type?
>>>      There's a bunch of schema stuff in the
>> org/apache/drill/exec/schema but I can't figure how SE uses that.
>>>      Will this info come inside the scan logical op in
>> getReadEntries(Scan scan) (in the arbitrary "selection" section)?
>>>      Is this method still going to receive a logical Scan op or is this
>> just legacy stuff that you didn't have the chance to get to yet?
>>>      BatchSchema seems to only refer to field ids…
>>>
>>>      I'm thinking this is most likely because the work is still very
>> much in progress but as I browse the code I can see you have put a lot of
>> thought into almost everything even when it's not being used right now and
>> I don't want to make any stupid assumption.
>>>      I can definitely make that info get to the SE iface myself just
>> wondering how do you envision it should get there…
>>>
>>> Best
>>> David
>>>
>>>
>>>
>>>
>>
>>