Drill, mail # user - Re: What do you want out of Apache Drill?


Re: What do you want out of Apache Drill?
David Alves 2013-03-13, 17:06

@Jacques: +1 on pretty much all you said. I, personally, will be focusing on those as soon as I'm able to get something running.
@Ted: good to know there is no major sentiment against large joins; the infrastructure required for performant large joins should also allow for performant cogroups.

-david

On Mar 13, 2013, at 11:42 AM, Jacques Nadeau <[EMAIL PROTECTED]> wrote:

> I have a feeling that large joins will be dealt with sooner rather than
> later (especially with interest and work from people like you).  If you
> look at large queries, things are dominated by large sorts, large joins and
> large group-by aggregations.  We need to make sure those are performant in
> large clusters before we focus on the prettier things.  Hopefully we can
> leverage Google Compute Engine to ensure this.
>
>
>
> On Wed, Mar 13, 2013 at 7:07 AM, David Alves <[EMAIL PROTECTED]> wrote:
>
>> Hi All
>>
>>        Sorry to revive an old thread…
>>        I was going through the list looking for the current stance on
>> joins and I found Ted's answer.
>>        What is the main point behind not doing large joins on Drill?
>>        Is it just simplicity (as in optimizer, etc.) or is there
>> something else?
>>        I mention this because I'm particularly interested in large self
>> joins (I can volunteer to work on them myself, of course).
>>        I'm not against leaving them out of any optimizer goals, if one
>> can explicitly select an identity optimizer that will just follow the
>> logical plan, but they are a big requirement for me.
>>        Thoughts?
>>
>> Best
>> David
>>
>> On Dec 6, 2012, at 7:33 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>
>>> Drill is explicitly designed (at this time) with the option of not doing
>>> large joins.  Triple stores pretty much assume lots of large joins.
>>>
>>> That said, if you could write some suggested typical queries, it would help
>>> the discussion along.  If you could go so far as to translate to a logical
>>> plan, that would be even cooler.
>>>
>>> On Fri, Dec 7, 2012 at 2:25 AM, Mike Kogan <[EMAIL PROTECTED]> wrote:
>>>
>>>> I would very much be interested in having a SPARQL interface, though I am
>>>> not sure how well Drill will handle many joins.
>>>>
>>>>
>>>> On Thu, Dec 6, 2012 at 5:13 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> On Thu, Dec 6, 2012 at 8:44 PM, Julian Hyde <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> ...
>>>>>> 1 A SQL interface (in addition to DrQL interface)
>>>>>>
>>>>>
>>>>> With your help, this may arrive before DrQL is integrated.
>>>>>
>>>>>
>>>>>> 2 JDBC driver
>>>>>>
>>>>>
>>>>> Should be pretty straightforward.  Not on anybody's task list just yet, I
>>>>> don't think.
>>>>>
>>>>>
>>>>>> 3 Access to the stack at a lower level (i.e. a way to use the
>>>>>> high-performance scan operators without writing a query)
>>>>>>
>>>>>
>>>>> Definitely going to happen.
>>>>>
>>>>>
>>>>>> 4 Ability to query in-memory Java data in a compact form (e.g. arrays of
>>>>>> primitives or nio buffers)
>>>>>>
>>>>>
>>>>> I wonder if this is just a matter of writing a special scanner or a special
>>>>> flavor of join at the execution point: the scanner for the case where the
>>>>> in-memory compact form is only readable sequentially, the join operator
>>>>> if the memory can be accessed at random.
>>>>>
>>>>> ...
>>>>>> I know some of these are outside of Drill's scope. If so, feel free to
>>>>>> disregard. But if you don't ask, you don't get. :)
>>>>>>
>>>>>
>>>>> They all look pretty reasonable to me.
>>>>>
>>>>
>>
>>