Drill >> mail # user >> Re: What do you want out of Apache Drill?


Re: What do you want out of Apache Drill?

@Jacques: +1 on pretty much all you said. I, personally, will be focusing on those as soon as I'm able to get something running.
@Ted: good to know there is no major sentiment against large joins. The infrastructure required for performant large joins should also allow for performant cogroups.

-david
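
[Editorial sketch, not part of the original message.] The link between joins and cogroups mentioned above can be illustrated on small in-memory data: an inner equi-join is a cogroup followed by a per-key cross product, so both operations need records with the same key brought together first. This is a minimal sketch under that assumption, not Drill code; the names `cogroup`, `inner_join`, `orders` and `cities` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def cogroup(left, right):
    """Group two lists of (key, value) pairs by key.

    Returns {key: (left_values, right_values)}; a key present on only
    one side gets an empty list for the other side.
    """
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    return dict(groups)


def inner_join(left, right):
    """Inner equi-join built on cogroup: cross product of values per key."""
    joined = []
    for key, (lvals, rvals) in cogroup(left, right).items():
        joined.extend((key, l, r) for l, r in product(lvals, rvals))
    return joined


# Hypothetical sample data.
orders = [("alice", "book"), ("bob", "pen"), ("alice", "lamp")]
cities = [("alice", "Lisbon"), ("carol", "Oslo")]
```

Keys that appear on only one side (`bob`, `carol`) survive in the cogroup but produce no rows in the inner join, since one side of their cross product is empty.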

On Mar 13, 2013, at 11:42 AM, Jacques Nadeau <[EMAIL PROTECTED]> wrote:

> I have a feeling that large joins will be dealt with sooner rather than
> later (especially with interest and work from people like you).  If you
> look at large queries, things are dominated by large sorts, large joins and
> large group-by aggregations.  We need to make sure those are performant in
> large clusters before we focus on the prettier things.  Hopefully we can
> leverage Google Compute Engine to ensure this.
>
>
>
> On Wed, Mar 13, 2013 at 7:07 AM, David Alves <[EMAIL PROTECTED]> wrote:
>
>> Hi All
>>
>>        Sorry to revive an old thread…
>>        I was going through the list looking for the current stance on
>> joins and I found Ted's answer.
>>        What is the main point behind not doing large joins on Drill?
>>        Is it just simplicity (as in optimizer, etc.) or is there
>> something else?
>>        I mention this because I'm particularly interested in large self
>> joins (I can volunteer to work on them myself, of course).
>>        I'm not against leaving them out of any optimizer goals, if one
>> can explicitly select an identity optimizer that will just follow the
>> logical plan, but they are a big requirement for me.
>>        Thoughts?
>>
>> Best
>> David
>>
>> On Dec 6, 2012, at 7:33 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>
>>> Drill is explicitly designed (at this time) with the option of not doing
>>> large joins.  Triple stores pretty much assume lots of large joins.
>>>
>>> That said, if you could write some suggested typical queries, it would
>>> help the discussion along.  If you could go so far as to translate to a
>>> logical plan, that would be even cooler.
>>>
>>> On Fri, Dec 7, 2012 at 2:25 AM, Mike Kogan <[EMAIL PROTECTED]> wrote:
>>>
>>>> I would very much be interested in having a SPARQL interface, though I
>>>> am not sure how well Drill will handle many joins.
>>>>
>>>>
>>>> On Thu, Dec 6, 2012 at 5:13 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> On Thu, Dec 6, 2012 at 8:44 PM, Julian Hyde <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> ...
>>>>>> 1 A SQL interface (in addition to DrQL interface)
>>>>>>
>>>>>
>>>>> With your help, this may arrive before DrQL is integrated.
>>>>>
>>>>>
>>>>>> 2 JDBC driver
>>>>>>
>>>>>
>>>>> Should be pretty straightforward.  Not on anybody's task list just
>>>>> yet, I don't think.
>>>>>
>>>>>
>>>>>> 3 Access to the stack at a lower level (i.e. a way to use the
>>>>>> high-performance scan operators without writing a query)
>>>>>>
>>>>>
>>>>> Definitely going to happen.
>>>>>
>>>>>
>>>>>> 4 Ability to query in-memory Java data in a compact form (e.g. arrays
>>>>>> of primitives or nio buffers)
>>>>>>
>>>>>
>>>>> I wonder if this is just a matter of writing a special scanner or a
>>>>> special flavor of join at the execution point: the scanner for the case
>>>>> where the in-memory compact form is only readable sequentially, the
>>>>> join operator if the memory can be accessed at random.
>>>>>
>>>>> ...
>>>>>> I know some of these are outside of Drill's scope. If so, feel free to
>>>>>> disregard. But if you don't ask, you don't get. :)
>>>>>>
>>>>>
>>>>> They all look pretty reasonable to me.
>>>>>
>>>>
>>
>>
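
[Editorial sketch, not part of the original thread.] The reply to point 4 above distinguishes two ways of exposing compact in-memory data: a scanner when the data can only be read sequentially, and a join operator when it can be accessed at random. The sketch below illustrates that distinction with Python's `array` module standing in for arrays of primitives. It is an assumption-laden illustration, not Drill's API; `scan`, `index_join`, the column names, and the dense-id layout (row position = id - base id) are all hypothetical.

```python
from array import array

# Hypothetical compact columns: ids are dense and start at BASE_ID,
# so the row for a given id sits at position (id - BASE_ID).
BASE_ID = 10
ids = array("i", [10, 11, 12, 13])
prices = array("d", [1.5, 2.0, 0.5, 4.0])


def scan(ids, prices):
    """Sequential access: yield (id, price) rows in storage order."""
    for i in range(len(ids)):
        yield ids[i], prices[i]


def index_join(probe_rows, base_id, prices):
    """Random access: join (id, qty) probe rows against the price column
    by computing each row position directly from the key."""
    out = []
    for key, qty in probe_rows:
        pos = key - base_id
        if 0 <= pos < len(prices):  # skip keys with no matching row
            out.append((key, qty, prices[pos]))
    return out
```

With sequential-only access, a query has to walk every row via `scan`; with random access, `index_join` touches only the rows that the probe side asks for.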