NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Drill >> mail # user >> Basic queries regarding Apache Drill working


Re: Basic queries regarding Apache Drill working
Agreed.  Trusting statistics is always a little scary.  My gut is that
whatever the Drill default is, admins will set the approximate flag on by
default and analysts won't even realize it most of the time... They'll just
get faster answers and be happy.

On Fri, Apr 5, 2013 at 8:53 AM, Andrew Brust <[EMAIL PROTECTED]> wrote:

> OK, thank you for that explanation.  The whole notion of “not exactly
> right” scares me a bit, but I do see the utility in the approach and the
> point that over a large enough dataset, the statistical accuracy can still
> be there.  Also agreed that a one-pass process beats a two-pass with
> intermediate persistence.
>
>
> *From:* [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] *On
> Behalf Of *Jacques Nadeau
> *Sent:* Friday, April 5, 2013 11:34 AM
> *To:* [EMAIL PROTECTED]; devansh kumar
> *Cc:* Andrew Brust; [EMAIL PROTECTED]
> *Subject:* Re: Basic queries regarding Apache Drill working
>
>
> The current thinking is that there will be an approximate query flag.
> This will be useful in situations where parallel approximations can be
> made.  The simplest example is when you want the top 10 groups by attr1.
> You can do a local top N group by attr1 and then merge those results.
> While not exactly right, it can be statistically accurate given the
> right choice of N.  There are also parallel approximations for other
> things, such as the median, using streaming algorithms.  The goal is for
> Drill to be able to use these approximation algorithms in a processing
> tree for more queries.  When a user needs exact results, a full
> shuffle/aggregation will still need to be done.  They will still benefit
> from avoiding the various MapReduce barriers and the requirement for
> persistence between stages.
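Jacques's top-10 example can be sketched in a few lines of Python. This is a minimal illustration, not Drill code: each partition is assumed to be a plain list of attr1 values, and `local_top_n` and `merge_top_k` are hypothetical names. Each partition keeps only its N biggest groups; the merge step adds the truncated counts and takes the global top k.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Group = Tuple[str, int]  # (attr1 value, row count)

def local_top_n(rows: Iterable[str], n: int) -> List[Group]:
    """Run on each node: GROUP BY attr1 over local rows, keep the n biggest groups."""
    return Counter(rows).most_common(n)

def merge_top_k(partials: List[List[Group]], k: int) -> List[Group]:
    """Add up the truncated per-node counts and return the k biggest groups.

    A group that missed some node's local top n contributes nothing from that
    node, so merged counts can be too low: the result is approximate.
    """
    merged: Counter = Counter()
    for partial in partials:
        for key, count in partial:
            merged[key] += count
    return merged.most_common(k)

# Hypothetical data: three partitions of attr1 values.
partitions = [
    ["a"] * 50 + ["b"] * 30 + ["c"] * 5 + ["d"] * 4,
    ["a"] * 40 + ["c"] * 35 + ["d"] * 6 + ["b"] * 2,
    ["b"] * 45 + ["c"] * 20 + ["a"] * 10 + ["d"] * 3,
]

approx = merge_top_k([local_top_n(p, n=2) for p in partitions], k=2)
exact = Counter(v for p in partitions for v in p).most_common(2)
print(approx)  # [('a', 90), ('b', 75)]
print(exact)   # [('a', 100), ('b', 77)]
```

Here the approximation picks the right two groups but undercounts them. Raising `n` above `k` moves a little more data between nodes in exchange for a closer answer, which is the "right choice of N" trade-off.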
>
>
> J
>
> On Thu, Apr 4, 2013 at 10:31 PM, devansh kumar <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I understood what you meant about using SUM and COUNT to calculate
> AVERAGE.
> But as I understand it, this works very well for distributed
> operations... what about operations like median?
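Jacques mentions streaming algorithms for the median. As a swapped-in illustration (not necessarily what Drill uses), one of the simplest mergeable summaries is a fixed-width histogram. Each node counts its values into buckets over an assumed known range [lo, hi), the histograms are added bucket by bucket, and the median is read off the merged counts. The function names, range, and bucket count are hypothetical. The estimate lands within half a bucket width of the true (lower) median as long as every value falls inside the range. Adaptive quantile sketches (t-digest is one example) avoid having to fix the range up front.

```python
import random
import statistics
from itertools import accumulate
from typing import List, Sequence

def local_histogram(values: Sequence[float], lo: float, hi: float, buckets: int) -> List[int]:
    """Run on each node: count local values into equal-width buckets over [lo, hi)."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), buckets - 1)] += 1  # clamp stray values into the edge buckets
    return counts

def merge_histograms(partials: List[List[int]]) -> List[int]:
    """Histograms are mergeable: bucket-wise addition gives the global histogram."""
    return [sum(column) for column in zip(*partials)]

def approx_median(counts: List[int], lo: float, hi: float) -> float:
    """Midpoint of the bucket holding the middle-ranked value (lower median if count is even)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot take the median of an empty histogram")
    target = (total + 1) // 2  # 1-based rank of the (lower) median
    i = next(i for i, seen in enumerate(accumulate(counts)) if seen >= target)
    width = (hi - lo) / len(counts)
    return lo + (i + 0.5) * width

# Demo: three "nodes", each holding 1001 random values in [0, 100).
rng = random.Random(0)
partitions = [[rng.uniform(0, 100) for _ in range(1001)] for _ in range(3)]
LO, HI, BUCKETS = 0.0, 100.0, 200  # bucket width 0.5

merged = merge_histograms([local_histogram(p, LO, HI, BUCKETS) for p in partitions])
approx = approx_median(merged, LO, HI)
exact = statistics.median_low([v for p in partitions for v in p])
print(f"approx={approx:.2f} exact={exact:.2f}")
```

Only `BUCKETS` integers per node cross the network, regardless of how many rows each node holds; narrower buckets tighten the error bound at the cost of larger partial results.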
>
> Also, I wanted to ask how the query will be broken up in
> the execution engine.
> I have gone through the Apache Drill documentation and also the Google
> Dremel paper, and I am still confused about how multiple levels of
> aggregation will be created inside one tree.
>
> Thanks!
>
>
>
>
> ________________________________
> From: devansh kumar <[EMAIL PROTECTED]>
> To: Andrew Brust <[EMAIL PROTECTED]>; "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Friday, April 5, 2013 10:18 AM
> Subject: Re: Basic queries regarding Apache Drill working
>
>
> Hi,
>
> As Andrew asked, how will average work without a Reduce operation
> present?
> Can you explain more about how the data will be aggregated?
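Carrying SUM and COUNT instead of the average answers this without sending all rows for a key to one node. In the sketch below each node reduces its slice to a (SUM, COUNT) pair, pairs from different nodes add up at any level, and the division happens once at the end. The function names are hypothetical; this illustrates general partial aggregation rather than Drill's own operators.

```python
from typing import List, Sequence, Tuple

Partial = Tuple[float, int]  # (SUM, COUNT) for one node's slice of the data

def local_partial(values: Sequence[float]) -> Partial:
    """Run on each node: only two numbers leave the node, never the rows."""
    return (sum(values), len(values))

def merge_partials(partials: List[Partial]) -> Partial:
    """Run at any level above: (SUM, COUNT) pairs add component-wise."""
    return (sum(s for s, _ in partials), sum(c for _, c in partials))

def finalize_avg(partial: Partial) -> float:
    """Run once at the root: divide only after everything is merged."""
    total, count = partial
    return total / count

node_a = [10.0, 20.0, 30.0]  # local average 20.0
node_b = [100.0]             # local average 100.0

avg = finalize_avg(merge_partials([local_partial(node_a), local_partial(node_b)]))
# Averaging the per-node averages ignores how many rows each node holds.
naive = (sum(node_a) / len(node_a) + sum(node_b) / len(node_b)) / 2
print(avg)    # 40.0  (160 / 4, the true mean)
print(naive)  # 60.0  (wrong whenever nodes hold different numbers of rows)
```

Because `merge_partials` is associative, it can run at every intermediate level of an aggregation tree, which is why AVERAGE parallelizes exactly while the median generally does not.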
>
>
>
>
> ________________________________
> From: Andrew Brust <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; devansh kumar <[EMAIL PROTECTED]>
> Sent: Thursday, April 4, 2013 8:00 PM
> Subject: RE: Basic queries regarding Apache Drill working
>
> Still not sure I follow (and pardon what must be a very rudimentary
> misunderstanding on my part) how you get an average across a data set if
> the data is split across nodes.  With MapReduce, the reducer can get it
> because all data for a given key is kept to one node.  How would this work
> with Drill?
>
> -----Original Message-----
> From: Ted Dunning [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, April 4, 2013 9:27 AM
> To: [EMAIL PROTECTED]; devansh kumar
> Subject: Re: Basic queries regarding Apache Drill working
>
> On Thu, Apr 4, 2013 at 12:27 PM, devansh kumar <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I am new and am trying to understand how Apache Drill works but I