Drill >> mail # dev >> Example aggregate queries


Example aggregate queries
I'm starting to work on GROUP BY support, and I'm trying to figure out the syntax of the "collapsingaggregate" operator based on the spec. I tried to find some example logical plans but I couldn't. Could someone give a couple of examples?

Here's a plan that almost works:

{
  "head" : {
    "type" : "apache_drill_logical_plan",
    "version" : 1,
    "generator" : {
      "type" : "manual",
      "info" : "na"
    }
  },
  "storage" : [ {
    "type" : "queue",
    "name" : "queue"
  }, {
    "type" : "classpath",
    "name" : "donuts-json"
  } ],
  "query" : [ {
    "op" : "scan",
    "@id" : 1,
    "memo" : "initial_scan",
    "storageengine" : "donuts-json",
    "selection" : {
      "path" : "/employees.json",
      "type" : "JSON"
    },
    "ref" : "_MAP"
  },  {
    "op" : "collapsingaggregate",
    "@id" : 2,
    "input" : 1,
    "carryovers": [],
    "aggregations" : [ {
      "ref" : "output.c",
      "expr" : "count"
    } ]
  }, {
    "op" : "store",
    "@id" : 3,
    "memo" : "output sink",
    "input" : 2,
    "target" : {
      "number" : 0
    },
    "partition" : null,
    "storageEngine" : "queue"
  } ]
}

Some specific questions:

1. Is it valid for "carryovers" to be omitted or empty? (I get an NPE if I omit it.)
2. What's the syntax for an aggregation expression? (I tried "COUNT()", "COUNT(*)", and "COUNT".)
3. The spec seems to imply that I could write something as rich as "5 + SUM(salary) / COUNT()" for an aggregation expression. Is that true?
4. "within" is a segment. Does that mean a holder field, like the "toppings" field in donuts.json?
5. Is it valid for "aggregations" to be omitted or empty?
6. In the plan spec [ https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit ], does "*" mean that a field is optional, and does the dagger mean system-generated? It doesn't say explicitly.
7. If I want to implement "SELECT deptno, COUNT(*) FROM emp GROUP BY deptno", I presume I will generate a plan that consists of scan-segment-collapsingaggregate?
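
To make question 7 concrete, here is a rough Python sketch of the semantics I would expect from a scan-segment-collapsingaggregate pipeline for that query. It is not Drill code and says nothing about the actual plan syntax: the sample rows and the function names `segment` and `collapsing_count` are made up for illustration, and the real operators may well behave differently.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the "emp" table (the scan output).
emp = [
    {"name": "a", "deptno": 10},
    {"name": "b", "deptno": 20},
    {"name": "c", "deptno": 10},
]


def segment(rows, key):
    """Partition rows into groups that share the same value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def collapsing_count(groups, key, out):
    """Collapse each group into one row holding the key and the row count."""
    return [{key: value, out: len(rows)} for value, rows in groups.items()]


# SELECT deptno, COUNT(*) AS c FROM emp GROUP BY deptno
result = collapsing_count(segment(emp, "deptno"), "deptno", "c")
print(result)  # [{'deptno': 10, 'c': 2}, {'deptno': 20, 'c': 1}]
```

In this reading, the segment step only groups rows, and the aggregate step emits exactly one record per group, with "deptno" playing the role of a carryover.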

If the example queries answer those questions, no need to answer them explicitly.

Julian
