Drill >> mail # dev >> Purpose of Scan.selection and Operator.ref?

Purpose of Scan.selection and Operator.ref?
Hello drillers,

I'm still puzzling over the purpose of the "selection" attribute of the "Scan" operator and of the "ref" attribute of various operators such as "Scan", "Transform" and "Group".

I notice that "selection" is not used (which is good, since there is no "activity" attribute in donuts.json).

I understand that "ref" chooses the output expression(s) of each operator, and I can see that those expressions are necessary. But I don't understand why every "ref" in simple_plan.json is prefixed with "donuts".

My understanding is that each operator's input and output is a JSON array. The elements of that array (the "rows" in SQL parlance) are usually JSON objects (i.e. records with named fields) but might sometimes be scalars or arrays.
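A minimal Python sketch of this model, assuming that each row is a plain dict and that a ref such as "donuts.sales" is read as a dotted path into a row. The helper `read_ref` is hypothetical and not part of Drill:

```python
from typing import Any

# A row is one element of an operator's JSON array: usually an object (dict),
# but it could also be a scalar or a list.
Row = Any


def read_ref(row: Row, ref: str) -> Any:
    """Follow a dotted ref such as "donuts.sales" into a nested row.

    Hypothetical helper: each dot-separated segment is treated as a field
    name. Raises KeyError if a segment is missing or the value is not an object.
    """
    value = row
    for segment in ref.split("."):
        if not isinstance(value, dict) or segment not in value:
            raise KeyError(f"ref {ref!r} not found at segment {segment!r}")
        value = value[segment]
    return value


rows = [
    {"donuts": {"sales": 1099.22, "typeCount": 1, "quantity": 10000, "ppu": 0.11}},
    {"donuts": {"sales": 109.71, "typeCount": 2, "quantity": 159, "ppu": 0.69}},
]

print([read_ref(r, "donuts.sales") for r in rows])  # [1099.22, 109.71]
```

Under this reading, every ref has to repeat the "donuts." prefix simply because each row carries that extra wrapper object.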

The output of the "aggregate" operator in simple_plan.json would be something like:

[
  {
    "donuts": {
      "sales" : 1099.22,
      "typeCount" : 1,
      "quantity" : 10000,
      "ppu" : 0.11
    }
  },
  {
    "donuts": {
      "sales" : 109.71,
      "typeCount" : 2,
      "quantity" : 159,
      "ppu" : 0.69
    }
  },
  {
    "donuts": {
      "sales" : 184.25,
      "typeCount" : 2,
      "quantity" : 335,
      "ppu" : 0.55
    }
  }
]

The output is a list of objects, each of which has just one field, "donuts", whose value is an object. As far as I can tell, the only effect of the "donuts" prefix is to add a level of nesting, and the other operators do the same thing. It would seem more natural to me to use just one level of nesting:

[
  {
    "sales" : 1099.22,
    "typeCount" : 1,
    "quantity" : 10000,
    "ppu" : 0.11
  },
  ...
]

Of course it's not wrong to do this, but I wanted to ask why someone would choose an extra level of nesting, or to check whether my understanding is wrong. (I'm pondering how to make a SQL front-end generate something like simple_plan.json, and right now I can see no reason why it would generate ref values with a "donuts." prefix.)
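For a front-end, the choice comes down to which ref strings it emits. The sketch below assumes refs are dotted paths and uses hypothetical helpers `write_ref` and `unwrap` (not Drill APIs): writing values under "donuts."-prefixed refs produces the nested rows shown earlier, and stripping the wrapper yields exactly the flat form.

```python
from typing import Any


def write_ref(row: dict, ref: str, value: Any) -> None:
    """Store value at a dotted ref, creating intermediate objects as needed.

    Hypothetical helper: an output ref of "donuts.sales" yields
    {"donuts": {"sales": value}}, while a ref of "sales" yields {"sales": value}.
    """
    *parents, leaf = ref.split(".")
    target = row
    for segment in parents:
        target = target.setdefault(segment, {})
    target[leaf] = value


def unwrap(rows: list, field: str) -> list:
    """Remove one level of nesting: replace each row by the value of `field`."""
    return [row[field] for row in rows]


prefixed, flat = {}, {}
for name, value in [("sales", 1099.22), ("typeCount", 1), ("quantity", 10000), ("ppu", 0.11)]:
    write_ref(prefixed, "donuts." + name, value)  # nested form, as in simple_plan.json
    write_ref(flat, name, value)                  # a single level of nesting

print(prefixed)  # {'donuts': {'sales': 1099.22, 'typeCount': 1, 'quantity': 10000, 'ppu': 0.11}}
print(unwrap([prefixed], "donuts") == [flat])  # True
```

Since the two forms differ only by that one wrapper, a front-end emitting unprefixed refs would carry the same information.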

Is the intent of "selection" to remove a level of nesting when reading a source?
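To pin down the question, here is a purely hypothetical model of a Scan in which "selection" extracts a sub-path from each source record (removing a level of nesting) and "ref" wraps the result under a field name (adding one). This is not Drill's implementation, and the source record is made up rather than taken from donuts.json:

```python
from typing import Any, Iterable, List, Optional


def scan(records: Iterable[Any], ref: Optional[str] = None,
         selection: Optional[str] = None) -> List[Any]:
    """Hypothetical model of a Scan operator (not Drill's implementation).

    selection: dotted path extracted from each source record (removes nesting).
    ref: field name each resulting value is wrapped under (adds nesting).
    """
    out = []
    for record in records:
        value = record
        if selection:
            for segment in selection.split("."):
                value = value[segment]
        out.append({ref: value} if ref else value)
    return out


# Invented source record, used only to show the two effects.
source = [{"record": {"id": 1, "ppu": 0.55}}]

print(scan(source, ref="donuts"))      # [{'donuts': {'record': {'id': 1, 'ppu': 0.55}}}]
print(scan(source, selection="record"))  # [{'id': 1, 'ppu': 0.55}]
```

In this model "selection" and "ref" pull in opposite directions, which is exactly the ambiguity the question is asking about.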

Julian