Hive >> mail # user >> how may map-reduce needed in a hive query


Richard 2013-01-23, 03:45
Richard 2013-01-23, 05:54
Re: how may map-reduce needed in a hive query
If you look closely, the first stage executes your transform and the second
does your sum operation.
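
To illustrate why sum() can show up in more than one stage, here is a minimal sketch in Python. It is not Hive's actual implementation, and every name in it is hypothetical. It assumes what the quoted plan suggests: stage 1 spreads rows across workers without regard to the key (the plan partitions by rand()) and emits partial sums (mode: partials), and stage 2 regroups by the real key and adds the partial sums together.

```python
from collections import defaultdict
import random


def partial_sums(rows):
    """Sum values per key within a single chunk of (key, value) rows."""
    acc = defaultdict(float)
    for key, value in rows:
        acc[key] += value
    return dict(acc)


def two_stage_sum(rows, num_chunks=3, seed=0):
    """Hypothetical two-stage aggregation: partial sums, then final sums."""
    rng = random.Random(seed)

    # Stage 1: scatter rows randomly, so one key may land in several chunks,
    # then aggregate inside each chunk. The results are only partial.
    chunks = [[] for _ in range(num_chunks)]
    for row in rows:
        chunks[rng.randrange(num_chunks)].append(row)
    partials = [partial_sums(chunk) for chunk in chunks]

    # Stage 2: regroup by key and sum the partial sums to get final totals.
    final = defaultdict(float)
    for partial in partials:
        for key, subtotal in partial.items():
            final[key] += subtotal
    return dict(final)


rows = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("b", 4.0), ("a", 5.0)]
result = two_stage_sum(rows)
print(result)
```

Because the same key can end up in several chunks in the first pass, the first aggregation alone is not enough, which is why the second pass aggregates again.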
On Wed, Jan 23, 2013 at 11:24 AM, Richard <[EMAIL PROTECTED]> wrote:

> Thanks. I ran the explain command and got the plan, but I am still confused.
> Below is the description of the two map-reduce stages.
>
> It seems that the aggregation has already been done in stage-1, so why
> does stage-2 aggregate again?
>
>
> =========================
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         a:t1
>           TableScan
>             alias: t1
>             Select Operator
>               expressions:
>                     expr: f
>                     type: string
>               outputColumnNames: _col0
>               Transform Operator
>                 command: mymapper
>                 output info:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                  Select Operator
>                   expressions:
>                         expr: _col0
>                         type: string
>                         expr: _col1
>                         type: string
>                         expr: _col2
>                         type: string
>                   outputColumnNames: _col0, _col1, _col2
>                   Group By Operator
>                     aggregations:
>                           expr: sum(_col0)
>                           expr: sum(_col1)
>                     bucketGroup: false
>                     keys:
>                           expr: _col2
>                           type: string
>                     mode: hash
>                     outputColumnNames: _col0, _col1, _col2
>                     Reduce Output Operator
>                       key expressions:
>                             expr: _col0
>                             type: string
>                       sort order: +
>                       Map-reduce partition columns:
>                             expr: rand()
>                             type: double
>                       tag: -1
>                       value expressions:
>                             expr: _col1
>                             type: double
>                             expr: _col2
>                             type: double
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations:
>                 expr: sum(VALUE._col0)
>                 expr: sum(VALUE._col1)
>           bucketGroup: false
>           keys:
>                 expr: KEY._col0
>                 type: string
>           mode: partials
>           outputColumnNames: _col0, _col1, _col2
>           File Output Operator
>             compressed: false
>             GlobalTableId: 0
>             table:
>                 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>
>   Stage: Stage-2
>     Map Reduce
>       Alias -> Map Operator Tree:
>         hdfs://hdpnn:9000/mydata/hive/hive_2013-01-23_13-46-09_628_5487089660360786955/10002
>             Reduce Output Operator
>               key expressions:
>                     expr: _col0
>                     type: string
>               sort order: +
>               Map-reduce partition columns:
>                     expr: _col0
>                     type: string
>               tag: -1
>               value expressions:
>                     expr: _col1
>                     type: double
>                     expr: _col2
>                     type: double
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations:
>                 expr: sum(VALUE._col0)
>                 expr: sum(VALUE._col1)
>           bucketGroup: false
>           keys:
>                 expr: KEY._col0
>                 type: string
Nitin Pawar
Nitin Pawar 2013-01-23, 04:07