Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig >> mail # user >> GROUP BY Issue


+
Gourav Sengupta 2013-06-10, 14:39
Copy link to this message
-
Re: GROUP BY Issue
Gourav, this is not a HIVE mailing list. It is PIG's.

Regards,
Shahab
On Mon, Jun 10, 2013 at 10:39 AM, Gourav Sengupta
<[EMAIL PROTECTED]>wrote:

> Hi,
>
> On running the following query I am getting multiple records with same
> value of F1
>
> SELECT F1, COUNT(*)
> FROM
> (
> SELECT F1, F2, COUNT(*)
> FROM TABLE1
> GROUP BY F1, F2
> ) a
> GROUP BY F1;
>
> As per what I understand there are multiple number of records based on
> number of reducers.
>
> Replicating the test scenario:
> STEP1: get the dataset as available in
> http://snap.stanford.edu/data/amazon0302.html
>
> STEP2: Open the file and delete the heading
>
> STEP3: hadoop fs -mkdir /test
>
> STEP4: hadoop fs -put amazon0302.txt /test
>
> STEP5: create external table test (f1 int, f2 int) row format delimited
> fields terminated by '\t' lines terminated by '\n' stored as textfile
> location '/test';
>
> STEP6: create table test1 location '/test1' as select left_table.* from
> (select * from test where f1<10000) left_table join (select * from test
> where f1 < 10000) right_table;
>
> STEP7: hadoop fs -mkdir /test2
>
> STEP8: create table test2 location '/test2' as select f1, count(*) from
> (select f1, f2, count(*) from test1 group by f1, f2) a group by f1;
>
> STEP9: select * from test2 where f1 = 9887;
>
> ENVIRONMENT:
> HADOOP 2.0.4
> HIVE 0.11
>
> Please do let me know whether I am doing anything wrong.
>
>
> Thanks and Regards,
> Gourav Sengupta
>
+
Gourav Sengupta 2013-06-10, 19:22
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB