Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig, mail # user - GROUP BY Issue


Copy link to this message
-
Re: GROUP BY Issue
Shahab Yunus 2013-06-10, 16:28
Gourav, this is not a HIVE mailing list. It is PIG's.

Regards,
Shahab
On Mon, Jun 10, 2013 at 10:39 AM, Gourav Sengupta
<[EMAIL PROTECTED]>wrote:

> Hi,
>
> On running the following query I am getting multiple records with same
> value of F1
>
> SELECT F1, COUNT(*)
> FROM
> (
> SELECT F1, F2, COUNT(*)
> FROM TABLE1
> GROUP BY F1, F2
> ) a
> GROUP BY F1;
>
> As per what I understand there are multiple number of records based on
> number of reducers.
>
> Replicating the test scenario:
> STEP1: get the dataset as available in
> http://snap.stanford.edu/data/amazon0302.html
>
> STEP2: Open the file and delete the heading
>
> STEP3: hadoop fs -mkdir /test
>
> STEP4: hadoop fs -put amazon0302.txt /test
>
> STEP5: create external table test (f1 int, f2 int) row format delimited
> fields terminated by '\t' lines terminated by '\n' stored as textfile
> location '/test';
>
> STEP6: create table test1 location '/test1' as select left_table.* from
> (select * from test where f1<10000) left_table join (select * from test
> where f1 < 10000) right_table;
>
> STEP7: hadoop fs -mkdir /test2
>
> STEP8: create table test2 location '/test2' as select f1, count(*) from
> (select f1, f2, count(*) from test1 group by f1, f2) a group by f1;
>
> STEP9: select * from test2 where f1 = 9887;
>
> ENVIRONMENT:
> HADOOP 2.0.4
> HIVE 0.11
>
> Please do let me know whether I am doing anything wrong.
>
>
> Thanks and Regards,
> Gourav Sengupta
>