Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # dev >> Re: GROUP BY Issue


Copy link to this message
-
Re: GROUP BY Issue
Your table has five "F1 = 9887" rows and joining will make 25 rows
with the same F1 value.

I cannot imagine what you're intended to do.

2013/6/12 Gourav Sengupta <[EMAIL PROTECTED]>:
> Hi,
>
> I had initially forwarded this request to the user group but am yet to
> receive any response.
>
> I will be grateful if someone can help me out in resolving the issue or
> pointing out any mistakes that I may be doing.
>
> It took me around 5 to 6 hours to generate the test data of around 20 GB
> (or more) and there must be a better alternative.
>
> Regards,
> Gourav
>
> ---------- Forwarded message ----------
> From: Gourav Sengupta <[EMAIL PROTECTED]>
> Date: Mon, Jun 10, 2013 at 4:10 PM
> Subject: GROUP BY Issue
> To: [EMAIL PROTECTED]
>
>
> Hi,
>
> On running the following query I am getting multiple records with same
> value of F1
>
> SELECT F1, COUNT(*)
> FROM
> (
> SELECT F1, F2, COUNT(*)
> FROM TABLE1
> GROUP BY F1, F2
> ) a
> GROUP BY F1;
>
> As per what I understand there are multiple number of records based on
> number of reducers.
>
> Replicating the test scenario:
> STEP1: get the dataset as available in
> http://snap.stanford.edu/data/amazon0302.html
>
> STEP2: Open the file and delete the heading
>
> STEP3: hadoop fs -mkdir /test
>
> STEP4: hadoop fs -put amazon0302.txt /test
>
> STEP5: create external table test (f1 int, f2 int) row format delimited
> fields terminated by '\t' lines terminated by '\n' stored as textfile
> location '/test';
>
> STEP6: create table test1 location '/test1' as select left_table.* from
> (select * from test where f1<10000) left_table join (select * from test
> where f1 < 10000) right_table;
>
> STEP7: hadoop fs -mkdir /test2
>
> STEP8: create table test2 location '/test2' as select f1, count(*) from
> (select f1, f2, count(*) from test1 group by f1, f2) a group by f1;
>
> STEP9: select * from test2 where f1 = 9887;
>
> ENVIRONMENT:
> HADOOP 2.0.4
> HIVE 0.11
>
> Please do let me know whether I am doing anything wrong.
>
>
> Thanks and Regards,
> Gourav Sengupta