-Re: GROUP BY Issue
Navis류승우 2013-06-14, 02:27
Your table has five "F1 = 9887" rows and joining will make 25 rows
with the same F1 value.
I cannot imagine what you're intended to do.
2013/6/12 Gourav Sengupta <[EMAIL PROTECTED]>:
> I had initially forwarded this request to the user group but am yet to
> receive any response.
> I will be grateful if someone can help me out in resolving the issue or
> pointing out any mistakes that I may be doing.
> It took me around 5 to 6 hours to generate the test data of around 20 GB
> (or more) and there must be a better alternative.
> ---------- Forwarded message ----------
> From: Gourav Sengupta <[EMAIL PROTECTED]>
> Date: Mon, Jun 10, 2013 at 4:10 PM
> Subject: GROUP BY Issue
> To: [EMAIL PROTECTED]
> On running the following query I am getting multiple records with same
> value of F1
> SELECT F1, COUNT(*)
> SELECT F1, F2, COUNT(*)
> FROM TABLE1
> GROUP BY F1, F2
> ) a
> GROUP BY F1;
> As per what I understand there are multiple number of records based on
> number of reducers.
> Replicating the test scenario:
> STEP1: get the dataset as available in
> STEP2: Open the file and delete the heading
> STEP3: hadoop fs -mkdir /test
> STEP4: hadoop fs -put amazon0302.txt /test
> STEP5: create external table test (f1 int, f2 int) row format delimited
> fields terminated by '\t' lines terminated by '\n' stored as textfile
> location '/test';
> STEP6: create table test1 location '/test1' as select left_table.* from
> (select * from test where f1<10000) left_table join (select * from test
> where f1 < 10000) right_table;
> STEP7: hadoop fs -mkdir /test2
> STEP8: create table test2 location '/test2' as select f1, count(*) from
> (select f1, f2, count(*) from test1 group by f1, f2) a group by f1;
> STEP9: select * from test2 where f1 = 9887;
> HADOOP 2.0.4
> HIVE 0.11
> Please do let me know whether I am doing anything wrong.
> Thanks and Regards,
> Gourav Sengupta