Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # dev >> Re: GROUP BY Issue


Copy link to this message
-
Re: GROUP BY Issue
Your table has five "F1 = 9887" rows and joining will make 25 rows
with the same F1 value.

I cannot imagine what you're intended to do.

2013/6/12 Gourav Sengupta <[EMAIL PROTECTED]>:
> Hi,
>
> I had initially forwarded this request to the user group but am yet to
> receive any response.
>
> I will be grateful if someone can help me out in resolving the issue or
> pointing out any mistakes that I may be doing.
>
> It took me around 5 to 6 hours to generate the test data of around 20 GB
> (or more) and there must be a better alternative.
>
> Regards,
> Gourav
>
> ---------- Forwarded message ----------
> From: Gourav Sengupta <[EMAIL PROTECTED]>
> Date: Mon, Jun 10, 2013 at 4:10 PM
> Subject: GROUP BY Issue
> To: [EMAIL PROTECTED]
>
>
> Hi,
>
> On running the following query I am getting multiple records with same
> value of F1
>
> SELECT F1, COUNT(*)
> FROM
> (
> SELECT F1, F2, COUNT(*)
> FROM TABLE1
> GROUP BY F1, F2
> ) a
> GROUP BY F1;
>
> As per what I understand there are multiple number of records based on
> number of reducers.
>
> Replicating the test scenario:
> STEP1: get the dataset as available in
> http://snap.stanford.edu/data/amazon0302.html
>
> STEP2: Open the file and delete the heading
>
> STEP3: hadoop fs -mkdir /test
>
> STEP4: hadoop fs -put amazon0302.txt /test
>
> STEP5: create external table test (f1 int, f2 int) row format delimited
> fields terminated by '\t' lines terminated by '\n' stored as textfile
> location '/test';
>
> STEP6: create table test1 location '/test1' as select left_table.* from
> (select * from test where f1<10000) left_table join (select * from test
> where f1 < 10000) right_table;
>
> STEP7: hadoop fs -mkdir /test2
>
> STEP8: create table test2 location '/test2' as select f1, count(*) from
> (select f1, f2, count(*) from test1 group by f1, f2) a group by f1;
>
> STEP9: select * from test2 where f1 = 9887;
>
> ENVIRONMENT:
> HADOOP 2.0.4
> HIVE 0.11
>
> Please do let me know whether I am doing anything wrong.
>
>
> Thanks and Regards,
> Gourav Sengupta
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB