|
|
-
Hive JOINs not working as expected (returns 0 rows)
John Morrison 2012-11-20, 15:43
Hi,
New to hive in last few weeks and could use some help with JOINs.
Using MapR (version 0.9.0) /usr/bin/hive -> /opt/mapr/hive/hive-0.9.0/bin/hive
I have 2 tables I am wanting to join by date (order_t and date_t). DDL at bottom.
I have reduced this to 1 column and 1 row and still can't get things to work.
Any help will be appreciated.
-John
Details:
# data in each table hive> select * from ext_order_1 ; OK 20081203 Time taken: 0.076 seconds hive> select * from ext_date_1 ; OK 20081203 Time taken: 0.068 seconds
# Trying to join results in 0 rows (see yellow below) # hive> select count(*) from ext_order_1 a join ext_date_1 b on (a.cal_dt = b.cal_dt) ; Total MapReduce jobs = 2 Launching Job 1 out of 2 Number of reduce tasks not specified. Estimated from input data size: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Starting Job = job_201211050921_0232, Tracking URL = xxxx Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201211050921_0232 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2012-11-20 10:08:48,797 Stage-1 map = 0%, reduce = 0% 2012-11-20 10:08:53,823 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.25 sec 2012-11-20 10:08:54,829 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.25 sec 2012-11-20 10:08:55,835 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.25 sec 2012-11-20 10:08:56,842 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.25 sec 2012-11-20 10:08:57,848 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.25 sec 2012-11-20 10:08:58,854 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.25 sec 2012-11-20 10:08:59,860 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.25 sec 2012-11-20 10:09:00,866 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.23 sec MapReduce Total cumulative CPU time: 3 seconds 230 msec Ended Job = job_201211050921_0232 Launching Job 2 out of 2 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Starting Job = job_201211050921_0233, Tracking URL = xxxx Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201211050921_0233 Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1 2012-11-20 10:09:02,058 Stage-2 map = 0%, reduce = 0% 2012-11-20 10:09:07,084 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.89 sec 2012-11-20 10:09:08,091 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.89 sec 2012-11-20 10:09:09,101 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.89 sec 2012-11-20 10:09:10,106 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.89 sec 2012-11-20 10:09:11,112 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.89 sec 2012-11-20 10:09:12,119 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.89 sec 2012-11-20 10:09:13,125 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 1.5 sec MapReduce Total cumulative CPU time: 1 seconds 500 msec Ended Job = job_201211050921_0233 MapReduce Jobs Launched: Job 0: Map: 1 Reduce: 1 Cumulative CPU: 3.23 sec MAPRFS Read: 396 MAPRFS Write: 222 SUCCESS Job 1: Map: 1 Reduce: 1 Cumulative CPU: 1.5 sec MAPRFS Read: 628 MAPRFS Write: 72 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 730 msec OK 0 Time taken: 25.727 seconds
# Hive DDL # $> cat date_t_1_row.ddl
CREATE external TABLE ext_date_1 ( cal_dt INT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '~' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/Work/Hive/tables/date_1' ; $> cat order_1.ddl
CREATE external TABLE ext_order_1 ( cal_dt int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '~' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/Work/Hive/tables/order_1' ;
+
John Morrison 2012-11-20, 15:43
-
Re: Hive JOINs not working as expected (returns 0 rows)
Dean Wampler 2012-11-20, 16:22
Did you install v0.9.0 on your MapR cluster? I suspect there's a bug in the interaction between the two. I'm not sure which version of Hive is the latest shipped with MapR.
I just tried your example in an older MapR M5 virtual machine instance that came with Hive v0.7.1. It worked correctly. I then repeated the experiment in Hive v0.9.0 on a generic Apache Hadoop setup I have in another VM. It also worked correctly. (I don't have Hive v0.9.0 available in the MapR VM.)
Hope this helps.
dean
On Tue, Nov 20, 2012 at 9:43 AM, John Morrison < [EMAIL PROTECTED]> wrote:
> Hi,**** > > ** ** > > New to hive in last few weeks and could use some help with JOINs.**** > > ** ** > > Using MapR (version 0.9.0) /usr/bin/hive -> > /opt/mapr/hive/hive-0.9.0/bin/hive**** > > ** ** > > I have 2 tables I am wanting to join by date (order_t and date_t). DDL at > bottom.**** > > ** ** > > I have reduced this to 1 column and 1 row and still can’t get things to > work.**** > > ** ** > > Any help will be appreciated.**** > > ** ** > > -John**** > > ** ** > > Details:**** > > ** ** > > # data in each table**** > > hive> select * from ext_order_1 ;**** > > OK**** > > 20081203**** > > Time taken: 0.076 seconds**** > > hive> select * from ext_date_1 ;**** > > OK**** > > 20081203**** > > Time taken: 0.068 seconds**** > > ** ** > > # Trying to join results in 0 rows (see yellow below)**** > > #**** > > hive> select count(*) from ext_order_1 a join ext_date_1 b on (a.cal_dt > b.cal_dt) ;**** > > Total MapReduce jobs = 2**** > > Launching Job 1 out of 2**** > > Number of reduce tasks not specified. Estimated from input data size: 1*** > * > > In order to change the average load for a reducer (in bytes):**** > > set hive.exec.reducers.bytes.per.reducer=<number>**** > > In order to limit the maximum number of reducers:**** > > set hive.exec.reducers.max=<number>**** > > In order to set a constant number of reducers:**** > > set mapred.reduce.tasks=<number>**** > > Starting Job = job_201211050921_0232, Tracking URL = xxxx**** > > Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job > -Dmapred.job.tracker=maprfs:/// -kill job_201211050921_0232**** > > Hadoop job information for Stage-1: number of mappers: 1; number of > reducers: 1**** > > 2012-11-20 10:08:48,797 Stage-1 map = 0%, reduce = 0%**** > > 2012-11-20 10:08:53,823 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 1.25 sec**** > > 2012-11-20 10:08:54,829 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 1.25 sec**** > > 2012-11-20 10:08:55,835 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 1.25 sec**** > > 2012-11-20 10:08:56,842 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 1.25 sec**** > > 2012-11-20 10:08:57,848 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 1.25 sec**** > > 2012-11-20 10:08:58,854 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 1.25 sec**** > > 2012-11-20 10:08:59,860 Stage-1 map = 100%, reduce = 0%, Cumulative CPU > 1.25 sec**** > > 2012-11-20 10:09:00,866 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 3.23 sec**** > > MapReduce Total cumulative CPU time: 3 seconds 230 msec**** > > Ended Job = job_201211050921_0232**** > > Launching Job 2 out of 2**** > > Number of reduce tasks determined at compile time: 1**** > > In order to change the average load for a reducer (in bytes):**** > > set hive.exec.reducers.bytes.per.reducer=<number>**** > > In order to limit the maximum number of reducers:**** > > set hive.exec.reducers.max=<number>**** > > In order to set a constant number of reducers:**** > > set mapred.reduce.tasks=<number>**** > > Starting Job = job_201211050921_0233, Tracking URL = xxxx**** > > Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job > -Dmapred.job.tracker=maprfs:/// -kill job_201211050921_0233**** > > Hadoop job information for Stage-2: number of mappers: 1; number of > reducers: 1**** > > 2012-11-20 10:09:02,058 Stage-2 map = 0%, reduce = 0%**** > > 2012-11-20 10:09 *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330
+
Dean Wampler 2012-11-20, 16:22
-
Re: Hive JOINs not working as expected (returns 0 rows)
Edward Capriolo 2012-11-20, 16:33
Based on your counters of BYTES_READ its does not look like any data was found/processed for either table.
On Tue, Nov 20, 2012 at 11:22 AM, Dean Wampler <[EMAIL PROTECTED]> wrote: > Did you install v0.9.0 on your MapR cluster? I suspect there's a bug in the > interaction between the two. I'm not sure which version of Hive is the > latest shipped with MapR. > > I just tried your example in an older MapR M5 virtual machine instance that > came with Hive v0.7.1. It worked correctly. I then repeated the experiment > in Hive v0.9.0 on a generic Apache Hadoop setup I have in another VM. It > also worked correctly. (I don't have Hive v0.9.0 available in the MapR VM.) > > Hope this helps. > > dean > > On Tue, Nov 20, 2012 at 9:43 AM, John Morrison > <[EMAIL PROTECTED]> wrote: >> >> Hi, >> >> >> >> New to hive in last few weeks and could use some help with JOINs. >> >> >> >> Using MapR (version 0.9.0) /usr/bin/hive -> >> /opt/mapr/hive/hive-0.9.0/bin/hive >> >> >> >> I have 2 tables I am wanting to join by date (order_t and date_t). DDL at >> bottom. >> >> >> >> I have reduced this to 1 column and 1 row and still can’t get things to >> work. >> >> >> >> Any help will be appreciated. >> >> >> >> -John >> >> >> >> Details: >> >> >> >> # data in each table >> >> hive> select * from ext_order_1 ; >> >> OK >> >> 20081203 >> >> Time taken: 0.076 seconds >> >> hive> select * from ext_date_1 ; >> >> OK >> >> 20081203 >> >> Time taken: 0.068 seconds >> >> >> >> # Trying to join results in 0 rows (see yellow below) >> >> # >> >> hive> select count(*) from ext_order_1 a join ext_date_1 b on (a.cal_dt >> b.cal_dt) ; >> >> Total MapReduce jobs = 2 >> >> Launching Job 1 out of 2 >> >> Number of reduce tasks not specified. Estimated from input data size: 1 >> >> In order to change the average load for a reducer (in bytes): >> >> set hive.exec.reducers.bytes.per.reducer=<number> >> >> In order to limit the maximum number of reducers: >> >> set hive.exec.reducers.max=<number> >> >> In order to set a constant number of reducers: >> >> set mapred.reduce.tasks=<number> >> >> Starting Job = job_201211050921_0232, Tracking URL = xxxx >> >> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job >> -Dmapred.job.tracker=maprfs:/// -kill job_201211050921_0232 >> >> Hadoop job information for Stage-1: number of mappers: 1; number of >> reducers: 1 >> >> 2012-11-20 10:08:48,797 Stage-1 map = 0%, reduce = 0% >> >> 2012-11-20 10:08:53,823 Stage-1 map = 100%, reduce = 0%, Cumulative CPU >> 1.25 sec >> >> 2012-11-20 10:08:54,829 Stage-1 map = 100%, reduce = 0%, Cumulative CPU >> 1.25 sec >> >> 2012-11-20 10:08:55,835 Stage-1 map = 100%, reduce = 0%, Cumulative CPU >> 1.25 sec >> >> 2012-11-20 10:08:56,842 Stage-1 map = 100%, reduce = 0%, Cumulative CPU >> 1.25 sec >> >> 2012-11-20 10:08:57,848 Stage-1 map = 100%, reduce = 0%, Cumulative CPU >> 1.25 sec >> >> 2012-11-20 10:08:58,854 Stage-1 map = 100%, reduce = 0%, Cumulative CPU >> 1.25 sec >> >> 2012-11-20 10:08:59,860 Stage-1 map = 100%, reduce = 0%, Cumulative CPU >> 1.25 sec >> >> 2012-11-20 10:09:00,866 Stage-1 map = 100%, reduce = 100%, Cumulative CPU >> 3.23 sec >> >> MapReduce Total cumulative CPU time: 3 seconds 230 msec >> >> Ended Job = job_201211050921_0232 >> >> Launching Job 2 out of 2 >> >> Number of reduce tasks determined at compile time: 1 >> >> In order to change the average load for a reducer (in bytes): >> >> set hive.exec.reducers.bytes.per.reducer=<number> >> >> In order to limit the maximum number of reducers: >> >> set hive.exec.reducers.max=<number> >> >> In order to set a constant number of reducers: >> >> set mapred.reduce.tasks=<number> >> >> Starting Job = job_201211050921_0233, Tracking URL = xxxx >> >> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job >> -Dmapred.job.tracker=maprfs:/// -kill job_201211050921_0233 >> >> Hadoop job information for Stage-2: number of mappers: 1; number of >> reducers: 1
+
Edward Capriolo 2012-11-20, 16:33
-
RE: Hive JOINs not working as expected (returns 0 rows)
John Morrison 2012-11-20, 18:14
I did get this to work on Cloudera (CDH4).
Also got both self-joins to work using MapR (both return 1 as expected). select count(*) from ext_date_1 a join ext_date_1 b on (a.cal_dt = b.cal_dt) ; select count(*) from ext_order_1 a join ext_order_1 b on (a.cal_dt = b.cal_dt) ;
I guess something must be wrong is MapR config somewhere?
Thanks to those who replied to this post. -----Original Message----- From: Edward Capriolo [mailto:[EMAIL PROTECTED]] Sent: Tuesday, November 20, 2012 11:34 AM To: [EMAIL PROTECTED] Subject: Re: Hive JOINs not working as expected (returns 0 rows)
Based on your counters of BYTES_READ its does not look like any data was found/processed for either table.
On Tue, Nov 20, 2012 at 11:22 AM, Dean Wampler <[EMAIL PROTECTED]> wrote: > Did you install v0.9.0 on your MapR cluster? I suspect there's a bug > in the interaction between the two. I'm not sure which version of Hive > is the latest shipped with MapR. > > I just tried your example in an older MapR M5 virtual machine instance > that came with Hive v0.7.1. It worked correctly. I then repeated the > experiment in Hive v0.9.0 on a generic Apache Hadoop setup I have in > another VM. It also worked correctly. (I don't have Hive v0.9.0 > available in the MapR VM.) > > Hope this helps. > > dean > > On Tue, Nov 20, 2012 at 9:43 AM, John Morrison > <[EMAIL PROTECTED]> wrote: >> >> Hi, >> >> >> >> New to hive in last few weeks and could use some help with JOINs. >> >> >> >> Using MapR (version 0.9.0) /usr/bin/hive -> >> /opt/mapr/hive/hive-0.9.0/bin/hive >> >> >> >> I have 2 tables I am wanting to join by date (order_t and date_t). >> DDL at bottom. >> >> >> >> I have reduced this to 1 column and 1 row and still can't get things >> to work. >> >> >> >> Any help will be appreciated. >> >> >> >> -John >> >> >> >> Details: >> >> >> >> # data in each table >> >> hive> select * from ext_order_1 ; >> >> OK >> >> 20081203 >> >> Time taken: 0.076 seconds >> >> hive> select * from ext_date_1 ; >> >> OK >> >> 20081203 >> >> Time taken: 0.068 seconds >> >> >> >> # Trying to join results in 0 rows (see yellow below) >> >> # >> >> hive> select count(*) from ext_order_1 a join ext_date_1 b on >> hive> (a.cal_dt >> b.cal_dt) ; >> >> Total MapReduce jobs = 2 >> >> Launching Job 1 out of 2 >> >> Number of reduce tasks not specified. Estimated from input data size: >> 1 >> >> In order to change the average load for a reducer (in bytes): >> >> set hive.exec.reducers.bytes.per.reducer=<number> >> >> In order to limit the maximum number of reducers: >> >> set hive.exec.reducers.max=<number> >> >> In order to set a constant number of reducers: >> >> set mapred.reduce.tasks=<number> >> >> Starting Job = job_201211050921_0232, Tracking URL = xxxx >> >> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job >> -Dmapred.job.tracker=maprfs:/// -kill job_201211050921_0232 >> >> Hadoop job information for Stage-1: number of mappers: 1; number of >> reducers: 1 >> >> 2012-11-20 10:08:48,797 Stage-1 map = 0%, reduce = 0% >> >> 2012-11-20 10:08:53,823 Stage-1 map = 100%, reduce = 0%, Cumulative >> CPU >> 1.25 sec >> >> 2012-11-20 10:08:54,829 Stage-1 map = 100%, reduce = 0%, Cumulative >> CPU >> 1.25 sec >> >> 2012-11-20 10:08:55,835 Stage-1 map = 100%, reduce = 0%, Cumulative >> CPU >> 1.25 sec >> >> 2012-11-20 10:08:56,842 Stage-1 map = 100%, reduce = 0%, Cumulative >> CPU >> 1.25 sec >> >> 2012-11-20 10:08:57,848 Stage-1 map = 100%, reduce = 0%, Cumulative >> CPU >> 1.25 sec >> >> 2012-11-20 10:08:58,854 Stage-1 map = 100%, reduce = 0%, Cumulative >> CPU >> 1.25 sec >> >> 2012-11-20 10:08:59,860 Stage-1 map = 100%, reduce = 0%, Cumulative >> CPU >> 1.25 sec >> >> 2012-11-20 10:09:00,866 Stage-1 map = 100%, reduce = 100%, >> Cumulative CPU >> 3.23 sec >> >> MapReduce Total cumulative CPU time: 3 seconds 230 msec >> >> Ended Job = job_201211050921_0232 >> >> Launching Job 2 out of 2
+
John Morrison 2012-11-20, 18:14
|
|