Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Hive JOINs not working as expected (returns 0 rows)


Copy link to this message
-
RE: Hive JOINs not working as expected (returns 0 rows)
I did get this to work on Cloudera (CDH4).

Also got both self-joins to work using MapR (both return 1 as expected).
   select count(*) from ext_date_1 a join ext_date_1 b on (a.cal_dt = b.cal_dt) ;  
select count(*) from ext_order_1 a join ext_order_1 b on (a.cal_dt = b.cal_dt) ;

I guess something must be wrong is MapR config somewhere?

Thanks to those who replied to this post.
-----Original Message-----
From: Edward Capriolo [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, November 20, 2012 11:34 AM
To: [EMAIL PROTECTED]
Subject: Re: Hive JOINs not working as expected (returns 0 rows)

Based on your counters of BYTES_READ its does not look like any data was found/processed for either table.

On Tue, Nov 20, 2012 at 11:22 AM, Dean Wampler <[EMAIL PROTECTED]> wrote:
> Did you install v0.9.0 on your MapR cluster? I suspect there's a bug
> in the interaction between the two. I'm not sure which version of Hive
> is the latest shipped with MapR.
>
> I just tried your example in an older MapR M5 virtual machine instance
> that came with Hive v0.7.1. It worked correctly. I then repeated the
> experiment in Hive v0.9.0 on a generic Apache Hadoop setup I have in
> another VM. It also worked correctly.  (I don't have Hive v0.9.0
> available in the MapR VM.)
>
> Hope this helps.
>
> dean
>
> On Tue, Nov 20, 2012 at 9:43 AM, John Morrison
> <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>>
>>
>> New to hive in last few weeks and could use some help with JOINs.
>>
>>
>>
>> Using MapR (version 0.9.0)    /usr/bin/hive ->
>> /opt/mapr/hive/hive-0.9.0/bin/hive
>>
>>
>>
>> I have 2 tables I am wanting to join by date (order_t and date_t).  
>> DDL at bottom.
>>
>>
>>
>> I have reduced this to 1 column and 1 row and still can't get things
>> to work.
>>
>>
>>
>> Any help will be appreciated.
>>
>>
>>
>> -John
>>
>>
>>
>> Details:
>>
>>
>>
>> # data in each table
>>
>> hive> select * from ext_order_1 ;
>>
>> OK
>>
>> 20081203
>>
>> Time taken: 0.076 seconds
>>
>> hive> select * from ext_date_1 ;
>>
>> OK
>>
>> 20081203
>>
>> Time taken: 0.068 seconds
>>
>>
>>
>> # Trying to join results in 0 rows (see yellow below)
>>
>> #
>>
>> hive> select count(*) from ext_order_1 a join ext_date_1 b on
>> hive> (a.cal_dt >> b.cal_dt) ;
>>
>> Total MapReduce jobs = 2
>>
>> Launching Job 1 out of 2
>>
>> Number of reduce tasks not specified. Estimated from input data size:
>> 1
>>
>> In order to change the average load for a reducer (in bytes):
>>
>>   set hive.exec.reducers.bytes.per.reducer=<number>
>>
>> In order to limit the maximum number of reducers:
>>
>>   set hive.exec.reducers.max=<number>
>>
>> In order to set a constant number of reducers:
>>
>>   set mapred.reduce.tasks=<number>
>>
>> Starting Job = job_201211050921_0232, Tracking URL =  xxxx
>>
>> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job
>> -Dmapred.job.tracker=maprfs:/// -kill job_201211050921_0232
>>
>> Hadoop job information for Stage-1: number of mappers: 1; number of
>> reducers: 1
>>
>> 2012-11-20 10:08:48,797 Stage-1 map = 0%,  reduce = 0%
>>
>> 2012-11-20 10:08:53,823 Stage-1 map = 100%,  reduce = 0%, Cumulative
>> CPU
>> 1.25 sec
>>
>> 2012-11-20 10:08:54,829 Stage-1 map = 100%,  reduce = 0%, Cumulative
>> CPU
>> 1.25 sec
>>
>> 2012-11-20 10:08:55,835 Stage-1 map = 100%,  reduce = 0%, Cumulative
>> CPU
>> 1.25 sec
>>
>> 2012-11-20 10:08:56,842 Stage-1 map = 100%,  reduce = 0%, Cumulative
>> CPU
>> 1.25 sec
>>
>> 2012-11-20 10:08:57,848 Stage-1 map = 100%,  reduce = 0%, Cumulative
>> CPU
>> 1.25 sec
>>
>> 2012-11-20 10:08:58,854 Stage-1 map = 100%,  reduce = 0%, Cumulative
>> CPU
>> 1.25 sec
>>
>> 2012-11-20 10:08:59,860 Stage-1 map = 100%,  reduce = 0%, Cumulative
>> CPU
>> 1.25 sec
>>
>> 2012-11-20 10:09:00,866 Stage-1 map = 100%,  reduce = 100%,
>> Cumulative CPU
>> 3.23 sec
>>
>> MapReduce Total cumulative CPU time: 3 seconds 230 msec
>>
>> Ended Job = job_201211050921_0232
>>
>> Launching Job 2 out of 2
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB