Hive >> mail # user >> Hive JOINs not working as expected (returns 0 rows)


John Morrison 2012-11-20, 15:43
Dean Wampler 2012-11-20, 16:22
Re: Hive JOINs not working as expected (returns 0 rows)
Based on your BYTES_READ counters, it does not look like any data
was found or processed for either table.
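
One quick way to check that, assuming these are external tables and the usual cause (a table LOCATION that does not match where the files actually sit), would be something like the following; the table names come from the thread, the path in the comment is a placeholder:

```sql
-- Check where Hive thinks the data lives: look at the "Location:" and
-- "numFiles" fields in the output for each table.
DESCRIBE FORMATTED ext_order_1;
DESCRIBE FORMATTED ext_date_1;

-- Then list that path directly from the shell to confirm files exist, e.g.:
--   hadoop fs -ls <location-from-describe-output>
```

If `SELECT *` works but the join reads zero bytes, also compare the results of `SELECT *` (which can read files directly) with `SELECT COUNT(*)` on each single table, since the latter goes through a MapReduce job the same way the join does.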

On Tue, Nov 20, 2012 at 11:22 AM, Dean Wampler
<[EMAIL PROTECTED]> wrote:
> Did you install v0.9.0 on your MapR cluster? I suspect there's a bug in the
> interaction between the two. I'm not sure which version of Hive is the
> latest shipped with MapR.
>
> I just tried your example in an older MapR M5 virtual machine instance that
> came with Hive v0.7.1. It worked correctly. I then repeated the experiment
> in Hive v0.9.0 on a generic Apache Hadoop setup I have in another VM. It
> also worked correctly.  (I don't have Hive v0.9.0 available in the MapR VM.)
>
> Hope this helps.
>
> dean
>
> On Tue, Nov 20, 2012 at 9:43 AM, John Morrison
> <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> New to Hive in the last few weeks and could use some help with JOINs.
>>
>> Using MapR (version 0.9.0)    /usr/bin/hive ->
>> /opt/mapr/hive/hive-0.9.0/bin/hive
>>
>> I have 2 tables I am wanting to join by date (order_t and date_t).  DDL at
>> bottom.
>>
>> I have reduced this to 1 column and 1 row and still can’t get things to
>> work.
>>
>> Any help will be appreciated.
>>
>> -John
>>
>> Details:
>>
>> # data in each table
>>
>> hive> select * from ext_order_1 ;
>> OK
>> 20081203
>> Time taken: 0.076 seconds
>> hive> select * from ext_date_1 ;
>> OK
>> 20081203
>> Time taken: 0.068 seconds
>>
>> # Trying to join results in 0 rows (see output below)
>> #
>> hive> select count(*) from ext_order_1 a join ext_date_1 b on (a.cal_dt =
>> b.cal_dt) ;
>>
>> Total MapReduce jobs = 2
>> Launching Job 1 out of 2
>> Number of reduce tasks not specified. Estimated from input data size: 1
>> In order to change the average load for a reducer (in bytes):
>>   set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>   set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>   set mapred.reduce.tasks=<number>
>> Starting Job = job_201211050921_0232, Tracking URL =  xxxx
>> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job
>> -Dmapred.job.tracker=maprfs:/// -kill job_201211050921_0232
>> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
>> 2012-11-20 10:08:48,797 Stage-1 map = 0%,  reduce = 0%
>> 2012-11-20 10:08:53,823 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.25 sec
>> 2012-11-20 10:08:54,829 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.25 sec
>> 2012-11-20 10:08:55,835 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.25 sec
>> 2012-11-20 10:08:56,842 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.25 sec
>> 2012-11-20 10:08:57,848 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.25 sec
>> 2012-11-20 10:08:58,854 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.25 sec
>> 2012-11-20 10:08:59,860 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.25 sec
>> 2012-11-20 10:09:00,866 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.23 sec
>> MapReduce Total cumulative CPU time: 3 seconds 230 msec
>> Ended Job = job_201211050921_0232
>> Launching Job 2 out of 2
>> Number of reduce tasks determined at compile time: 1
>> In order to change the average load for a reducer (in bytes):
>>   set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>   set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>   set mapred.reduce.tasks=<number>
>> Starting Job = job_201211050921_0233, Tracking URL =  xxxx
>> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job
>> -Dmapred.job.tracker=maprfs:/// -kill job_201211050921_0233
>> Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
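
For reference, the single-row experiment Dean describes can be sketched as a self-contained repro. The DDL below is a guess: the original CREATE statements were cut off in the thread, so the column name and type (and the use of managed rather than external tables) are assumptions.

```sql
-- Hypothetical DDL; the thread's actual DDL was truncated.
CREATE TABLE ext_order_1 (cal_dt STRING);
CREATE TABLE ext_date_1 (cal_dt STRING);

-- Load one matching row ('20081203') into each table, then run the
-- equi-join from the thread. With data present in both tables, this
-- should return 1; a count of 0 with BYTES_READ = 0 suggests the
-- tables' underlying files were never read at all.
SELECT COUNT(*)
FROM ext_order_1 a
JOIN ext_date_1 b ON (a.cal_dt = b.cal_dt);
```

Note that Hive (as of 0.9) only supports equality conditions in JOIN ... ON clauses, so the join column comparison must be `=`.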
John Morrison 2012-11-20, 18:14