

Re: Re: hive 0.11 auto convert join bug report
Hi, sorry for the late reply.

As Chun Chen said, identical hashcodes make this problem easy to see. But it
can happen whenever the order of appearance in the JOIN expression differs
from that of the parents.
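
As a rough illustration of how such an order mismatch can arise in Java in general, the sketch below puts the aliases from the test case into a hash-based map and into an insertion-ordered map. It is not Hive's code and does not claim to reproduce what MapJoinProcessor does; the class and method names (AliasOrderSketch, hashOrder, linkedOrder) are hypothetical. It only shows that a HashMap makes no promise about iteration order, while a LinkedHashMap keeps the order in which keys were added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not taken from Hive: compares the iteration order of
// a HashMap with that of a LinkedHashMap for the same sequence of aliases.
public class AliasOrderSketch {

    // Keys in whatever order the HashMap iterates them. This order is
    // derived from the keys' hash codes and is not guaranteed to match
    // the order in which the aliases were added.
    static List<String> hashOrder(List<String> aliases) {
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i < aliases.size(); i++) {
            positions.put(aliases.get(i), i);
        }
        return new ArrayList<>(positions.keySet());
    }

    // Keys in insertion order, which LinkedHashMap guarantees.
    static List<String> linkedOrder(List<String> aliases) {
        Map<String, Integer> positions = new LinkedHashMap<>();
        for (int i = 0; i < aliases.size(); i++) {
            positions.put(aliases.get(i), i);
        }
        return new ArrayList<>(positions.keySet());
    }

    public static void main(String[] args) {
        // Aliases in the order they appear in the failing query.
        List<String> aliases = Arrays.asList(
                "orderpayment", "dim_pay_date", "deal", "order_city", "user");

        System.out.println("query order:   " + aliases);
        System.out.println("HashMap order: " + hashOrder(aliases));
        System.out.println("Linked order:  " + linkedOrder(aliases));
    }
}
```

Any code that pairs the two sequences position by position would only be safe with the insertion-ordered variant, which is why renaming an alias can appear to make a bug of this kind come and go.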

Thanks.

2013/9/13 Amit Sharma <[EMAIL PROTECTED]>

> Hi Navis,
>
> I was trying to look at this email thread as well as the jira to
> understand the scope of this issue. Does this get triggered only in cases
> of using aliases which end up mapping to the same value upon hashing? Or
> can this be triggered under other conditions as well? What if the aliases
> are not used and the table names some how might map to similar hashcode
> values?
>
> Also is changing the alias the only workaround for this problem or is
> there any other workaround possible?
>
> Thanks,
> Amit
>
>
> On Sun, Aug 11, 2013 at 9:22 PM, Navis류승우 <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Hive is notorious for producing different results with different aliases.
>> Changing the alias was a last resort for avoiding a bug in a desperate
>> situation.
>>
>> I think the patch in the issue is ready; I hope it helps.
>>
>> Thanks.
>>
>> 2013/8/11  <[EMAIL PROTECTED]>:
>> > Hi Navis,
>> >
>> > My colleague chenchun found that the hashcodes of 'deal' and 'dim_pay_date'
>> > are the same, and that the code in MapJoinProcessor.java ignores the order
>> > of the row schema.
>> > I looked at your patch, and it touches exactly the place we are working on.
>> > Thanks for your patch.
>> >
>> > On Sunday, August 11, 2013, at 9:38 PM, Navis류승우 wrote:
>> >
>> > Hi,
>> >
>> > I've filed this as https://issues.apache.org/jira/browse/HIVE-5056
>> > and attached a patch for it.
>> >
>> > It needs a full test run for confirmation, but you can try it.
>> >
>> > Thanks.
>> >
>> > 2013/8/11 <[EMAIL PROTECTED]>:
>> >
>> > Hi all:
>> > When I change the table alias dim_pay_date to A, the query passes in Hive 0.11
>> > (https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_change_alias_pass):
>> >
>> > use test;
>> > create table if not exists src ( `key` int,`val` string);
>> > load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite
>> > into table src;
>> > drop table if exists orderpayment_small;
>> > create table orderpayment_small (`dealid` int,`date` string,`time` string,
>> > `cityid` int, `userid` int);
>> > insert overwrite table orderpayment_small select 748, '2011-03-24',
>> > '2011-03-24', 55 ,5372613 from src limit 1;
>> > drop table if exists user_small;
>> > create table user_small( userid int);
>> > insert overwrite table user_small select key from src limit 100;
>> > set hive.auto.convert.join.noconditionaltask.size = 200;
>> > SELECT
>> > `A`.`date`
>> > , `deal`.`dealid`
>> > FROM `orderpayment_small` `orderpayment`
>> > JOIN `orderpayment_small` `A` ON `A`.`date` = `orderpayment`.`date`
>> > JOIN `orderpayment_small` `deal` ON `deal`.`dealid` = `orderpayment`.`dealid`
>> > JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` = `orderpayment`.`cityid`
>> > JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
>> > limit 5;
>> >
>> >
>> > It's quite strange and interesting now. I will keep searching for the
>> > answer to this issue.
>> >
>> >
>> >
>> > On Friday, August 9, 2013, at 3:32 AM, [EMAIL PROTECTED] wrote:
>> >
>> > Hi all:
>> > I'm currently testing Hive 0.11 and have encountered a bug with
>> > hive.auto.convert.join. I constructed a test case so everyone can reproduce
>> > it (the test case is also available here:
>> > https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_bug):
>> >
>> > use test;
>> > create table src ( `key` int,`val` string);
>> > load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite
>> > into table src;
>> > drop table if exists orderpayment_small;
>> > create table orderpayment_small (`dealid` int,`date` string,`time` string,
>> > `cityid` int, `userid` int);
>> > insert overwrite table orderpayment_small select 748, '2011-03-24',
>> > '2011-03-24', 55 ,5372613 from src limit 1;
>> > drop table if exists user_small;