Home | About | Sematext search-lucene.com search-hadoop.com
Hive >> mail # user >> Array index support non-constant expression


Re: Array index support non-constant expression
Could you try it with column pruning (CP) and predicate pushdown (PPD) disabled?

set hive.optimize.cp=false;
set hive.optimize.ppd=false;
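(For context: these two settings turn off Hive's column pruner and predicate pushdown optimizers for the session. A hedged sketch of how the suggestion would be applied to the failing query from the thread, assuming the same table and UDF names:)

```sql
-- Disable the two optimizers suspected of mangling the plan,
-- then rerun the query that failed with the WHERE clause.
set hive.optimize.cp=false;
set hive.optimize.ppd=false;

select c_poi.provider_str, c_poi.name
from (select darray(search_results, c.rank) as c_poi
      from nulf_search lateral view explode(search_clicks) clickTable as c) a
where c_poi.provider_str = 'POI';
```

If the query succeeds with both optimizers off, re-enabling them one at a time would show which optimization rewrites the plan incorrectly.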

2012/12/13 java8964 java8964 <[EMAIL PROTECTED]>:
> Hi,
>
> I played with my query further, and found it very puzzling to explain the
> following behaviors:
>
> 1) The following query works:
>
> select c_poi.provider_str, c_poi.name from (select darray(search_results,
> c.rank) as c_poi from nulf_search lateral view explode(search_clicks)
> clickTable as c) a
>
> I get all the results from the above query without any problem.
>
> 2) The following query does NOT work:
>
> select c_poi.provider_str, c_poi.name from (select darray(search_results,
> c.rank) as c_poi from nulf_search lateral view explode(search_clicks)
> clickTable as c) a where c_poi.provider_str = 'POI'
>
> As soon as I add the where criteria on provider_str, or even add another
> level of sub query like the following:
>
> select
> ps, name
> from
> (select c_poi.provider_str as ps, c_poi.name as name from (select
> darray(search_results, c.rank) as c_poi from nulf_search lateral view
> explode(search_clicks) clickTable as c) a ) b
> where ps = 'POI'
>
> any kind of criteria I tried to add on provider_str made the hive MR jobs fail
> with the same error shown below.
>
> Any idea why this happened? Is it related to the data? But provider_str is
> just a simple String type.
>
> Thanks
>
> Yong
>
> ________________________________
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: RE: Array index support non-constant expression
> Date: Wed, 12 Dec 2012 12:15:27 -0500
>
>
> OK.
>
> I followed the hive source code of
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFArrayContains and wrote the
> UDF. It is quite simple.
>
> It works fine as I expected for simple cases, but when I try to run it in
> some complex queries, the hive MR jobs fail with some strange errors. What I
> mean is that it fails in the HIVE code base; from the stack trace, I cannot
> see that this failure has anything to do with my custom code.
>
> I would appreciate some help if someone can tell me what went wrong.
>
> For example, I created this UDF called darray, standing for dynamic array,
> which supports a non-constant value as the index location into the array.
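(Editorial note: Hive's GenericUDF plumbing from the thread is not reproduced here, but the core behavior such a darray UDF needs is just a bounds-checked index into a list, where the index comes from another column rather than a constant. A minimal plain-Java sketch of that lookup, with hypothetical class and method names, not code from the thread:)

```java
import java.util.Arrays;
import java.util.List;

public class DarraySketch {
    // Returns the element at position idx, or null when the list is
    // missing or the index is out of range -- mirroring Hive's
    // convention of yielding NULL rather than throwing on bad input.
    public static <T> T darray(List<T> arr, int idx) {
        if (arr == null || idx < 0 || idx >= arr.size()) {
            return null;
        }
        return arr.get(idx);
    }

    public static void main(String[] args) {
        List<String> results = Arrays.asList("POI", "ADDRESS", "POI");
        System.out.println(darray(results, 1));  // ADDRESS
        System.out.println(darray(results, 7));  // null
    }
}
```

In the real UDF the index argument would arrive per-row (e.g. c.rank from the exploded search_clicks), which is exactly what the built-in constant-only array subscript cannot do.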
>
> The following query works fine as I expected:
>
> hive> select c_poi.provider_str as provider_str, c_poi.name as name from
> (select darray(search_results, c.index_loc) as c_poi from search_table
> lateral view explode(search_clicks) clickTable as c) a limit 5;
> POI                         xxxx
> ADDRESS               some address
> POI                        xxxx
> POI                        xxxx
> ADDRESSS             some address
>
> Of course, in this case I only want rows with provider_str = 'POI' returned,
> filtering out any rows with provider_str != 'POI'. It sounds simple, so I
> changed the query to the following:
>
> hive> select c_poi.provider_str as provider_str, c_poi.name as name from
> (select darray(search_results, c.rank) as c_poi from search_table lateral
> view explode(search_clicks) clickTable as c) a where c_poi.provider_str =
> 'POI' limit 5;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Cannot run job locally: Input Size (= 178314025) is larger than
> hive.exec.mode.local.auto.inputbytes.max (= 134217728)
> Starting Job = job_201212031001_0100, Tracking URL =
> http://blevine-desktop:50030/jobdetails.jsp?jobid=job_201212031001_0100
> Kill Command = /home/yzhang/hadoop/bin/hadoop job
> -Dmapred.job.tracker=blevine-desktop:8021 -kill job_201212031001_0100
> 2012-12-12 11:45:24,090 Stage-1 map = 0%,  reduce = 0%
> 2012-12-12 11:45:43,173 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201212031001_0100 with errors
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
>
> I am only adding a WHERE limitation, but to my surprise, the MR jobs generated
> by HIVE failed. I am testing this in my local standalone cluster, which is