Hive >> mail # user >> Array index support non-constant expression


java8964 java8964 2012-12-11, 22:24
java8964 java8964 2012-12-12, 17:15
RE: Array index support non-constant expression

Hi,
I experimented with my query further and found the following behavior very puzzling:
1) The following query works:
select c_poi.provider_str, c_poi.name
from (select darray(search_results, c.rank) as c_poi
      from nulf_search lateral view explode(search_clicks) clickTable as c) a
I get all the results from the above query without any problem.
2) The following query does NOT work:
select c_poi.provider_str, c_poi.name
from (select darray(search_results, c.rank) as c_poi
      from nulf_search lateral view explode(search_clicks) clickTable as c) a
where c_poi.provider_str = 'POI'
It fails as soon as I add the where criterion on provider_str, even if I add another level of subquery like the following:
select ps, name
from (select c_poi.provider_str as ps, c_poi.name as name
      from (select darray(search_results, c.rank) as c_poi
            from nulf_search lateral view explode(search_clicks) clickTable as c) a) b
where ps = 'POI'
Whatever criteria I try to add on provider_str, the Hive MR jobs fail with the same error shown below.
Any idea why this happens? Is it related to the data? But provider_str is just a simple string type.
Thanks
Yong
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: RE: Array index support non-constant expression
Date: Wed, 12 Dec 2012 12:15:27 -0500

OK.
I followed the Hive source code of org.apache.hadoop.hive.ql.udf.generic.GenericUDFArrayContains and wrote the UDF. It is quite simple.
It works as expected for simple cases, but when I run it in a more complex query, the Hive MR jobs fail with strange errors. The failure occurs inside the Hive code base; from the stack trace, I cannot see that it has anything to do with my custom code.
I would appreciate it if someone could tell me what went wrong.
For example, I created a UDF called darray, which stands for dynamic array. It supports a non-constant value as the index into the array.
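As an illustration only (this is not the code from the message), the lookup that a UDF like darray performs can be sketched in plain Java without the Hive API. The class name DynamicArrayIndex, the method elementAt, the 0-based index, and the choice to return null for a null or out-of-range index are all assumptions made for this sketch. A real implementation would wrap the same logic in a GenericUDF and read its arguments through ObjectInspectors, as GenericUDFArrayContains does.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the core logic of a "dynamic array index" UDF.
public class DynamicArrayIndex {

    // Returns the element at a run-time (non-constant) index.
    // Assumed behavior: a null list, a null index, or an out-of-range
    // index yields null instead of throwing an exception.
    public static <T> T elementAt(List<T> list, Integer index) {
        if (list == null || index == null) {
            return null;
        }
        if (index < 0 || index >= list.size()) {
            return null;
        }
        return list.get(index);
    }

    public static void main(String[] args) {
        // Made-up values standing in for one row's search_results array.
        List<String> searchResults = Arrays.asList("POI", "ADDRESS", "POI");
        System.out.println(elementAt(searchResults, 1)); // prints "ADDRESS"
        System.out.println(elementAt(searchResults, 7)); // prints "null"
    }
}
```

Returning null rather than throwing keeps a single bad index from failing the whole map task, which is one reasonable design choice for row-at-a-time evaluation.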
The following query works fine as I expected:
hive> select c_poi.provider_str as provider_str, c_poi.name as name from (select darray(search_results, c.index_loc) as c_poi from search_table lateral view explode(search_clicks) clickTable as c) a limit 5;
POI        xxxx
ADDRESS    some address
POI        xxxx
POI        xxxx
ADDRESS    some address
Of course, in this case I only want the rows with provider_str = 'POI' returned, filtering out any rows with provider_str != 'POI'. That sounds simple, so I changed the query to the following:
hive> select c_poi.provider_str as provider_str, c_poi.name as name from (select darray(search_results, c.rank) as c_poi from search_table lateral view explode(search_clicks) clickTable as c) a where c_poi.provider_str = 'POI' limit 5;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (= 178314025) is larger than hive.exec.mode.local.auto.inputbytes.max (= 134217728)
Starting Job = job_201212031001_0100, Tracking URL = http://blevine-desktop:50030/jobdetails.jsp?jobid=job_201212031001_0100
Kill Command = /home/yzhang/hadoop/bin/hadoop job  -Dmapred.job.tracker=blevine-desktop:8021 -kill job_201212031001_0100
2012-12-12 11:45:24,090 Stage-1 map = 0%,  reduce = 0%
2012-12-12 11:45:43,173 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201212031001_0100 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
I only added a where clause, but to my surprise the MR job generated by Hive failed. I am testing this on my local standalone cluster, which runs the CDH3U3 release. When I check the Hadoop userlog, here is what I get:
2012-12-12 11:40:22,421 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<_col0:bigint,_col1:string,_col2:string,_col3:string,_col4:string,_col5:string,_col6:boolean,_col7:boolean,_col8:boolean,_col9:boolean,_col10:boolean,_col11:boolean,_col12:string,_col13:string,_col14:struct<lat:double,lon:double,query_text_raw:string,query_text_normalized:string,query_string:string,llcountry:string,ipcountry:string,request_cnt:int,address:struct<country:string,state:string,zip:string,city:string,street:string,house:string>,categories_id:array<int>,categories_name:array<string>,lang_raw:string,lang_rose:string,lang:string,viewport:struct<top_lat:double,left_lon:double,bottom_lat:double,right_lon:double>>,_col15:struct<versions:int,physical_host:string,nose_request_id:string,client_type:string,ip:int,time_taken:int,user_agent:string,http_host:string,http_referrer:string,http_status:smallint,http_size:int,accept_language:string,md5:string,datacenter:string,tlv_map_data_version:string,tlv_devide_software_version:string,csid:int,rid:string,xncrid:string,cbfn:string,sources:array<struct<tm:bigint,tm_date:string,tm_time:string,md5:string,time_taken:int>>>,_col16:array<struct<provider_str:string,name:string,lat:double,lon:double,dyn:boolean,authoritative:boolean,search_center:boolean>>,_col17:array<struct<rank:int,action:int,tm:bigint,event:string,is_csid:boolean,is_rid:boolean,is_pbapi:boolean,is_nac:boolean>>,_col18:string,_col19:struct<rank:int,action:int,tm:bigint,event:string,is_csid:boolean,is_rid:boolean,is_pbapi:boolean,is_nac:boolean>>
2012-12-12 11:40:22,440 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
Navis류승우 2012-12-13, 00:06
java8964 java8964 2012-12-13, 01:43
Navis류승우 2012-12-13, 04:46