|
|
-
Array index support non-constant expresssion
java8964 java8964 2012-12-11, 22:24
Hi, In our project to use the HIVE on CDH3U4 release (Hive 0.7.1), I have a hive table like the following: Table foo ( search_results array<struct<id:bigint,..............>> search_clicks array<struct<index_loc:int, ..............>>) As you can see, the 2nd column, which represents the list of search results clicked, contains the index location of which results was clicked (starting from 0). Now I need to find out the click count of some IDs, but this IDs only exists in the search_result struct. When I tried to write a query for this, I will try to do following first: select search_results[c.index_loc] from foo lateral view explode(search_clicks) clickTable as c; But it won't work in hive, as the following error message returned:AILED: Error in semantic analysis: Line 1:7 Non-constant expressions for array indexes not supported rank. It looks like that currently Hive (at least 0.7) does NOT support non-constant expressions as the array index. I searched on google, and found out the following HIVE jira ticket: https://issues.apache.org/jira/browse/HIVE-1955It looks like someone has the same request to support it, but not yet. But there is a comment in the above ticket that it can be done in an UDF. My question is, can anyone share some ideas about how to archive this in an UDF, as it maybe the only option for me right now? Thanks
-
RE: Array index support non-constant expresssion
java8964 java8964 2012-12-12, 17:15
OK. I followed the hive source code of org.apache.hadoop.hive.ql.udf.generic.GenericUDFArrayContains and wrote the UDF. It is quite simple. It works fine as I expected for simple case, but when I try to run it under some complex query, the hive MR jobs failed with some strange errors. What I mean is that it failed in HIVE code base, from stuck trace, I can not see this failure has anything to do with my custom code. I would like some help if some one can tell me what went wrong. For example, I created this UDF called darray, stand for dynamic array, which supports the non-constant value as the index location of the array. The following query works fine as I expected: hive> select c_poi.provider_str as provider_str, c_poi.name as name from (select darray(search_results, c.index_loc) as c_poi from search_table lateral view explode(search_clicks) clickTable as c) a limit 5;POI xxxxADDRESS some addressPOI xxxxPOI xxxxADDRESSS some address Of course, in this case, I only want the provider_str = 'POI' returned, and filter out any rows with provider_str != 'POI', so it sounds simple, I changed the query to the following: hive> select c_poi.provider_str as provider_str, c_poi.name as name from (select darray(search_results, c.rank) as c_poi from search_table lateral view explode(search_clicks) clickTable as c) a where c_poi.provider_str = 'POI' limit 5;Total MapReduce jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorCannot run job locally: Input Size (= 178314025) is larger than hive.exec.mode.local.auto.inputbytes.max (= 134217728)Starting Job = job_201212031001_0100, Tracking URL = http://blevine-desktop:50030/jobdetails.jsp?jobid=job_201212031001_0100Kill Command = /home/yzhang/hadoop/bin/hadoop job -Dmapred.job.tracker=blevine-desktop:8021 -kill job_201212031001_01002012-12-12 11:45:24,090 Stage-1 map = 0%, reduce = 0%2012-12-12 11:45:43,173 Stage-1 map = 100%, reduce = 100%Ended Job = job_201212031001_0100 with errorsFAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask I am only add a Where limitation, but to my surprise, the MR jobs generated by HIVE failed. I am testing this in my local standalone cluster, which is running CDH3U3 release. When I check the hadoop userlog, here is what I got: 2012-12-12 11:40:22,421 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<_col0:bigint,_col1:string,_col2:string,_col3:string,_col4:string,_col5:string,_col6:boolean,_col7:boolean,_col8:boolean,_col9:boolean,_col10:boolean,_col11:boolean,_col12:string,_col13:string,_col14:struct<lat:double,lon:double,query_text_raw:string,query_text_normalized:string,query_string:string,llcountry:string,ipcountry:string,request_cnt:int,address:struct<country:string,state:string,zip:string,city:string,street:string,house:string>,categories_id:array<int>,categories_name:array<string>,lang_raw:string,lang_rose:string,lang:string,viewport:struct<top_lat:double,left_lon:double,bottom_lat:double,right_lon:double>>,_col15:struct<versions:int,physical_host:string,nose_request_id:string,client_type:string,ip:int,time_taken:int,user_agent:string,http_host:string,http_referrer:string,http_status:smallint,http_size:int,accept_language:string,md5:string,datacenter:string,tlv_map_data_version:string,tlv_devide_software_version:string,csid:int,rid:string,xncrid:string,cbfn:string,sources:array<struct<tm:bigint,tm_date:string,tm_time:string,md5:string,time_taken:int>>>,_col16:array<struct<provider_str:string,name:string,lat:double,lon:double,dyn:boolean,authoritative:boolean,search_center:boolean>>,_col17:array<struct<rank:int,action:int,tm:bigint,event:string,is_csid:boolean,is_rid:boolean,is_pbapi:boolean,is_nac:boolean>>,_col18:string,_col19:struct<rank:int,action:int,tm:bigint,event:string,is_csid:boolean,is_rid:boolean,is_pbapi:boolean,is_nac:boolean>>2012-12-12 11:40:22,440 WARN org.apache.hadoop.mapred.Child: Error running childjava.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157) at org.apache.hadoop.mapred.Child.main(Child.java:264)Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 9 moreCaused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 14 moreCaused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessor
-
RE: Array index support non-constant expresssion
java8964 java8964 2012-12-12, 18:28
Hi, I played my query further, and found out it is very puzzle to explain the following behaviors: 1) The following query works: select c_poi.provider_str, c_poi.name from (select darray(search_results, c.rank) as c_poi from nulf_search lateral view explode(search_clicks) clickTable as c) a I get get all the result from the above query without any problem. 2) The following query NOT works: select c_poi.provider_str, c_poi.name from (select darray(search_results, c.rank) as c_poi from nulf_search lateral view explode(search_clicks) clickTable as c) a where c_poi.provider_str = 'POI' As long as I add the where criteria on provider_str, or even I added another level of sub query like following: selectps, namefrom (select c_poi.provider_str as ps, c_poi.name as name from (select darray(search_results, c.rank) as c_poi from nulf_search lateral view explode(search_clicks) clickTable as c) a ) bwhere ps = 'POI' any kind of criteria I tried to add on provider_str, the hive MR jobs failed in the same error I shown below. Any idea why this happened? Is it related to the data? But provider_str is just a simple String type. Thanks Yong From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: RE: Array index support non-constant expresssion Date: Wed, 12 Dec 2012 12:15:27 -0500 OK. I followed the hive source code of org.apache.hadoop.hive.ql.udf.generic.GenericUDFArrayContains and wrote the UDF. It is quite simple. It works fine as I expected for simple case, but when I try to run it under some complex query, the hive MR jobs failed with some strange errors. What I mean is that it failed in HIVE code base, from stuck trace, I can not see this failure has anything to do with my custom code. I would like some help if some one can tell me what went wrong. For example, I created this UDF called darray, stand for dynamic array, which supports the non-constant value as the index location of the array. The following query works fine as I expected: hive> select c_poi.provider_str as provider_str, c_poi.name as name from (select darray(search_results, c.index_loc) as c_poi from search_table lateral view explode(search_clicks) clickTable as c) a limit 5;POI xxxxADDRESS some addressPOI xxxxPOI xxxxADDRESSS some address Of course, in this case, I only want the provider_str = 'POI' returned, and filter out any rows with provider_str != 'POI', so it sounds simple, I changed the query to the following: hive> select c_poi.provider_str as provider_str, c_poi.name as name from (select darray(search_results, c.rank) as c_poi from search_table lateral view explode(search_clicks) clickTable as c) a where c_poi.provider_str = 'POI' limit 5;Total MapReduce jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorCannot run job locally: Input Size (= 178314025) is larger than hive.exec.mode.local.auto.inputbytes.max (= 134217728)Starting Job = job_201212031001_0100, Tracking URL = http://blevine-desktop:50030/jobdetails.jsp?jobid=job_201212031001_0100Kill Command = /home/yzhang/hadoop/bin/hadoop job -Dmapred.job.tracker=blevine-desktop:8021 -kill job_201212031001_01002012-12-12 11:45:24,090 Stage-1 map = 0%, reduce = 0%2012-12-12 11:45:43,173 Stage-1 map = 100%, reduce = 100%Ended Job = job_201212031001_0100 with errorsFAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask I am only add a Where limitation, but to my surprise, the MR jobs generated by HIVE failed. I am testing this in my local standalone cluster, which is running CDH3U3 release. When I check the hadoop userlog, here is what I got: 2012-12-12 11:40:22,421 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<_col0:bigint,_col1:string,_col2:string,_col3:string,_col4:string,_col5:string,_col6:boolean,_col7:boolean,_col8:boolean,_col9:boolean,_col10:boolean,_col11:boolean,_col12:string,_col13:string,_col14:struct<lat:double,lon:double,query_text_raw:string,query_text_normalized:string,query_string:string,llcountry:string,ipcountry:string,request_cnt:int,address:struct<country:string,state:string,zip:string,city:string,street:string,house:string>,categories_id:array<int>,categories_name:array<string>,lang_raw:string,lang_rose:string,lang:string,viewport:struct<top_lat:double,left_lon:double,bottom_lat:double,right_lon:double>>,_col15:struct<versions:int,physical_host:string,nose_request_id:string,client_type:string,ip:int,time_taken:int,user_agent:string,http_host:string,http_referrer:string,http_status:smallint,http_size:int,accept_language:string,md5:string,datacenter:string,tlv_map_data_version:string,tlv_devide_software_version:string,csid:int,rid:string,xncrid:string,cbfn:string,sources:array<struct<tm:bigint,tm_date:string,tm_time:string,md5:string,time_taken:int>>>,_col16:array<struct<provider_str:string,name:string,lat:double,lon:double,dyn:boolean,authoritative:boolean,search_center:boolean>>,_col17:array<struct<rank:int,action:int,tm:bigint,event:string,is_csid:boolean,is_rid:boolean,is_pbapi:boolean,is_nac:boolean>>,_col18:string,_col19:struct<rank:int,action:int,tm:bigint,event:string,is_csid:boolean,is_rid:boolean,is_pbapi:boolean,is_nac:boolean>>2012-12-12 11:40:22,440 WARN org.apache.hadoop.mapred.Child: Error running childjava.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396)
-
Re: Array index support non-constant expresssion
Navis류승우 2012-12-13, 00:06
Could you try it with CP/PPD disabled? set hive.optimize.cp=false; set hive.optimize.ppd=false; 2012/12/13 java8964 java8964 <[EMAIL PROTECTED]>: > Hi, > > I played my query further, and found out it is very puzzle to explain the > following behaviors: > > 1) The following query works: > > select c_poi.provider_str, c_poi.name from (select darray(search_results, > c.rank) as c_poi from nulf_search lateral view explode(search_clicks) > clickTable as c) a > > I get get all the result from the above query without any problem. > > 2) The following query NOT works: > > select c_poi.provider_str, c_poi.name from (select darray(search_results, > c.rank) as c_poi from nulf_search lateral view explode(search_clicks) > clickTable as c) a where c_poi.provider_str = 'POI' > > As long as I add the where criteria on provider_str, or even I added another > level of sub query like following: > > select > ps, name > from > (select c_poi.provider_str as ps, c_poi.name as name from (select > darray(search_results, c.rank) as c_poi from nulf_search lateral view > explode(search_clicks) clickTable as c) a ) b > where ps = 'POI' > > any kind of criteria I tried to add on provider_str, the hive MR jobs failed > in the same error I shown below. > > Any idea why this happened? Is it related to the data? But provider_str is > just a simple String type. > > Thanks > > Yong > > ________________________________ > From: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > Subject: RE: Array index support non-constant expresssion > Date: Wed, 12 Dec 2012 12:15:27 -0500 > > > OK. > > I followed the hive source code of > org.apache.hadoop.hive.ql.udf.generic.GenericUDFArrayContains and wrote the > UDF. It is quite simple. > > It works fine as I expected for simple case, but when I try to run it under > some complex query, the hive MR jobs failed with some strange errors. What I > mean is that it failed in HIVE code base, from stuck trace, I can not see > this failure has anything to do with my custom code. > > I would like some help if some one can tell me what went wrong. > > For example, I created this UDF called darray, stand for dynamic array, > which supports the non-constant value as the index location of the array. > > The following query works fine as I expected: > > hive> select c_poi.provider_str as provider_str, c_poi.name as name from > (select darray(search_results, c.index_loc) as c_poi from search_table > lateral view explode(search_clicks) clickTable as c) a limit 5; > POI xxxx > ADDRESS some address > POI xxxx > POI xxxx > ADDRESSS some address > > Of course, in this case, I only want the provider_str = 'POI' returned, and > filter out any rows with provider_str != 'POI', so it sounds simple, I > changed the query to the following: > > hive> select c_poi.provider_str as provider_str, c_poi.name as name from > (select darray(search_results, c.rank) as c_poi from search_table lateral > view explode(search_clicks) clickTable as c) a where c_poi.provider_str > 'POI' limit 5; > Total MapReduce jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks is set to 0 since there's no reduce operator > Cannot run job locally: Input Size (= 178314025) is larger than > hive.exec.mode.local.auto.inputbytes.max (= 134217728) > Starting Job = job_201212031001_0100, Tracking URL > http://blevine-desktop:50030/jobdetails.jsp?jobid=job_201212031001_0100> Kill Command = /home/yzhang/hadoop/bin/hadoop job > -Dmapred.job.tracker=blevine-desktop:8021 -kill job_201212031001_0100 > 2012-12-12 11:45:24,090 Stage-1 map = 0%, reduce = 0% > 2012-12-12 11:45:43,173 Stage-1 map = 100%, reduce = 100% > Ended Job = job_201212031001_0100 with errors > FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.MapRedTask > > I am only add a Where limitation, but to my surprise, the MR jobs generated > by HIVE failed. I am testing this in my local standalone cluster, which is
-
RE: Array index support non-constant expresssion
java8964 java8964 2012-12-13, 01:43
Hi, Navis: If I disable both CP/PPD, it will be worse, as neither 1) or 2) query works. But interested thing is that for both queries, I got same error message, but different one comparing with my original error message: 2012-12-12 20:36:21,362 WARN org.apache.hadoop.mapred.Child: Error running childjava.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157) at org.apache.hadoop.mapred.Child.main(Child.java:264)Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 9 moreCaused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 14 moreCaused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 17 moreCaused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121) ... 22 moreCaused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableLongObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:60) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:77) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:374) at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.initializeOp(LateralViewJoinOperator.java:109) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:374) at org.apache.hadoop.hive.ql.exec.UDTFOperator.initializeOp(UDTFOperator.java:85) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:62) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:374) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98) ... 22 more2012-12-12 20:36:21,365 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
-
Re: Array index support non-constant expresssion
Navis류승우 2012-12-13, 04:46
Different error messages but seemed from same problem.
Could you do that with later versions of hive? I think these kind of bugs are fixed.
2012/12/13 java8964 java8964 <[EMAIL PROTECTED]>: > ExprNodeGenericFuncEvaluator
|
|