-
Load data from xml using Mapper.py in hive
Shuja Rehman 2010-06-09, 22:07
Hi,
I have created a table in Hive (say table1, with two columns, col1 and col2).

Now I have an XML file, for which I have written a Python script that reads the file and transforms it into tab-separated rows. For example, the output of the script can be:

row 1 = val1 val2
row 2 = val3 val4

So the script turns the file into plain rows, and I want to load these into the table I created. I have seen the example in which the data is first loaded into the u_data table and then transformed with a Python script into u_data_new, but that does not fit my scenario because my source is an XML file.

Kindly let me know whether I can achieve this.
Thanks

--
Regards
Shuja-ur-Rehman Baig
_________________________________
MS CS - School of Science and Engineering
Lahore University of Management Sciences (LUMS)
Sector U, DHA, Lahore, 54792, Pakistan
Cell: +92 3214207445
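One direct route, sketched here under assumptions rather than taken from the thread: if the script already produces tab-separated rows, it can write them to a file that is then loaded into a table declared with FIELDS TERMINATED BY '\t'. The element names (a, b, c) and the function names are hypothetical, since this message does not show the real XML layout.

```python
import xml.etree.ElementTree as ET


def xml_to_rows(xml_text, record_tag="a", fields=("b", "c")):
    """Yield one tuple per <record_tag> element (tag names are assumptions)."""
    root = ET.fromstring(xml_text)
    for record in root.iter(record_tag):
        # findtext falls back to "" when a child element is missing
        yield tuple(record.findtext(name, default="") for name in fields)


def rows_to_tsv(rows):
    """Join each row with tabs, one row per line.

    Assumes the values themselves contain no tabs or newlines.
    """
    return "".join("\t".join(row) + "\n" for row in rows)


sample = "<xy><a><b>val1</b><c>val2</c></a><a><b>val3</b><c>val4</c></a></xy>"
tsv = rows_to_tsv(xml_to_rows(sample))
print(tsv, end="")
```

Saved to a file, output of this shape could be loaded with a plain LOAD DATA LOCAL INPATH statement, with no TRANSFORM step involved.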
-
RE: Load data from xml using Mapper.py in hive
Ashish Thusoo 2010-06-10, 01:05
You could load this whole XML file into a table with a single row and a single column. The default record delimiter is \n, but you can create a table where the record delimiter is \001. Once you do that, you can follow the approach that you described in your message. Will this solve your problem?

Ashish
-
Re: Load data from xml using Mapper.py in hive
Shuja Rehman 2010-06-10, 11:37
Hi,
I have tried to do as you described. Let me explain in steps.

1- create table test (xmlFile String);
----------------------------------------------------------------------------------
2- LOAD DATA LOCAL INPATH '1.xml'
OVERWRITE INTO TABLE test;
----------------------------------------------------------------------------------
3- CREATE TABLE test_new (
b STRING,
c STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
----------------------------------------------------------------------------------
4- add FILE sampleMapper.groovy;
----------------------------------------------------------------------------------
5- INSERT OVERWRITE TABLE test_new
SELECT
TRANSFORM (xmlfile)
USING 'sampleMapper.groovy'
AS (b,c)
FROM test;
----------------------------------------------------------------------------------
XML FILE:
The XML file has only one row for testing purposes, which is

<xy><a><b>Hello</b><c>world</c></a></xy>
----------------------------------------------------------------------------------
MAPPER
I have written the mapper in Groovy to parse it. The mapper is

def xmlData = ""
System.in.withReader {
    xmlData = xmlData + it.readLine()
}

def xy = new XmlParser().parseText(xmlData)
def b = xy.a.b.text()
def c = xy.a.c.text()
println ([b,c].join('\t'))
----------------------------------------------------------------------------------
Steps 1-4 are fine, but when I perform step 5, which loads the data from the test table into the new table using the mapper, it throws an error. The error on the console is

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver

I am having a hard time with this. Any suggestions?
Thanks
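For comparison, here is a minimal Python sketch of the same mapper (the file name sample_mapper.py used below is hypothetical). It assumes that each row reaches the script as one line on standard input, which is how Hive's TRANSFORM streams rows to a script, and that every line holds one complete XML document shaped like the test file in this message.

```python
import io
import xml.etree.ElementTree as ET


def map_line(line):
    """Turn one XML document (one input row) into a tab-separated output row."""
    root = ET.fromstring(line)
    # Paths are relative to the root element <xy>; missing elements become "".
    b = root.findtext("a/b", default="")
    c = root.findtext("a/c", default="")
    return "\t".join([b, c])


def run(stream_in, stream_out):
    """Read rows line by line and write one transformed row per input row."""
    for line in stream_in:
        line = line.strip()
        if line:  # ignore blank lines
            stream_out.write(map_line(line) + "\n")


# Demonstration with in-memory streams. A real mapper script would instead
# import sys and call run(sys.stdin, sys.stdout).
demo_out = io.StringIO()
run(io.StringIO("<xy><a><b>Hello</b><c>world</c></a></xy>\n"), demo_out)
print(demo_out.getvalue(), end="")
```

In Hive, a script like this would be shipped with add FILE and named in the USING clause together with its interpreter, for example USING 'python sample_mapper.py'. Because input arrives one line per row, an XML document that spans several lines would be split across rows and fail to parse.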
-
Re: Load data from xml using Mapper.py in hive
Sonal Goyal 2010-06-10, 11:43
Can you try changing your logging level to debug and see the exact error message in hive.log?

Thanks and Regards,
Sonal
www.meghsoft.com
http://in.linkedin.com/in/sonalgoyal
-
Re: Load data from xml using Mapper.py in hive
Shuja Rehman 2010-06-10, 11:57
I have changed the logging level with this command:

bin/hive -hiveconf hive.root.logger=INFO,console

and the output is

------------------------------------------------------------------------------------------------------------------------------
10/06/10 13:51:20 INFO parse.ParseDriver: Parsing command: INSERT OVERWRITE TABLE test_new SELECT TRANSFORM (xmlfile) USING 'sampleMapper.groovy' AS (b,c) FROM test
10/06/10 13:51:20 INFO parse.ParseDriver: Parse Completed
10/06/10 13:51:20 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
10/06/10 13:51:20 INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic Analysis
10/06/10 13:51:20 INFO parse.SemanticAnalyzer: Get metadata for source tables
10/06/10 13:51:20 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
10/06/10 13:51:20 INFO metastore.ObjectStore: ObjectStore, initialize called
10/06/10 13:51:20 ERROR DataNucleus.Plugin: Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
10/06/10 13:51:20 ERROR DataNucleus.Plugin: Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
10/06/10 13:51:20 ERROR DataNucleus.Plugin: Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
10/06/10 13:51:22 INFO metastore.ObjectStore: Initialized ObjectStore
10/06/10 13:51:22 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=test
10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
10/06/10 13:51:23 INFO parse.SemanticAnalyzer: Get metadata for subqueries
10/06/10 13:51:23 INFO parse.SemanticAnalyzer: Get metadata for destination tables
10/06/10 13:51:23 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=test_new
10/06/10 13:51:23 INFO hive.log: DDL: struct test_new { string b, string c}
10/06/10 13:51:23 INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
10/06/10 13:51:23 INFO hive.log: DDL: struct test_new { string b, string c}
10/06/10 13:51:23 INFO ppd.OpProcFactory: Processing for FS(3)
10/06/10 13:51:23 INFO ppd.OpProcFactory: Processing for SCR(2)
10/06/10 13:51:23 INFO ppd.OpProcFactory: Processing for SEL(1)
10/06/10 13:51:23 INFO ppd.OpProcFactory: Processing for TS(0)
10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
10/06/10 13:51:23 INFO parse.SemanticAnalyzer: Completed plan generation
10/06/10 13:51:23 INFO ql.Driver: Semantic Analysis Completed
10/06/10 13:51:23 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:b, type:string, comment:null), FieldSchema(name:c, type:string, comment:null)], properties:null)
10/06/10 13:51:23 INFO ql.Driver: query plan file:/tmp/root/hive_2010-06-10_13-51-20_112_5091815325633732890/queryplan.xml
10/06/10 13:51:24 INFO ql.Driver: Starting command: INSERT OVERWRITE TABLE test_new SELECT TRANSFORM (xmlfile) USING 'sampleMapper.groovy' AS (b,c) FROM test
Total MapReduce jobs = 2
10/06/10 13:51:24 INFO ql.Driver: Total MapReduce jobs = 2
Launching Job 1 out of 2
10/06/10 13:51:24 INFO ql.Driver: Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
10/06/10 13:51:24 INFO exec.ExecDriver: Number of reduce tasks is set to 0 since there's no reduce operator
10/06/10 13:51:24 INFO exec.ExecDriver: Using org.apache.hadoop.hive.ql.io.HiveInputFormat
10/06/10 13:51:24 INFO exec.ExecDriver: Processing alias test
10/06/10 13:51:24 INFO exec.ExecDriver: Adding input file hdfs://localhost:9000/user/hive/warehouse/test
10/06/10 13:51:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/06/10 13:51:24 INFO mapred.FileInputFormat: Total input paths to process
Starting Job = job_201006101118_0009, Tracking URL http://localhost:50030/jobdetails.jsp?jobid=job_201006101118_0009
10/06/10 13:51:25 INFO exec.ExecDriver: Starting Job job_201006101118_0009, Tracking URL http://localhost:50030/jobdetails.jsp?jobid=job_201006101118_0009
Kill Command = /usr/local/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201006101118_0009
10/06/10 13:51:25 INFO exec.ExecDriver: Kill Command /usr/local/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201006101118_0009
2010-06-10 13:51:32,255 Stage-1 map = 0%, reduce = 0%
10/06/10 13:51:32 INFO exec.ExecDriver: 2010-06-10 13:51:32,255 Stage-1 map = 0%, reduce = 0%
2010-06-10 13:51:35,305 Stage-1 map = 50%, reduce = 0%
10/06/10 13:51:35 INFO exec.ExecDriver: 2010-06-10 13:51:35,305 Stage-1 map = 50%, reduce = 0%
2010-06-10 13:51:58,505 Stage-1 map = 100%, reduce = 100%
10/06/10 13:51:58 INFO exec.ExecDriver: 2010-06-10 13:51:58,505 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201006101118_0009 with errors
10/06/10 13:51:58 ERROR exec.ExecDriver: Ended Job = job_201006101118_0009 with errors
Task with the most failures(4):
Task ID: task_201006101118_0009_m_000000
URL: http://localhost:50030/taskdetails.jsp?jobid=job_201006101118_0009&tipid=task_201006101118_0009_m_000000
10/06/10 13:51:58 ERROR exec.ExecDriver: Task with the most failures(4): Task ID: task_201006101118_0009_m_000000 URL: http://localhost:50030/taskdetails.jsp?jobid=job_201006101118_0009&tipid=task_201006101118_0009_m_000000
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver
10/06/10 13:51:58 ERROR ql.Driver: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver

Any clue?
-
Re: Load data from xml using Mapper.py in hive
Shuja Rehman 2010-06-10, 12:01
And at the link

http://localhost:50030/jobfailures.jsp?jobid=job_201006101118_0009&kind=map&cause=failed

I found this output:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmlfile":"<xy><a><b>Hello</b><c>world</c></a></xy>"}
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:171)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmlfile":"<xy><a><b>Hello</b><c>world</c></a></xy>"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:417)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:153)
    ... 4 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot initialize ScriptOperator
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:319)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:696)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:696)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:45)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:696)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:400)
    ... 5 more
Caused by: java.io.IOException: Cannot run program "sampleMapper.groovy": java.io.IOException: error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279)
    ... 14 more
Caused by: java.io.IOException: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
    at java.lang.ProcessImpl.start(ProcessImpl.java:65)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
    ... 15 more
-
Re: Load data from xml using Mapper.py in hive
Shuja Rehman 2010-06-10, 22:54
Hi Ashish,
Can you tell me how to create a table using \001 as the record delimiter? I am trying this:

create table test (xmlFile String) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\001';

but it gives me this error:

ERROR ql.Driver: FAILED: Error in semantic analysis: LINES TERMINATED BY only supports newline '\n' right now
-
Re: Load data from xml using Mapper.py in hive
Tomasz Domański 2010-06-11, 15:35
Hi Shuja,
the answer seems to be in these lines:

Caused by: java.io.IOException: Cannot run program "sampleMapper.groovy": java.io.IOException: error=2, No such file or directory

Hadoop can't see this file or can't run it.

1. Make sure you added the file correctly.
2. Check whether Hadoop can run the script on your Hadoop machines.

Can you run this script in a console on a Hadoop machine like this:

> sampleMapper.groovy

or do you run it like this:

> groovy sampleMapper.groovy

Maybe you should specify that groovy is needed to run your script.
Try changing your select to: " ... using 'groovy sampleMapper.groovy' ... "
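The console check described in this message can also be scripted. The following is a minimal sketch built around a hypothetical helper, run_mapper, which pipes sample input to a command the same way a manual console test would. The stand-in command exists only so that the example runs without Groovy installed.

```python
import subprocess
import sys


def run_mapper(command, input_text):
    """Pipe input_text to command (a list of arguments) and return its stdout.

    Raises FileNotFoundError when the program cannot be found, which mirrors
    the "error=2, No such file or directory" failure in the task log.
    Raises CalledProcessError when the program exits with a non-zero status.
    """
    result = subprocess.run(
        command,
        input=input_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Stand-in for ["groovy", "sampleMapper.groovy"]: a Python one-liner that
# upper-cases whatever it reads from standard input.
stand_in = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read().upper())"]
output = run_mapper(stand_in, "hello\n")
print(output, end="")
```

Running the real command this way on a worker machine would show whether the interpreter and the script can both be found there, which is what the two numbered checks ask for.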
-
Re: Load data from xml using Mapper.py in hive
Shuja Rehman 2010-06-11, 15:44
Hi Tomasz Domański,
Thanks for the answer. This problem is solved now. The exception was caused by the file, which was missing before. Now the program runs fine if the whole XML file is on one line, with no newline (\n) in it. But the actual problem is that, according to my research, Hive does not support a row terminator other than '\n'. So what I want is to load the whole XML file into a single row and a single column, so that the Groovy script gets the whole XML file as input and can parse it. Please let me know how to do this.
Thanks
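One workaround that fits the observation in this message is to collapse each XML document onto a single line before loading it, so that the newline record delimiter yields exactly one row. This is a sketch of that idea and not something proposed in the thread. The function name is hypothetical, and the code assumes that whitespace between tags is only indentation and that the document has no CDATA sections or comments where line breaks matter.

```python
import re


def flatten_xml(xml_text):
    """Collapse an XML document onto one line.

    Whitespace-only gaps between tags are dropped; any line break left
    inside text content is replaced by a single space.
    """
    # Drop indentation and line breaks that sit between '>' and the next '<'.
    no_indent = re.sub(r">\s+<", "><", xml_text.strip())
    # Remaining line breaks are inside text content: turn each into a space.
    return re.sub(r"\s*[\r\n]+\s*", " ", no_indent)


pretty = """<xy>
  <a>
    <b>Hello</b>
    <c>world</c>
  </a>
</xy>
"""
one_line = flatten_xml(pretty)
print(one_line)
```

The flattened text could then be written to the file that LOAD DATA reads, giving the mapper one complete document per input line.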