|
Peter Marron
2012-10-31, 17:42
Shreepadma Venugopalan
2012-10-31, 18:37
Peter Marron
2012-10-31, 19:52
Shreepadma Venugopalan
2012-10-31, 21:58
Peter Marron
2012-11-01, 09:31
Dean Wampler
2012-11-01, 13:01
Bejoy KS
2012-11-01, 13:09
Dean Wampler
2012-11-01, 13:20
Peter Marron
2012-11-02, 08:44
Dean Wampler
2012-11-02, 14:02
Peter Marron
2012-11-02, 23:59
Dean Wampler
2012-11-03, 00:02
Dean Wampler
2012-11-03, 00:03
Shreepadma Venugopalan
2012-11-03, 00:06
Peter Marron
2012-11-07, 13:33
Peter Marron
2012-11-23, 08:56
|
-
Creating IndexesPeter Marron 2012-10-31, 17:42
Hi,
I am still having problems building my index. In an attempt to find someone who can help me I'll go through all the steps that I try. 1) First I load my data into hive. hive> LOAD DATA INPATH 'E3/score.csv' OVERWRITE INTO TABLE score; Loading data to table default.score Deleted hdfs://localhost/data/warehouse/score OK Time taken: 7.817 seconds 2) Then I try to create the index hive> CREATE INDEX bigIndex > ON TABLE score(Ath_Seq_Num) > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'; FAILED: Error in metadata: java.lang.RuntimeException: Please specify deferred rebuild using " WITH DEFERRED REBUILD ". FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask hive> 3) OK, so it suggests that I use "DEFERRED BUILD" and so I do hive> > > CREATE INDEX bigIndex > ON TABLE score(Ath_Seq_Num) > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' > WITH DEFERRED REBUILD; OK Time taken: 0.603 seconds 4) Now, to create the index I assume that I use ALTER INDEX as follows: hive>ALTER INDEX bigIndex ON score REBUILD; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. Estimated from input data size: 138 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Starting Job = job_201210311448_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201210311448_0001 Kill Command = /data/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:8021 -kill job_201210311448_0001 Hadoop job information for Stage-1: number of mappers: 511; number of reducers: 138 2012-10-31 15:59:27,076 Stage-1 map = 0%, reduce = 0% 5) This all looks promising, and after increasing my heapsize to get the Map/Reduce to complete, I get this an hour later 2012-10-31 17:08:23,572 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4135.47 sec MapReduce Total cumulative CPU time: 0 days 1 hours 8 minutes 55 seconds 470 msec Ended Job = job_201210311448_0001 Loading data to table default.default__score_bigindex__ Deleted hdfs://localhost/data/warehouse/default__score_bigindex__ Invalid alter operation: Unable to alter index. FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask So what have I done wrong, and what am I to do to get this index to build successfully? Any help appreciated. Peter Marron From: Peter Marron [mailto:[EMAIL PROTECTED]] Sent: 24 October 2012 13:27 To: [EMAIL PROTECTED] Subject: RE: Indexes Hi Shreepadma, Thanks for this. Looks exactly like the information I need. I was going to reply when I had tried it all out, but I'm having problems creating the index at the moment (I'm getting an OutOfMemoryError at the moment). So I thought that I had better reply now to say thank you. Peter Marron From: Shreepadma Venugopalan [mailto:[EMAIL PROTECTED]] Sent: 23 October 2012 19:49 To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: Re: Indexes Hi Peter, Indexing support was added to Hive in 0.7 and in 0.8 the query compiler was enhanced to optimized some class of queries (certain group bys and joins) using indexes. Assuming you are using the built in index handler you need to do the following _after_ you have created and rebuilt the index, SET hive.index.compact.file='/tmp/index_result'; SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat; You will then notice speed up for a query of the form, select count(*) from tab where indexed_col = some_val Thanks, Shreepadma On Tue, Oct 23, 2012 at 5:44 AM, Peter Marron <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: Hi, I'm very much a Hive newbie but I've been looking at HIVE-417 and this page in particular: http://cwiki.apache.org/confluence/display/Hive/IndexDev Using this information I've been able to create an index (using Hive 0.8.1) and when I look at the contents it all looks very promising indeed. However on the same page there's this comment: "...This document currently only covers index creation and maintenance. A follow-on will explain how indexes are used to optimize queries (building on FilterPushdownDev<https://cwiki.apache.org/confluence/display/Hive/FilterPushdownDev>)...." However I can't find the "follow-on" which tells me how to exploit the index that I've created to "optimize" subsequent queries. Now I've been told that I can create and use indexes with the current release of Hive _without_ writing and developing any Java code of my own. Is this true? If so, how? Any help appreciated. Peter Marron.
-
Re: Creating IndexesShreepadma Venugopalan 2012-10-31, 18:37
Hi Peter,
Can you attach the execution logs? What is the exception that you see in the execution logs? Thanks, Shreepadma On Wed, Oct 31, 2012 at 10:42 AM, Peter Marron < [EMAIL PROTECTED]> wrote: > Hi,**** > > ** ** > > I am still having problems building my index.**** > > In an attempt to find someone who can help me**** > > I’ll go through all the steps that I try.**** > > ** ** > > **1) **First I load my data into hive.**** > > ** ** > > hive> LOAD DATA INPATH 'E3/score.csv' OVERWRITE INTO TABLE score;**** > > Loading data to table default.score**** > > Deleted hdfs://localhost/data/warehouse/score**** > > OK**** > > Time taken: 7.817 seconds**** > > ** ** > > **2) **Then I try to create the index **** > > ** ** > > hive> CREATE INDEX bigIndex**** > > > ON TABLE score(Ath_Seq_Num)**** > > > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';*** > * > > FAILED: Error in metadata: java.lang.RuntimeException: Please specify > deferred rebuild using " WITH DEFERRED REBUILD ".**** > > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask**** > > hive> **** > > ** ** > > **3) **OK, so it suggests that I use “DEFERRED BUILD” and so I do**** > > hive> **** > > > **** > > > CREATE INDEX bigIndex**** > > > ON TABLE score(Ath_Seq_Num)**** > > > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'**** > > > WITH DEFERRED REBUILD;**** > > OK**** > > Time taken: 0.603 seconds**** > > ** ** > > **4) **Now, to create the index I assume that I use ALTER INDEX as > follows:**** > > ** ** > > hive>ALTER INDEX bigIndex ON score REBUILD;**** > > Total MapReduce jobs = 1**** > > Launching Job 1 out of 1**** > > Number of reduce tasks not specified. Estimated from input data size: 138* > *** > > In order to change the average load for a reducer (in bytes):**** > > set hive.exec.reducers.bytes.per.reducer=<number>**** > > In order to limit the maximum number of reducers:**** > > set hive.exec.reducers.max=<number>**** > > In order to set a constant number of reducers:**** > > set mapred.reduce.tasks=<number>**** > > Starting Job = job_201210311448_0001, Tracking URL > http://localhost:50030/jobdetails.jsp?jobid=job_201210311448_0001**** > > Kill Command = /data/hadoop-1.0.3/libexec/../bin/hadoop job > -Dmapred.job.tracker=localhost:8021 -kill job_201210311448_0001**** > > Hadoop job information for Stage-1: number of mappers: 511; number of > reducers: 138**** > > 2012-10-31 15:59:27,076 Stage-1 map = 0%, reduce = 0%**** > > ** ** > > **5) **This all looks promising, and after increasing my heapsize to > get the Map/Reduce to complete, I get this an hour later**** > > ** ** > > 2012-10-31 17:08:23,572 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 4135.47 sec**** > > MapReduce Total cumulative CPU time: 0 days 1 hours 8 minutes 55 seconds > 470 msec**** > > Ended Job = job_201210311448_0001**** > > Loading data to table default.default__score_bigindex__**** > > Deleted hdfs://localhost/data/warehouse/default__score_bigindex__**** > > Invalid alter operation: Unable to alter index.**** > > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask**** > > ** ** > > So what have I done wrong, and what am I to do to get this index to build > successfully?**** > > ** ** > > Any help appreciated.**** > > ** ** > > Peter Marron**** > > ** ** > > *From:* Peter Marron [mailto:[EMAIL PROTECTED]] > *Sent:* 24 October 2012 13:27 > *To:* [EMAIL PROTECTED] > *Subject:* RE: Indexes**** > > ** ** > > Hi Shreepadma,**** > > ** ** > > Thanks for this. Looks exactly like the information I need.**** > > I was going to reply when I had tried it all out, but I’m having**** > > problems creating the index at the moment (I’m getting an**** > > OutOfMemoryError at the moment). So I thought that I had**** > > better reply now to say thank you.**** > > ** ** > > Peter Marron**** > > ** **
-
RE: Creating IndexesPeter Marron 2012-10-31, 19:52
Hi Shreepadma
Happy to attach the logs, not quite sure which one is going to be most useful. Please find attached one which contained an error of some sort. Not sure it it's related or not to the index error. Found the file in this location: /data/hadoop/logs/userlogs/job_201210311448_0001/attempt_201210311448_0001_r_000137_0/syslog so maybe that will help you locate any other file that you might want to see. Thanks for your efforts. Peter Marron From: Shreepadma Venugopalan [mailto:[EMAIL PROTECTED]] Sent: 31 October 2012 18:38 To: [EMAIL PROTECTED] Subject: Re: Creating Indexes Hi Peter, Can you attach the execution logs? What is the exception that you see in the execution logs? Thanks, Shreepadma On Wed, Oct 31, 2012 at 10:42 AM, Peter Marron <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: Hi, I am still having problems building my index. In an attempt to find someone who can help me I'll go through all the steps that I try. 1) First I load my data into hive. hive> LOAD DATA INPATH 'E3/score.csv' OVERWRITE INTO TABLE score; Loading data to table default.score Deleted hdfs://localhost/data/warehouse/score OK Time taken: 7.817 seconds 2) Then I try to create the index hive> CREATE INDEX bigIndex > ON TABLE score(Ath_Seq_Num) > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'; FAILED: Error in metadata: java.lang.RuntimeException: Please specify deferred rebuild using " WITH DEFERRED REBUILD ". FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask hive> 3) OK, so it suggests that I use "DEFERRED BUILD" and so I do hive> > > CREATE INDEX bigIndex > ON TABLE score(Ath_Seq_Num) > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' > WITH DEFERRED REBUILD; OK Time taken: 0.603 seconds 4) Now, to create the index I assume that I use ALTER INDEX as follows: hive>ALTER INDEX bigIndex ON score REBUILD; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. Estimated from input data size: 138 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Starting Job = job_201210311448_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201210311448_0001 Kill Command = /data/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:8021 -kill job_201210311448_0001 Hadoop job information for Stage-1: number of mappers: 511; number of reducers: 138 2012-10-31 15:59:27,076 Stage-1 map = 0%, reduce = 0% 5) This all looks promising, and after increasing my heapsize to get the Map/Reduce to complete, I get this an hour later 2012-10-31 17:08:23,572 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4135.47 sec MapReduce Total cumulative CPU time: 0 days 1 hours 8 minutes 55 seconds 470 msec Ended Job = job_201210311448_0001 Loading data to table default.default__score_bigindex__ Deleted hdfs://localhost/data/warehouse/default__score_bigindex__ Invalid alter operation: Unable to alter index. FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask So what have I done wrong, and what am I to do to get this index to build successfully? Any help appreciated. Peter Marron From: Peter Marron [mailto:[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>] Sent: 24 October 2012 13:27 To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: RE: Indexes Hi Shreepadma, Thanks for this. Looks exactly like the information I need. I was going to reply when I had tried it all out, but I'm having problems creating the index at the moment (I'm getting an OutOfMemoryError at the moment). So I thought that I had better reply now to say thank you. Peter Marron From: Shreepadma Venugopalan [mailto:[EMAIL PROTECTED]] Sent: 23 October 2012 19:49 To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: Re: Indexes Hi Peter, Indexing support was added to Hive in 0.7 and in 0.8 the query compiler was enhanced to optimized some class of queries (certain group bys and joins) using indexes. Assuming you are using the built in index handler you need to do the following _after_ you have created and rebuilt the index, SET hive.index.compact.file='/tmp/index_result'; SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat; You will then notice speed up for a query of the form, select count(*) from tab where indexed_col = some_val Thanks, Shreepadma On Tue, Oct 23, 2012 at 5:44 AM, Peter Marron <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: Hi, I'm very much a Hive newbie but I've been looking at HIVE-417 and this page in particular: http://cwiki.apache.org/confluence/display/Hive/IndexDev Using this information I've been able to create an index (using Hive 0.8.1) and when I look at the contents it all looks very promising indeed. However on the same page there's this comment: "...This document currently only covers index creation and maintenance. A follow-on will explain how indexes are used to optimize queries (building on FilterPushdownDev<https://cwiki.apache.org/confluence/display/Hive/FilterPushdownDev>)...." However I can't find the "follow-on" which tells me how to exploit the index that I've created to "optimize" subsequent queries. Now I've been told that I can create and use indexes with the current release of Hive _without_ writing and developing any Java code of my own. Is this true? If so, how? Any help appreciated. Peter Marron.
-
Re: Creating IndexesShreepadma Venugopalan 2012-10-31, 21:58
Hi Peter,
>From the execution log, java.lang.ClassNotFoundException: org.apache.derby.jdbc.EmbeddedDriver at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:423) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:356) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:186) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.connect(JDBCStatsPublisher.java:68) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:778) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:723) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566) at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:303) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:249) It appears that the error is due derby classes not being found. Can you check if the derby jars are present? Thanks, Shreepadma On Wed, Oct 31, 2012 at 12:52 PM, Peter Marron < [EMAIL PROTECTED]> wrote: > Hi Shreepadma**** > > ** ** > > Happy to attach the logs, not quite sure which one is going to**** > > be most useful. Please find attached one which contained an**** > > error of some sort. Not sure it it’s related or not to the index error.*** > * > > Found the file in this location:**** > > ** ** > > > /data/hadoop/logs/userlogs/job_201210311448_0001/attempt_201210311448_0001_r_000137_0/syslog > **** > > ** ** > > so maybe that will help you locate any other file that you might want to > see.**** > > ** ** > > Thanks for your efforts.**** > > ** ** > > Peter Marron**** > > ** ** > > *From:* Shreepadma Venugopalan [mailto:[EMAIL PROTECTED]] > *Sent:* 31 October 2012 18:38 > *To:* [EMAIL PROTECTED] > *Subject:* Re: Creating Indexes**** > > ** ** > > Hi Peter,**** > > ** ** > > Can you attach the execution logs? What is the exception that you see in > the execution logs?**** > > ** ** > > Thanks,**** > > Shreepadma **** > > ** ** > > On Wed, Oct 31, 2012 at 10:42 AM, Peter Marron < > [EMAIL PROTECTED]> wrote:**** > > Hi,**** > > **** > > I am still having problems building my index.**** > > In an attempt to find someone who can help me**** > > I’ll go through all the steps that I try.**** > > **** > > 1) First I load my data into hive.**** > > **** > > hive> LOAD DATA INPATH 'E3/score.csv' OVERWRITE INTO TABLE score;**** > > Loading data to table default.score**** > > Deleted hdfs://localhost/data/warehouse/score**** > > OK**** > > Time taken: 7.817 seconds**** > > **** > > 2) Then I try to create the index **** > > **** > > hive> CREATE INDEX bigIndex**** > > > ON TABLE score(Ath_Seq_Num)**** > > > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';*** > * > > FAILED: Error in metadata: java.lang.RuntimeException: Please specify > deferred rebuild using " WITH DEFERRED REBUILD ".**** > > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask**** > > hive> **** > > **** > > 3) OK, so it suggests that I use “DEFERRED BUILD” and so I do**** > > hive> **** > > > **** > > > CREATE INDEX bigIndex**** > > > ON TABLE score(Ath_Seq_Num)****
-
RE: Creating IndexesPeter Marron 2012-11-01, 09:31
Hi Shreepadma,
I agree that the error looks odd. However I can't believe that I would have got this far with Hive if there was no derby jar. Nevertheless I checked. Here is a directory listing of the Hive install: pmarron@pmarron-ubuntu:/data/hive/lib$ ls ant-contrib-1.0b3.jar commons-pool-1.5.4.jar hive-common-0.8.1.jar hive-shims-0.8.1.jar mockito-all-1.8.2.jar antlr-2.7.7.jar datanucleus-connectionpool-2.0.3.jar hive-contrib-0.8.1.jar javaewah-0.3.jar php antlr-3.0.1.jar datanucleus-core-2.0.3.jar hive_contrib.jar jdo2-api-2.3-ec.jar py antlr-runtime-3.0.1.jar datanucleus-enhancer-2.0.3.jar hive-exec-0.8.1.jar jline-0.9.94.jar slf4j-api-1.6.1.jar asm-3.1.jar datanucleus-rdbms-2.0.3.jar hive-hbase-handler-0.8.1.jar json-20090211.jar slf4j-log4j12-1.6.1.jar commons-cli-1.2.jar derby-10.4.2.0.jar hive-hwi-0.8.1.jar junit-4.10.jar stringtemplate-3.1-b1.jar commons-codec-1.3.jar guava-r06.jar hive-hwi-0.8.1.war libfb303-0.7.0.jar velocity-1.5.jar commons-collections-3.2.1.jar hbase-0.89.0-SNAPSHOT.jar hive-jdbc-0.8.1.jar libfb303.jar zookeeper-3.3.1.jar commons-dbcp-1.4.jar hbase-0.89.0-SNAPSHOT-tests.jar hive-metastore-0.8.1.jar libthrift-0.7.0.jar commons-lang-2.4.jar hive-anttasks-0.8.1.jar hive-pdk-0.8.1.jar libthrift.jar commons-logging-1.0.4.jar hive-builtins-0.8.1.jar hive-serde-0.8.1.jar log4j-1.2.15.jar commons-logging-api-1.0.4.jar hive-cli-0.8.1.jar hive-service-0.8.1.jar log4j-1.2.16.jar Also I found a derby.log in my home directory which I have attached. Regards, Z From: Shreepadma Venugopalan [mailto:[EMAIL PROTECTED]] Sent: 31 October 2012 21:58 To: [EMAIL PROTECTED] Subject: Re: Creating Indexes Hi Peter, >From the execution log, java.lang.ClassNotFoundException: org.apache.derby.jdbc.EmbeddedDriver at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:423) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:356) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:186) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.connect(JDBCStatsPublisher.java:68) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:778) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:723) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566) at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:303) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:249) It appears that the error is due derby classes not being found. Can you check if the derby jars are present? Thanks, Shreepadma On Wed, Oct 31, 2012 at 12:52 PM, Peter Marron <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: Hi Shreepadma Happy to attach the logs, not quite sure which one is going to be most useful. Please find attached one which contained an error of some sort. Not sure it it's related or not to the index error. Found the file in this location: /data/hadoop/logs/userlogs/job_201210311448_0001/attempt_201210311448_0001_r_000137_0/syslog so maybe that will help you locate any other file that you might want to see. Thanks for your efforts. Peter Marron From: Shreepadma Venugopalan [mailto:[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>] Sent: 31 October 2012 18:38 To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: Re: Creating Indexes Hi Peter, Can you attach the execution logs? What is the exception that you see in the execution logs? Thanks, Shreepadma On Wed, Oct 31, 2012 at 10:42 AM, Peter Marron <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: Hi, I am still having problems building my index. In an attempt to find someone who can help me I'll go through all the steps that I try. 1) First I load my data into hive. hive> LOAD DATA INPATH 'E3/score.csv' OVERWRITE INTO TABLE score; Loading data to table default.score Deleted hdfs://localhost/data/warehouse/score OK Time taken: 7.817 seconds 2) Then I try to create the index hive> CREATE INDEX bigIndex > ON TABLE score(Ath_Seq_Num) > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'; FAILED: Error in metadata: java.lang.RuntimeException: Please specify deferred rebuild using " WITH DEFERRED REBUILD ". FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask hive> 3) OK, so it suggests that I use "DEFERRED BUILD" and so I do hive> > > CREATE INDEX bigIndex > ON TABLE score(Ath_Seq_Num) > AS 'org.apache.hadoo
-
Re: Creating IndexesDean Wampler 2012-11-01, 13:01
It looks like you're using Derby with a real cluster, not just a single
machine in local or pseudo-distributed mode. I haven't tried this myself, but the derby jar is probably not on the machine that ran the reducer task that failed. dean On Thu, Nov 1, 2012 at 4:31 AM, Peter Marron < [EMAIL PROTECTED]> wrote: > Hi Shreepadma,**** > > ** ** > > I agree that the error looks odd. However I can’t believe that I would have > **** > > got this far with Hive if there was no derby jar. Nevertheless I checked.* > *** > > Here is a directory listing of the Hive install:**** > > ** ** > > pmarron@pmarron-ubuntu:/data/hive/lib$ ls**** > > ant-contrib-1.0b3.jar commons-pool-1.5.4.jar > hive-common-0.8.1.jar hive-shims-0.8.1.jar mockito-all-1.8.2.jar* > *** > > antlr-2.7.7.jar datanucleus-connectionpool-2.0.3.jar > hive-contrib-0.8.1.jar javaewah-0.3.jar php**** > > antlr-3.0.1.jar datanucleus-core-2.0.3.jar > hive_contrib.jar jdo2-api-2.3-ec.jar py**** > > antlr-runtime-3.0.1.jar datanucleus-enhancer-2.0.3.jar > hive-exec-0.8.1.jar jline-0.9.94.jar slf4j-api-1.6.1.jar*** > * > > asm-3.1.jar datanucleus-rdbms-2.0.3.jar > hive-hbase-handler-0.8.1.jar json-20090211.jar slf4j-log4j12-1.6.1.jar > **** > > commons-cli-1.2.jar *derby-10.4.2.0.jar* > hive-hwi-0.8.1.jar junit-4.10.jar > stringtemplate-3.1-b1.jar**** > > commons-codec-1.3.jar guava-r06.jar > hive-hwi-0.8.1.war libfb303-0.7.0.jar velocity-1.5.jar**** > > commons-collections-3.2.1.jar hbase-0.89.0-SNAPSHOT.jar > hive-jdbc-0.8.1.jar libfb303.jar > zookeeper-3.3.1.jar**** > > commons-dbcp-1.4.jar hbase-0.89.0-SNAPSHOT-tests.jar > hive-metastore-0.8.1.jar libthrift-0.7.0.jar**** > > commons-lang-2.4.jar hive-anttasks-0.8.1.jar > hive-pdk-0.8.1.jar libthrift.jar**** > > commons-logging-1.0.4.jar hive-builtins-0.8.1.jar > hive-serde-0.8.1.jar log4j-1.2.15.jar**** > > commons-logging-api-1.0.4.jar hive-cli-0.8.1.jar > hive-service-0.8.1.jar log4j-1.2.16.jar**** > > ** ** > > Also I found a derby.log in my home directory which I have attached.**** > > ** ** > > Regards,**** > > ** ** > > Z**** > > ** ** > > *From:* Shreepadma Venugopalan [mailto:[EMAIL PROTECTED]] > *Sent:* 31 October 2012 21:58 > > *To:* [EMAIL PROTECTED] > *Subject:* Re: Creating Indexes**** > > ** ** > > Hi Peter,**** > > ** ** > > From the execution log,**** > > ** ** > > java.lang.ClassNotFoundException: org.apache.derby.jdbc.EmbeddedDriver**** > > at java.net.URLClassLoader$1.run(URLClassLoader.java:366)**** > > at java.net.URLClassLoader$1.run(URLClassLoader.java:355)**** > > at java.security.AccessController.doPrivileged(Native Method)*** > * > > at java.net.URLClassLoader.findClass(URLClassLoader.java:354)*** > * > > at java.lang.ClassLoader.loadClass(ClassLoader.java:423)**** > > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > **** > > at java.lang.ClassLoader.loadClass(ClassLoader.java:356)**** > > at java.lang.Class.forName0(Native Method)**** > > at java.lang.Class.forName(Class.java:186)**** > > at > org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.connect(JDBCStatsPublisher.java:68) > **** > > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:778) > **** > > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:723) > **** > > at > org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)**** > > at > org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)**** > > at > org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)**** > > at > org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:303)**** *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330
-
Re: Creating IndexesBejoy KS 2012-11-01, 13:09
AFAIK you don't any hive jars on cluster. The hive jars are just required on the client node .
Regards Bejoy KS Sent from handheld, please excuse typos. -----Original Message----- From: Dean Wampler <[EMAIL PROTECTED]> Date: Thu, 1 Nov 2012 08:01:51 To: <[EMAIL PROTECTED]> Reply-To: [EMAIL PROTECTED] Subject: Re: Creating Indexes It looks like you're using Derby with a real cluster, not just a single machine in local or pseudo-distributed mode. I haven't tried this myself, but the derby jar is probably not on the machine that ran the reducer task that failed. dean On Thu, Nov 1, 2012 at 4:31 AM, Peter Marron < [EMAIL PROTECTED]> wrote: > Hi Shreepadma,**** > > ** ** > > I agree that the error looks odd. However I can’t believe that I would have > **** > > got this far with Hive if there was no derby jar. Nevertheless I checked.* > *** > > Here is a directory listing of the Hive install:**** > > ** ** > > pmarron@pmarron-ubuntu:/data/hive/lib$ ls**** > > ant-contrib-1.0b3.jar commons-pool-1.5.4.jar > hive-common-0.8.1.jar hive-shims-0.8.1.jar mockito-all-1.8.2.jar* > *** > > antlr-2.7.7.jar datanucleus-connectionpool-2.0.3.jar > hive-contrib-0.8.1.jar javaewah-0.3.jar php**** > > antlr-3.0.1.jar datanucleus-core-2.0.3.jar > hive_contrib.jar jdo2-api-2.3-ec.jar py**** > > antlr-runtime-3.0.1.jar datanucleus-enhancer-2.0.3.jar > hive-exec-0.8.1.jar jline-0.9.94.jar slf4j-api-1.6.1.jar*** > * > > asm-3.1.jar datanucleus-rdbms-2.0.3.jar > hive-hbase-handler-0.8.1.jar json-20090211.jar slf4j-log4j12-1.6.1.jar > **** > > commons-cli-1.2.jar *derby-10.4.2.0.jar* > hive-hwi-0.8.1.jar junit-4.10.jar > stringtemplate-3.1-b1.jar**** > > commons-codec-1.3.jar guava-r06.jar > hive-hwi-0.8.1.war libfb303-0.7.0.jar velocity-1.5.jar**** > > commons-collections-3.2.1.jar hbase-0.89.0-SNAPSHOT.jar > hive-jdbc-0.8.1.jar libfb303.jar > zookeeper-3.3.1.jar**** > > commons-dbcp-1.4.jar hbase-0.89.0-SNAPSHOT-tests.jar > hive-metastore-0.8.1.jar libthrift-0.7.0.jar**** > > commons-lang-2.4.jar hive-anttasks-0.8.1.jar > hive-pdk-0.8.1.jar libthrift.jar**** > > commons-logging-1.0.4.jar hive-builtins-0.8.1.jar > hive-serde-0.8.1.jar log4j-1.2.15.jar**** > > commons-logging-api-1.0.4.jar hive-cli-0.8.1.jar > hive-service-0.8.1.jar log4j-1.2.16.jar**** > > ** ** > > Also I found a derby.log in my home directory which I have attached.**** > > ** ** > > Regards,**** > > ** ** > > Z**** > > ** ** > > *From:* Shreepadma Venugopalan [mailto:[EMAIL PROTECTED]] > *Sent:* 31 October 2012 21:58 > > *To:* [EMAIL PROTECTED] > *Subject:* Re: Creating Indexes**** > > ** ** > > Hi Peter,**** > > ** ** > > From the execution log,**** > > ** ** > > java.lang.ClassNotFoundException: org.apache.derby.jdbc.EmbeddedDriver**** > > at java.net.URLClassLoader$1.run(URLClassLoader.java:366)**** > > at java.net.URLClassLoader$1.run(URLClassLoader.java:355)**** > > at java.security.AccessController.doPrivileged(Native Method)*** > * > > at java.net.URLClassLoader.findClass(URLClassLoader.java:354)*** > * > > at java.lang.ClassLoader.loadClass(ClassLoader.java:423)**** > > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > **** > > at java.lang.ClassLoader.loadClass(ClassLoader.java:356)**** > > at java.lang.Class.forName0(Native Method)**** > > at java.lang.Class.forName(Class.java:186)**** > > at > org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.connect(JDBCStatsPublisher.java:68) > **** > > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:778) > **** > > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:723) *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330
-
Re: Creating IndexesDean Wampler 2012-11-01, 13:20
That's what I would expect, too, but it appears a reducer task wanted to
update some SQL statistics and to do that it wanted to load the derby jar. On Thu, Nov 1, 2012 at 8:09 AM, Bejoy KS <[EMAIL PROTECTED]> wrote: > ** > AFAIK you don't any hive jars on cluster. The hive jars are just required > on the client node . > Regards > Bejoy KS > > Sent from handheld, please excuse typos. > ------------------------------ > *From: * Dean Wampler <[EMAIL PROTECTED]> > *Date: *Thu, 1 Nov 2012 08:01:51 -0500 > *To: *<[EMAIL PROTECTED]> > *ReplyTo: * [EMAIL PROTECTED] > *Subject: *Re: Creating Indexes > > It looks like you're using Derby with a real cluster, not just a single > machine in local or pseudo-distributed mode. I haven't tried this myself, > but the derby jar is probably not on the machine that ran the reducer task > that failed. > > dean > > On Thu, Nov 1, 2012 at 4:31 AM, Peter Marron < > [EMAIL PROTECTED]> wrote: > >> Hi Shreepadma,**** >> >> ** ** >> >> I agree that the error looks odd. However I can’t believe that I would >> have**** >> >> got this far with Hive if there was no derby jar. Nevertheless I checked. >> **** >> >> Here is a directory listing of the Hive install:**** >> >> ** ** >> >> pmarron@pmarron-ubuntu:/data/hive/lib$ ls**** >> >> ant-contrib-1.0b3.jar commons-pool-1.5.4.jar >> hive-common-0.8.1.jar hive-shims-0.8.1.jar mockito-all-1.8.2.jar >> **** >> >> antlr-2.7.7.jar datanucleus-connectionpool-2.0.3.jar >> hive-contrib-0.8.1.jar javaewah-0.3.jar php**** >> >> antlr-3.0.1.jar datanucleus-core-2.0.3.jar >> hive_contrib.jar jdo2-api-2.3-ec.jar py**** >> >> antlr-runtime-3.0.1.jar datanucleus-enhancer-2.0.3.jar >> hive-exec-0.8.1.jar jline-0.9.94.jar slf4j-api-1.6.1.jar** >> ** >> >> asm-3.1.jar datanucleus-rdbms-2.0.3.jar >> hive-hbase-handler-0.8.1.jar json-20090211.jar slf4j-log4j12-1.6.1.jar >> **** >> >> commons-cli-1.2.jar *derby-10.4.2.0.jar* >> hive-hwi-0.8.1.jar junit-4.10.jar >> stringtemplate-3.1-b1.jar**** >> >> commons-codec-1.3.jar guava-r06.jar >> hive-hwi-0.8.1.war libfb303-0.7.0.jar velocity-1.5.jar**** >> >> commons-collections-3.2.1.jar hbase-0.89.0-SNAPSHOT.jar >> hive-jdbc-0.8.1.jar libfb303.jar >> zookeeper-3.3.1.jar**** >> >> commons-dbcp-1.4.jar hbase-0.89.0-SNAPSHOT-tests.jar >> hive-metastore-0.8.1.jar libthrift-0.7.0.jar**** >> >> commons-lang-2.4.jar hive-anttasks-0.8.1.jar >> hive-pdk-0.8.1.jar libthrift.jar**** >> >> commons-logging-1.0.4.jar hive-builtins-0.8.1.jar >> hive-serde-0.8.1.jar log4j-1.2.15.jar**** >> >> commons-logging-api-1.0.4.jar hive-cli-0.8.1.jar >> hive-service-0.8.1.jar log4j-1.2.16.jar**** >> >> ** ** >> >> Also I found a derby.log in my home directory which I have attached.**** >> >> ** ** >> >> Regards,**** >> >> ** ** >> >> Z**** >> >> ** ** >> >> *From:* Shreepadma Venugopalan [mailto:[EMAIL PROTECTED]] >> *Sent:* 31 October 2012 21:58 >> >> *To:* [EMAIL PROTECTED] >> *Subject:* Re: Creating Indexes**** >> >> ** ** >> >> Hi Peter,**** >> >> ** ** >> >> From the execution log,**** >> >> ** ** >> >> java.lang.ClassNotFoundException: org.apache.derby.jdbc.EmbeddedDriver*** >> * >> >> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)**** >> >> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)**** >> >> at java.security.AccessController.doPrivileged(Native Method)** >> ** >> >> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)** >> ** >> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:423)**** >> >> at >> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)**** >> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:356)**** >> >> at java.lang.Class.forName0(Native Method)**** *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330
-
RE: Creating IndexesPeter Marron 2012-11-02, 08:44
Hi Dean,
I'm running everything on a single physical machine in pseudo-distributed mode. Well it certainly looks like the reducer is looking for a derby.jar, although I must confess I don't really understand why it would be doing that. In an effort to fix that I copied the derby.jar (derby-10.4.2.0.jar) into the Hadoop directory, where I assume that the reducer would be able to find it. However I get exactly the same problem as before. Is there some particular place that I should put the derby.jar to make this problem go away? Is there anything else that I can try? Peter Marron From: Dean Wampler [mailto:[EMAIL PROTECTED]] Sent: 01 November 2012 13:02 To: [EMAIL PROTECTED] Subject: Re: Creating Indexes It looks like you're using Derby with a real cluster, not just a single machine in local or pseudo-distributed mode. I haven't tried this myself, but the derby jar is probably not on the machine that ran the reducer task that failed. dean On Thu, Nov 1, 2012 at 4:31 AM, Peter Marron <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: Hi Shreepadma, I agree that the error looks odd. However I can't believe that I would have got this far with Hive if there was no derby jar. Nevertheless I checked. Here is a directory listing of the Hive install: [snip]
-
Re: Creating IndexesDean Wampler 2012-11-02, 14:02
Oh, I saw this line in your Hive output and just assumed you were running
in a cluster: Hadoop job information for Stage-1: number of mappers: 511; number of reducers: 138**** I haven't tried running a job that big in pseudodistributed mode either, but that's beside the point. So it seems to be an issue with indexing, but it still begs the question why derby isn't on the classpath for the task. I would try using the ADD JAR command, which copies the jar around the "cluster" and puts it on the classpath. It's what you would use with UDFs, for example: ADD JAR /path/to/derby.jar ALTER INDEX ...; It's a huge hack, but it just might work. dean On Fri, Nov 2, 2012 at 3:44 AM, Peter Marron < [EMAIL PROTECTED]> wrote: > Hi Dean,**** > > ** ** > > I’m running everything on a single physical machine in pseudo-distributed > mode.**** > > ** ** > > Well it certainly looks like the reducer is looking for a derby.jar, > although I must**** > > confess I don’t really understand why it would be doing that.**** > > In an effort to fix that I copied the derby.jar (derby-10.4.2.0.jar) into > the**** > > Hadoop directory, where I assume that the reducer would be able to find it. > **** > > However I get exactly the same problem as before.**** > > Is there some particular place that I should put the derby.jar to make this > **** > > problem go away? Is there anything else that I can try?**** > > ** ** > > Peter Marron**** > > ** ** > > *From:* Dean Wampler [mailto:[EMAIL PROTECTED]] > *Sent:* 01 November 2012 13:02 > > *To:* [EMAIL PROTECTED] > *Subject:* Re: Creating Indexes**** > > ** ** > > It looks like you're using Derby with a real cluster, not just a single > machine in local or pseudo-distributed mode. I haven't tried this myself, > but the derby jar is probably not on the machine that ran the reducer task > that failed.**** > > ** ** > > dean**** > > On Thu, Nov 1, 2012 at 4:31 AM, Peter Marron < > [EMAIL PROTECTED]> wrote:**** > > Hi Shreepadma,**** > > **** > > I agree that the error looks odd. However I can’t believe that I would have > **** > > got this far with Hive if there was no derby jar. Nevertheless I checked.* > *** > > Here is a directory listing of the Hive install:**** > > [snip]**** > > ** ** > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330
-
RE: Creating IndexesPeter Marron 2012-11-02, 23:59
Hi Dean,
At this stage I'm really not worried about this being a hack. I just want to get it to work, and I'm grateful for all your help. I did as you suggested and now, as far as I can see, the Map/Reduce has succeeded. When I look in the log for the last reduce I no longer find an error. However this is the output from the hive command session: MapReduce Total cumulative CPU time: 0 days 1 hours 14 minutes 51 seconds 360 msec Ended Job = job_201211021743_0001 Loading data to table default.default__score_bigindex__ Deleted hdfs://localhost/data/warehouse/default__score_bigindex__ Invalid alter operation: Unable to alter index. Table default.default__score_bigindex__ stats: [num_partitions: 0, num_files: 138, num_rows: 0, total_size: 446609024, raw_data_size: 0] FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask MapReduce Jobs Launched: Job 0: Map: 511 Reduce: 138 Accumulative CPU: 4491.36 sec HDFS Read: 137123460712 HDFS Write: 446609024 SUCESS Total MapReduce CPU Time Spent: 0 days 1 hours 14 minutes 51 seconds 360 msec hive> I find this very confusing. We have the bit where it says "Job 0:.... SUCCESS" and this seems to fit with the fact that I can't find errors in the Map/Reduce. On the other hand we have the bit where it says: "Invalid alter operation: Unable to alter index." So has it successfully created the index or not? And if not, then what do I do next? Is there somewhere else where it records Hive errors as opposed to Map/Reduce errors? Regards, Peter Marron From: Dean Wampler [mailto:[EMAIL PROTECTED]] Sent: 02 November 2012 14:03 To: [EMAIL PROTECTED] Subject: Re: Creating Indexes Oh, I saw this line in your Hive output and just assumed you were running in a cluster: Hadoop job information for Stage-1: number of mappers: 511; number of reducers: 138 I haven't tried running a job that big in pseudodistributed mode either, but that's beside the point. So it seems to be an issue with indexing, but it still begs the question why derby isn't on the classpath for the task. I would try using the ADD JAR command, which copies the jar around the "cluster" and puts it on the classpath. It's what you would use with UDFs, for example: ADD JAR /path/to/derby.jar ALTER INDEX ...; It's a huge hack, but it just might work. dean On Fri, Nov 2, 2012 at 3:44 AM, Peter Marron <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: Hi Dean, I'm running everything on a single physical machine in pseudo-distributed mode. Well it certainly looks like the reducer is looking for a derby.jar, although I must confess I don't really understand why it would be doing that. In an effort to fix that I copied the derby.jar (derby-10.4.2.0.jar) into the Hadoop directory, where I assume that the reducer would be able to find it. However I get exactly the same problem as before. Is there some particular place that I should put the derby.jar to make this problem go away? Is there anything else that I can try? Peter Marron From: Dean Wampler [mailto:[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>] Sent: 01 November 2012 13:02 To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: Re: Creating Indexes It looks like you're using Derby with a real cluster, not just a single machine in local or pseudo-distributed mode. I haven't tried this myself, but the derby jar is probably not on the machine that ran the reducer task that failed. dean On Thu, Nov 1, 2012 at 4:31 AM, Peter Marron <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: Hi Shreepadma, I agree that the error looks odd. However I can't believe that I would have got this far with Hive if there was no derby jar. Nevertheless I checked. Here is a directory listing of the Hive install: [snip] -- Dean Wampler, Ph.D. thinkbiganalytics.com<http://thinkbiganalytics.com> +1-312-339-1330
-
Re: Creating IndexesDean Wampler 2012-11-03, 00:02
Wow. Lots of quirks. I'm glad the ADD JAR
On Fri, Nov 2, 2012 at 6:59 PM, Peter Marron < [EMAIL PROTECTED]> wrote: > Hi Dean,**** > > ** ** > > At this stage I’m really not worried about this being a hack.**** > > I just want to get it to work, and I’m grateful for all your help.**** > > I did as you suggested and now, as far as I can see, the Map/Reduce**** > > has succeeded. When I look in the log for the last reduce I no longer**** > > find an error. However this is the output from the hive command**** > > session:**** > > ** ** > > MapReduce Total cumulative CPU time: 0 days 1 hours 14 minutes 51 seconds > 360 msec**** > > Ended Job = job_201211021743_0001**** > > Loading data to table default.default__score_bigindex__**** > > Deleted hdfs://localhost/data/warehouse/default__score_bigindex__**** > > Invalid alter operation: Unable to alter index.**** > > Table default.default__score_bigindex__ stats: [num_partitions: 0, > num_files: 138, num_rows: 0, total_size: 446609024, raw_data_size: 0]**** > > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask**** > > MapReduce Jobs Launched: **** > > Job 0: Map: 511 Reduce: 138 Accumulative CPU: 4491.36 sec HDFS Read: > 137123460712 HDFS Write: 446609024 SUCESS**** > > Total MapReduce CPU Time Spent: 0 days 1 hours 14 minutes 51 seconds 360 > msec**** > > hive> **** > > ** ** > > I find this very confusing. We have the bit where it says “Job 0:…. > SUCCESS”**** > > and this seems to fit with the fact that I can’t find errors in the > Map/Reduce.**** > > On the other hand we have the bit where it says: “Invalid alter operation: > Unable to alter index.”**** > > So has it successfully created the index or not? And if not, then what do > I do next?**** > > Is there somewhere else where it records Hive errors as opposed to > Map/Reduce errors?**** > > ** ** > > Regards,**** > > ** ** > > Peter Marron**** > > **** > > ** ** > > *From:* Dean Wampler [mailto:[EMAIL PROTECTED]] > *Sent:* 02 November 2012 14:03 > > *To:* [EMAIL PROTECTED] > *Subject:* Re: Creating Indexes**** > > ** ** > > Oh, I saw this line in your Hive output and just assumed you were running > in a cluster:**** > > ** ** > > Hadoop job information for Stage-1: number of mappers: 511; number of > reducers: 138**** > > ** ** > > I haven't tried running a job that big in pseudodistributed mode either, > but that's beside the point.**** > > ** ** > > So it seems to be an issue with indexing, but it still begs the question > why derby isn't on the classpath for the task. I would try using the ADD > JAR command, which copies the jar around the "cluster" and puts it on the > classpath. It's what you would use with UDFs, for example:**** > > ** ** > > ADD JAR /path/to/derby.jar**** > > ALTER INDEX ...;**** > > ** ** > > It's a huge hack, but it just might work.**** > > dean**** > > ** ** > > On Fri, Nov 2, 2012 at 3:44 AM, Peter Marron < > [EMAIL PROTECTED]> wrote:**** > > Hi Dean,**** > > **** > > I’m running everything on a single physical machine in pseudo-distributed > mode.**** > > **** > > Well it certainly looks like the reducer is looking for a derby.jar, > although I must**** > > confess I don’t really understand why it would be doing that.**** > > In an effort to fix that I copied the derby.jar (derby-10.4.2.0.jar) into > the**** > > Hadoop directory, where I assume that the reducer would be able to find it. > **** > > However I get exactly the same problem as before.**** > > Is there some particular place that I should put the derby.jar to make this > **** > > problem go away? Is there anything else that I can try?**** > > **** > > Peter Marron**** > > **** > > *From:* Dean Wampler [mailto:[EMAIL PROTECTED]] > *Sent:* 01 November 2012 13:02**** > > > *To:* [EMAIL PROTECTED] > *Subject:* Re: Creating Indexes**** > > **** > > It looks like you're using Derby with a real cluster, not just a single *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330
-
Re: Creating IndexesDean Wampler 2012-11-03, 00:03
Hit send too soon...
I'm glad the ADD JAR hack appeared to work. You might verify if the temporary files mentioned are still there and also verify that you have write permissions for the target index directories. Other than that, I'm not sure what to suggest. I haven't really used indexing much, but it appears to be an immature feature, unfortunately. dean On Fri, Nov 2, 2012 at 7:02 PM, Dean Wampler < [EMAIL PROTECTED]> wrote: > Wow. Lots of quirks. I'm glad the ADD JAR > > > On Fri, Nov 2, 2012 at 6:59 PM, Peter Marron < > [EMAIL PROTECTED]> wrote: > >> Hi Dean,**** >> >> ** ** >> >> At this stage I’m really not worried about this being a hack.**** >> >> I just want to get it to work, and I’m grateful for all your help.**** >> >> I did as you suggested and now, as far as I can see, the Map/Reduce**** >> >> has succeeded. When I look in the log for the last reduce I no longer**** >> >> find an error. However this is the output from the hive command**** >> >> session:**** >> >> ** ** >> >> MapReduce Total cumulative CPU time: 0 days 1 hours 14 minutes 51 seconds >> 360 msec**** >> >> Ended Job = job_201211021743_0001**** >> >> Loading data to table default.default__score_bigindex__**** >> >> Deleted hdfs://localhost/data/warehouse/default__score_bigindex__**** >> >> Invalid alter operation: Unable to alter index.**** >> >> Table default.default__score_bigindex__ stats: [num_partitions: 0, >> num_files: 138, num_rows: 0, total_size: 446609024, raw_data_size: 0]**** >> >> FAILED: Execution Error, return code 1 from >> org.apache.hadoop.hive.ql.exec.DDLTask**** >> >> MapReduce Jobs Launched: **** >> >> Job 0: Map: 511 Reduce: 138 Accumulative CPU: 4491.36 sec HDFS Read: >> 137123460712 HDFS Write: 446609024 SUCESS**** >> >> Total MapReduce CPU Time Spent: 0 days 1 hours 14 minutes 51 seconds 360 >> msec**** >> >> hive> **** >> >> ** ** >> >> I find this very confusing. We have the bit where it says “Job 0:…. >> SUCCESS”**** >> >> and this seems to fit with the fact that I can’t find errors in the >> Map/Reduce.**** >> >> On the other hand we have the bit where it says: “Invalid alter >> operation: Unable to alter index.”**** >> >> So has it successfully created the index or not? And if not, then what >> do I do next?**** >> >> Is there somewhere else where it records Hive errors as opposed to >> Map/Reduce errors?**** >> >> ** ** >> >> Regards,**** >> >> ** ** >> >> Peter Marron**** >> >> **** >> >> ** ** >> >> *From:* Dean Wampler [mailto:[EMAIL PROTECTED]] >> *Sent:* 02 November 2012 14:03 >> >> *To:* [EMAIL PROTECTED] >> *Subject:* Re: Creating Indexes**** >> >> ** ** >> >> Oh, I saw this line in your Hive output and just assumed you were running >> in a cluster:**** >> >> ** ** >> >> Hadoop job information for Stage-1: number of mappers: 511; number of >> reducers: 138**** >> >> ** ** >> >> I haven't tried running a job that big in pseudodistributed mode either, >> but that's beside the point.**** >> >> ** ** >> >> So it seems to be an issue with indexing, but it still begs the question >> why derby isn't on the classpath for the task. I would try using the ADD >> JAR command, which copies the jar around the "cluster" and puts it on the >> classpath. It's what you would use with UDFs, for example:**** >> >> ** ** >> >> ADD JAR /path/to/derby.jar**** >> >> ALTER INDEX ...;**** >> >> ** ** >> >> It's a huge hack, but it just might work.**** >> >> dean**** >> >> ** ** >> >> On Fri, Nov 2, 2012 at 3:44 AM, Peter Marron < >> [EMAIL PROTECTED]> wrote:**** >> >> Hi Dean,**** >> >> **** >> >> I’m running everything on a single physical machine in pseudo-distributed >> mode.**** >> >> **** >> >> Well it certainly looks like the reducer is looking for a derby.jar, >> although I must**** >> >> confess I don’t really understand why it would be doing that.**** >> >> In an effort to fix that I copied the derby.jar (derby-10.4.2.0.jar) into *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330
-
Re: Creating IndexesShreepadma Venugopalan 2012-11-03, 00:06
Hi Peter,
While it looks like the map-red task may have succeeded it looks like the alter index actually failed. You should look into the execution log to see what the exception is. Without knowing why the DDLtask failed its hard to pinpoint the problem. As for the original problem with the jar as Dean pointed out for some odd reason the jar was not on the classpath prior to the add jar. Thanks, Shreepadma On Fri, Nov 2, 2012 at 4:59 PM, Peter Marron < [EMAIL PROTECTED]> wrote: > Hi Dean,**** > > ** ** > > At this stage I’m really not worried about this being a hack.**** > > I just want to get it to work, and I’m grateful for all your help.**** > > I did as you suggested and now, as far as I can see, the Map/Reduce**** > > has succeeded. When I look in the log for the last reduce I no longer**** > > find an error. However this is the output from the hive command**** > > session:**** > > ** ** > > MapReduce Total cumulative CPU time: 0 days 1 hours 14 minutes 51 seconds > 360 msec**** > > Ended Job = job_201211021743_0001**** > > Loading data to table default.default__score_bigindex__**** > > Deleted hdfs://localhost/data/warehouse/default__score_bigindex__**** > > Invalid alter operation: Unable to alter index.**** > > Table default.default__score_bigindex__ stats: [num_partitions: 0, > num_files: 138, num_rows: 0, total_size: 446609024, raw_data_size: 0]**** > > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask**** > > MapReduce Jobs Launched: **** > > Job 0: Map: 511 Reduce: 138 Accumulative CPU: 4491.36 sec HDFS Read: > 137123460712 HDFS Write: 446609024 SUCESS**** > > Total MapReduce CPU Time Spent: 0 days 1 hours 14 minutes 51 seconds 360 > msec**** > > hive> **** > > ** ** > > I find this very confusing. We have the bit where it says “Job 0:…. > SUCCESS”**** > > and this seems to fit with the fact that I can’t find errors in the > Map/Reduce.**** > > On the other hand we have the bit where it says: “Invalid alter operation: > Unable to alter index.”**** > > So has it successfully created the index or not? And if not, then what do > I do next?**** > > Is there somewhere else where it records Hive errors as opposed to > Map/Reduce errors?**** > > ** ** > > Regards,**** > > ** ** > > Peter Marron**** > > **** > > ** ** > > *From:* Dean Wampler [mailto:[EMAIL PROTECTED]] > *Sent:* 02 November 2012 14:03 > > *To:* [EMAIL PROTECTED] > *Subject:* Re: Creating Indexes**** > > ** ** > > Oh, I saw this line in your Hive output and just assumed you were running > in a cluster:**** > > ** ** > > Hadoop job information for Stage-1: number of mappers: 511; number of > reducers: 138**** > > ** ** > > I haven't tried running a job that big in pseudodistributed mode either, > but that's beside the point.**** > > ** ** > > So it seems to be an issue with indexing, but it still begs the question > why derby isn't on the classpath for the task. I would try using the ADD > JAR command, which copies the jar around the "cluster" and puts it on the > classpath. It's what you would use with UDFs, for example:**** > > ** ** > > ADD JAR /path/to/derby.jar**** > > ALTER INDEX ...;**** > > ** ** > > It's a huge hack, but it just might work.**** > > dean**** > > ** ** > > On Fri, Nov 2, 2012 at 3:44 AM, Peter Marron < > [EMAIL PROTECTED]> wrote:**** > > Hi Dean,**** > > **** > > I’m running everything on a single physical machine in pseudo-distributed > mode.**** > > **** > > Well it certainly looks like the reducer is looking for a derby.jar, > although I must**** > > confess I don’t really understand why it would be doing that.**** > > In an effort to fix that I copied the derby.jar (derby-10.4.2.0.jar) into > the**** > > Hadoop directory, where I assume that the reducer would be able to find it. > **** > > However I get exactly the same problem as before.**** > > Is there some particular place that I should put the derby.jar to make this
-
RE: Creating IndexesPeter Marron 2012-11-07, 13:33
Hi Shreepadma,
I have looked and I can't find anything that looks like a log with any more information in it. The vast bulk of logs that I find seem to concern the Map/Reduce which we agree has succeeded. The only thing that looks a little odd is in the syslogs for the reducers. All 138 of them seem to feature a line like this: 2012-11-07 11:10:17,960 ERROR org.apache.hadoop.hive.ql.exec.FileSinkOperator: StatsPublishing error: cannot connect to database I don't know if this is a serious problem or not or whether it might have bearing on the subsequent Hive error? I have attached the last reducer syslog so that you can see this problem in context. Also I managed to find a hive.log, attached. It seems to have lots of warnings and "errors" but I don't know if any of them are relevant. I'm being pressed to make some progress on this problem, but I don't know what to do next. Is there any other files that I can provide that might help? Do I need to add some instrumentation? I guess that I could re-build various Hive class files to get more information. But I really don't know Hive at all and so I suspect that it'll take me ages flailing around before I find anything helpful. Any suggestions for a way forward gratefully received, I'm happy to make any changes that might help. Regards, Z From: Shreepadma Venugopalan [mailto:[EMAIL PROTECTED]] Sent: 03 November 2012 00:06 To: [EMAIL PROTECTED] Subject: Re: Creating Indexes Hi Peter, While it looks like the map-red task may have succeeded it looks like the alter index actually failed. You should look into the execution log to see what the exception is. Without knowing why the DDLtask failed its hard to pinpoint the problem. As for the original problem with the jar as Dean pointed out for some odd reason the jar was not on the classpath prior to the add jar. Thanks, Shreepadma On Fri, Nov 2, 2012 at 4:59 PM, Peter Marron <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: Hi Dean, At this stage I'm really not worried about this being a hack. I just want to get it to work, and I'm grateful for all your help. I did as you suggested and now, as far as I can see, the Map/Reduce has succeeded. When I look in the log for the last reduce I no longer find an error. However this is the output from the hive command session: MapReduce Total cumulative CPU time: 0 days 1 hours 14 minutes 51 seconds 360 msec Ended Job = job_201211021743_0001 Loading data to table default.default__score_bigindex__ Deleted hdfs://localhost/data/warehouse/default__score_bigindex__ Invalid alter operation: Unable to alter index. Table default.default__score_bigindex__ stats: [num_partitions: 0, num_files: 138, num_rows: 0, total_size: 446609024, raw_data_size: 0] FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask MapReduce Jobs Launched: Job 0: Map: 511 Reduce: 138 Accumulative CPU: 4491.36 sec HDFS Read: 137123460712 HDFS Write: 446609024 SUCESS Total MapReduce CPU Time Spent: 0 days 1 hours 14 minutes 51 seconds 360 msec hive> I find this very confusing. We have the bit where it says "Job 0:.... SUCCESS" and this seems to fit with the fact that I can't find errors in the Map/Reduce. On the other hand we have the bit where it says: "Invalid alter operation: Unable to alter index." So has it successfully created the index or not? And if not, then what do I do next? Is there somewhere else where it records Hive errors as opposed to Map/Reduce errors? Regards, Peter Marron From: Dean Wampler [mailto:[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>] Sent: 02 November 2012 14:03 To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: Re: Creating Indexes Oh, I saw this line in your Hive output and just assumed you were running in a cluster: Hadoop job information for Stage-1: number of mappers: 511; number of reducers: 138 I haven't tried running a job that big in pseudodistributed mode either, but that's beside the point. So it seems to be an issue with indexing, but it still begs the question why derby isn't on the classpath for the task. I would try using the ADD JAR command, which copies the jar around the "cluster" and puts it on the classpath. It's what you would use with UDFs, for example: ADD JAR /path/to/derby.jar ALTER INDEX ...; It's a huge hack, but it just might work. dean On Fri, Nov 2, 2012 at 3:44 AM, Peter Marron <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: Hi Dean, I'm running everything on a single physical machine in pseudo-distributed mode. Well it certainly looks like the reducer is looking for a derby.jar, although I must confess I don't really understand why it would be doing that. In an effort to fix that I copied the derby.jar (derby-10.4.2.0.jar) into the Hadoop directory, where I assume that the reducer would be able to find it. However I get exactly the same problem as before. Is there some particular place that I should put the derby.jar to make this problem go away? Is there anything else that I can try? Peter Marron From: Dean Wampler [mailto:[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>] Sent: 01 November 2012 13:02 To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> Subject: Re: Creating Indexes It looks like you're using Derby with a real cluster, not just a single machine in local or pseudo-distributed mode. I haven't tried this myself, but the derby jar is probably not on the machine that ran the reducer task that failed. dean On Thu, Nov 1, 2012 at 4:31 AM, Peter Marron <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: Hi Shreepadma, I agree that the error looks odd. However I can't believe that I would have got this far with Hive if there was no derby jar. Nevertheless I checked. Here is a directory listing of the Hive install: [snip] Dean Wampler, Ph.D. thinkbiganalytics.com<http://thi
-
Creating indexesPeter Marron 2012-11-23, 08:56
(Sorry previous incomplete message sent in error.)
Hi, I'm trying to create indexes in Hive, and I've switched to using CDH-4. The creation of the index is failing and it's pretty obvious that the reducers are running out of heap space. When I use the web interface for the "Hadoop reduce task list" I can find this entry: Error: Java heap space Error: GC overhead limit exceeded org.apache.hadoop.io.SecureIOUtils$AlreadyExistsException: EEXIST: File exists at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:178) at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:303) at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:376) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: EEXIST: File exists at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method) at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:172) ... 7 more Error: GC overhead limit exceeded Also when the job is running I can look at the administration page and I can see a line that says something like: Cluster Summary (Heap Size is 52.12 MB/380.5 MB) And it seems likely that 380Mb is not enough to run this job. The thing is that I have tried to set the heap to a higher value. I've searched in the configuration interface for "heap" and set the values higher. This is what I see: [cid:[EMAIL PROTECTED]AD20] What am I doing wrong? If this e-mail shouldn't be here and should only be on a cloudera mailing list, please re-direct me. Thanks in advance. Peter Marron Trillium Software UK Limited Tel : +44 (0) 118 940 7609 Fax : +44 (0) 118 940 7699 E: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]> |