generating ORC file as output of a mapreduce job
Hi,
I am writing an MR job to generate data for Hive.

The code generates output in Text format without problems:

job.setOutputKeyClass(NullWritable.class);

job.setOutputValueClass(Text.class);
But when I change the value class from Text.class to OrcOutputFormat.class,
it throws an exception:
2013-11-20 00:50:50,613 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child :
java.lang.VerifyError: class org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcRequestHeaderProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at org.apache.hadoop.util.ProtoUtil.makeRpcRequestHeader(ProtoUtil.java:165)
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:362)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1389)
at org.apache.hadoop.ipc.Client.call(Client.java:1318)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
at sun.proxy.$Proxy6.getTask(Unknown Source)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:133)

My objective is to generate ORC files as the output of an MR job, so that
I can load the data into Hive directly. If another approach serves the same
objective, that would be fine too. Is there an HCatalog utility I can use
to do this?
Thanks a lot,

Johnny