Hive, mail # user - Question about how to add the debug info into the hive core jar


java8964 java8964 2013-03-20, 20:45
Abdelrhman Shettia 2013-03-21, 00:35

RE: Question about how to add the debug info into the hive core jar
java8964 java8964 2013-03-21, 01:05

I am not sure the existing logging information is enough for me.
The exception trace is as follows:
Caused by: java.lang.IndexOutOfBoundsException: Index: 8, Size: 8
    at java.util.ArrayList.rangeCheck(ArrayList.java:604)
    at java.util.ArrayList.get(ArrayList.java:382)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:485)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:485)
It is Hive 0.9.0, and I looked into the source code of LazySimpleSerDe.java around line 485:
      List<? extends StructField> fields = soi.getAllStructFieldRefs();
      list = soi.getStructFieldsDataAsList(obj);
      if (list == null) {
        out.write(nullSequence.getBytes(), 0, nullSequence.getLength());
      } else {
        for (int i = 0; i < list.size(); i++) {
          if (i > 0) {
            out.write(separator);
          }
          serialize(out, list.get(i), fields.get(i).getFieldObjectInspector(),   // <-- line 485
              separators, level + 1, nullSequence, escaped, escapeChar,
              needsEscape);
        }
      }
For this exception to happen, the soi (which is my StructObjectInspector class) must be returning collections of different lengths for "fields" and "list". Since the loop bound is list.size(), and Java evaluates arguments left to right, the call that throws at line 485 must be fields.get(i); with "Index: 8, Size: 8" that means "fields" has 8 elements while "list" has at least 9. But I already added a logger in my StructObjectInspector, and it shows collections of the same length returned from both getAllStructFieldRefs() and getStructFieldsDataAsList(Object). So I really don't know how this exception can happen in the Hive code.
I have two options right now:

1) Change the above code to add more debug information at runtime, to see what is actually inside the "fields" object and the "list" object and why their lengths differ (a sketch of what I have in mind follows). But I have a problem getting my new jar loaded by Hadoop.

2) Enable remote debugging. There are very few examples on the internet of how to enable remote debugging of Hive's server-side MR jobs; some wiki pages claim it is doable, but give no concrete examples.
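For option 1, the logging I have in mind would look something like this (just a sketch against the 0.9.0 source, assuming the class's existing commons-logging LOG field):

      List<? extends StructField> fields = soi.getAllStructFieldRefs();
      list = soi.getStructFieldsDataAsList(obj);
      // Extra debug aid: report a field/data length mismatch before the loop,
      // so the task log shows exactly what soi returned for this row.
      if (list != null && fields.size() != list.size()) {
        LOG.error("Mismatch in " + soi.getClass().getName()
            + ": fields.size()=" + fields.size()
            + ", list.size()=" + list.size()
            + ", row data=" + list);
      }

For option 2, the standard JDWP approach (not Hive-specific) should apply to the MR child tasks: before running the query, pass the debug agent through mapred.child.java.opts, e.g.

  set mapred.child.java.opts=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000;

With suspend=y each task attempt blocks until a debugger attaches on port 8000, which is manageable on a single-node pseudo-distributed setup.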
Thanks

From: [EMAIL PROTECTED]
Subject: Re: Question about how to add the debug info into the hive core jar
Date: Wed, 20 Mar 2013 17:35:36 -0700
To: [EMAIL PROTECTED]

Hi Yong,
Have you tried running the Hive query in debug mode? The Hive log level can be changed by passing the following conf when the Hive client is run:

  # hive -hiveconf hive.root.logger=ALL,console -e "DDL statement;"
  # hive -hiveconf hive.root.logger=ALL,console -f ddl.sql

Hope this helps.

Thanks

On Mar 20, 2013, at 1:45 PM, java8964 java8964 <[EMAIL PROTECTED]> wrote:

Hi,
I have Hadoop running in pseudo-distributed mode on my Linux box. Right now I face a problem with a Hive query, which throws an exception on a table whose data uses my custom SerDe and InputFormat classes.
To help me trace the root cause, I need to modify the code of org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe to add more debug logging, to understand why the exception happens.
After I modify the Hive code, I can compile it and generate a new hive-serde.jar file with the same name as the release version; only the size changes.
Now I put my new hive-serde.jar under the $HIVE_HOME/lib folder, replacing the old one, and run the query that failed. But after the failure, when I check $HADOOP_HOME/logs/userlogs/, the exception stacktrace still looks like it was generated by the original hive-serde classes: the line numbers shown in the log do not match the new code I changed to add the debug information.
My question is: given this newly compiled hive-serde.jar file, where else should I put it besides $HIVE_HOME/lib?
Some more details:

1) This is a pseudo-distributed environment. Everything (namenode, datanode, jobtracker and tasktracker) runs in one box.
2) After I replaced hive-serde.jar with my new jar, I even stopped all the Hadoop Java processes and restarted them.
3) But when I run the query in the Hive session, I still see the log generated by the old hive-serde.jar classes. Why? (One idea I plan to try is sketched below.)
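One idea (just a sketch; the jar path here is made up): register the rebuilt jar explicitly in the Hive session, so it is shipped to the MR job through the distributed cache instead of relying on whatever copy is on the task classpath:

  hive> ADD JAR /home/yong/hive-serde-debug.jar;

Alternatively, the jar could be listed in hive.aux.jars.path (or the HIVE_AUX_JARS_PATH environment variable) before starting the CLI.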
Thanks for any help.
Yong