java8964 java8964 2012-12-21, 01:01
Mark Grover 2012-12-22, 00:20
-RE: hive add jar question
java8964 java8964 2012-12-22, 01:59
Thanks for your response. Maybe my question was not clear.
The properties file is in the jar, at the top level. I have put properties files in jar files before, in all kinds of Java projects, and never had a problem.
But in Hive, it complains at run time that "this.getClass().getResourceAsStream("my.properties")" is NULL. I am not sure whether Hive creates its own classloader for "add jar", but no matter what, the Java class file is in the same jar as "my.properties", yet somehow the MR jobs generated by Hive get NULL back from "this.getClass().getResourceAsStream("my.properties")". Very puzzling.
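One possible explanation, offered as a sketch rather than a confirmed diagnosis of this case: Class.getResourceAsStream resolves a name without a leading "/" relative to the package of the class, so a file at the top level of the jar is found that way only when the class lives in the default package. The helper below tries the package-relative name, then the absolute name, then the thread context classloader. The class ResourceLookup and its method names are hypothetical and not part of Hive, and whether the context classloader is the one that sees jars registered with "add jar" is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper, not part of Hive: tries several classpath lookups in turn.
final class ResourceLookup {

    /** Returns a stream for the named resource, or null if no lookup finds it. */
    static InputStream open(Class<?> anchor, String name) {
        // 1. Relative to the anchor class's package (what the original code does).
        InputStream in = anchor.getResourceAsStream(name);
        if (in == null) {
            // 2. Absolute lookup from the root of the classpath (top level of the jar).
            in = anchor.getResourceAsStream("/" + name);
        }
        if (in == null) {
            // 3. Thread context classloader. Assumption: this loader may see jars
            //    that the anchor class's own loader does not.
            ClassLoader cl = Thread.currentThread().getContextClassLoader();
            if (cl != null) {
                in = cl.getResourceAsStream(name);
            }
        }
        return in;
    }

    /** Loads the properties, failing with a clear message instead of a NullPointerException. */
    static Properties load(Class<?> anchor, String name) throws IOException {
        Properties props = new Properties();
        try (InputStream in = open(anchor, name)) {
            if (in == null) {
                throw new IOException("resource not found on classpath: " + name);
            }
            props.load(in);
        }
        return props;
    }
}
```

In the SerDe or InputFormat this would be called as ResourceLookup.load(this.getClass(), "my.properties"). If the class sits in a package such as com.example (a made-up name), the first lookup searches for com/example/my.properties, which would produce a null result for a file stored at the jar root.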
For the 2nd behavior, yes, I can reproduce it again and again. Not sure if it is related to the fact that I put these lines in my .hiverc file:
set hive.exec.mode.local.auto=true;
set hive.exec.parallel=true;
I will try to see if I can still reproduce this without these lines. Right now, I have to put my custom jar in $HIVE_HOME/lib to make the local-running mode work too.
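To narrow down the "class not found" error in local mode, a small diagnostic along these lines could be called from any code that does run in the failing job. It is only a sketch: the class ClassVisibility and its method names are hypothetical, and it rests on the assumption that the failure comes from the custom class being visible to one classloader but not another.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic, not part of Hive: reports which classloaders can see a class.
final class ClassVisibility {

    /** Maps a loader label to whether that loader can load the named class. */
    static Map<String, Boolean> probe(String className) {
        Map<String, Boolean> result = new LinkedHashMap<String, Boolean>();
        result.put("context", canLoad(className, Thread.currentThread().getContextClassLoader()));
        result.put("system", canLoad(className, ClassLoader.getSystemClassLoader()));
        result.put("defining", canLoad(className, ClassVisibility.class.getClassLoader()));
        return result;
    }

    /** Tries to load the class without initializing it; a null loader is reported as false. */
    static boolean canLoad(String className, ClassLoader loader) {
        if (loader == null) {
            return false;
        }
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class was found but could not be linked (e.g. a missing dependency).
            return false;
        }
    }
}
```

Logging ClassVisibility.probe with the fully qualified name of the custom InputFormat, once in local mode and once on the cluster, would show whether the jar is visible only through some of the loaders.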
> Date: Fri, 21 Dec 2012 16:20:49 -0800
> Subject: Re: hive add jar question
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Here is a relevant thread:
> http://mail-archives.apache.org/mod_mbox/hive-user/201008.mbox/%3CAANLkTi=+[EMAIL PROTECTED]%3E
> I have personally used the "add file" functionality when accessing
> resources. You can access them just by their name in your code.
> About #2, that doesn't sound normal to me. Did you figure it out, or are
> you still running into it?
> On Thu, Dec 20, 2012 at 5:01 PM, java8964 java8964 <[EMAIL PROTECTED]> wrote:
> > Hi, I have 2 questions related to Hive's behavior when using 'add jar'.
> > I am testing the implementation of my own Hive InputFormat and SerDe in a jar
> > on my single-machine cluster running in pseudo-distributed mode. In the jar,
> > I include the properties file at the top level.
> > In my custom code, I try to load the properties file in the
> > following way:
> > props.load(this.getClass().getResourceAsStream("my.properties")).
> > I am sure that my.properties exists in my jar file, but
> > this.getClass().getResourceAsStream("my.properties") returns NULL at
> > runtime in this case. I am not sure of the reason for this. Does anyone
> > have an idea?
> > The second question: when my test data is small, i.e. less than the
> > setting of hive.exec.mode.local.auto.inputbytes.max (not sure I typed
> > that correctly), Hive will run my query locally. But in this case,
> > Hive fails with a "not found" error for my custom class (like the
> > InputFormat). Of course, in my session, I ran the 'add jar xxxx.jar'
> > command. If the test data is big, it runs on the standalone cluster
> > without any problem (it finds the class in my jar in this case). My
> > question is: is this normal? Why can't Hive running in local mode find my
> > class in the jar which has already been added?
> > My environment is CDH3U5; the Hive version in it is 0.7.1.
> > Thanks
> > Yong