Dhaval Shah 2012-10-09, 22:29
Cheolsoo Park 2012-10-09, 22:57
Dhaval Shah 2012-10-09, 23:02
Cheolsoo Park 2012-10-09, 23:15
Russell Jurney 2012-10-09, 23:34
Cheolsoo Park 2012-10-09, 23:50
Russell Jurney 2012-10-09, 23:56
Dhaval Shah 2012-10-10, 14:57
Cheolsoo Park 2012-10-10, 20:55
----- Original Message -----
From: Cheolsoo Park <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; Dhaval Shah <[EMAIL PROTECTED]>
Sent: Wednesday, 10 October 2012 4:55 PM
Subject: Re: Error with Pig (CDH4.0.0)
Thank you very much for sharing your analysis! Your explanation definitely
provides more insights. :-)
If you don't mind, I'd like to clarify a couple of things. I am just trying
to see if there's something to be fixed in terms of Pig packaging:
>> the root issue was that the pig libraries needed to be present in the
HADOOP_CLASSPATH and it needs to be specifically set in hadoop_env.sh..
I am not sure if missing dependency libraries is the root cause. Given the
following error in your call stack, I believe that you have a different
version of antlr in classpath.
---Dhaval -> Yes, I did have Antlr 2.7.7 and 3.0.1 in the classpath from Mahout 0.6. However, I still have them in my classpath (and they appear earlier in the classpath than Pig) and it still works for me. I'm not sure what the actual issue is, but it's something to think about. Also, I saw that the Pig jars have antlr bundled, so technically those should take precedence anyway unless it's being called in a different way.
> at org.apache.pig.parser.QueryParserStringStream.<init>
If you look at QueryParserStringStream.java, it extends ANTLRStringStream,
and Pig 0.9 is compiled against antlr 3.4. Now if antlr were missing, you
should get a ClassNotFoundException (or NoClassDefFoundError), not a
NoSuchFieldError. The only possible reason that I can think of is that
there is a different version of antlr in classpath at run-time.
In fact, there was a similar discussion on this mailing list a while ago
(Please note that he is reporting the same call stack as yours):
Antlr is a common tool, so it's possible that it's installed by other tools
such as Hive.
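One way to check for this is to list every classpath entry that mentions antlr, in search order. The helper below is only a sketch: the function name find_in_classpath is made up for illustration and is not part of Pig or Hadoop. It assumes a colon-separated classpath string.

```shell
# Hypothetical helper (not part of Pig or Hadoop): print the classpath
# entries whose path contains a given substring, prefixed with their
# position in the search order. Earlier entries are searched first.
find_in_classpath() {
    fic_pattern=$1
    fic_classpath=$2
    # Split the colon-separated list into one entry per line, then let
    # grep -n prefix each match with its 1-based position.
    printf '%s\n' "$fic_classpath" | tr ':' '\n' | grep -n -- "$fic_pattern"
}
```

Calling it as find_in_classpath antlr "$HADOOP_CLASSPATH:$PIG_CLASSPATH" would show whether more than one antlr jar is listed and which one comes first. It only inspects the string it is given, so jars pulled in by other means would not show up.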
>> I tried setting PIG_CLASSPATH and HADOOP_CLASSPATH to have those
libraries in the shell and then start pig but that did not help either..
Assuming that a wrong version of antlr is present in classpath, this makes
perfect sense because PIG_CLASSPATH adds libraries to the end of CLASSPATH:
---Dhaval --> I think PIG_CLASSPATH libraries should be added to the front so that they take precedence, since we are calling pig and expressing that we want to use Pig and not anything else.
# add user-specified CLASSPATH
if [ "$PIG_CLASSPATH" != "" ]; then
That is, as long as the wrong antlr is present before the correct one, the
wrong one will be picked up at run-time.
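To make the ordering point concrete, here is a minimal sketch of the two ways a launcher script could merge user-supplied entries into the classpath. The function names are hypothetical and this is not the actual code in the pig script; it only illustrates why appended entries lose to anything already listed.

```shell
# Sketch only; function names are hypothetical, not taken from bin/pig.

# Append: user entries go last, so any conflicting jar already in the
# base classpath is found first.
append_classpath() {
    ac_base=$1
    ac_extra=$2
    if [ -z "$ac_extra" ]; then
        printf '%s\n' "$ac_base"
    elif [ -z "$ac_base" ]; then
        printf '%s\n' "$ac_extra"
    else
        printf '%s\n' "$ac_base:$ac_extra"
    fi
}

# Prepend: user entries go first, so they take precedence over the base.
prepend_classpath() {
    pc_base=$1
    pc_extra=$2
    # Same join with the arguments swapped.
    append_classpath "$pc_extra" "$pc_base"
}
```

With a base of /x/antlr-old.jar and a user entry of /y/antlr-new.jar (both paths invented for the example), appending yields /x/antlr-old.jar:/y/antlr-new.jar and the old jar is searched first, while prepending reverses the order.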
>> 2. The error message is not super helpful.. If libraries are missing,
the pig shell/grunt should not open at all.. However, in my case, it did
start up and then the error message was in no ways intuitive or pointing to
the root issue..
I agree with you that the error message is not very intuitive here. But
errors caused by dependency libraries can only be caught when Pig makes
calls to methods of those libraries at run-time. Furthermore, if there are
two different versions of the same library in classpath, the root cause can
be more subtle.
The challenge is that a Hadoop distribution such as CDH bundles many
sub-projects, and they often depend on different versions of the same
libraries. We do our best to harmonize all the versions of dependency
libraries across the platform, but I admit that it is not always perfect.
---Dhaval --> True, I understand that issue. I guess the only thing we could do is update the pig executable to exclude the known conflicts, which in my opinion is a reasonable thing to do.
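As a rough illustration of what excluding known conflicts could look like, the sketch below filters a colon-separated classpath against a regular expression. The function name and the jar names in the usage note are assumptions made for the example; this is not something the pig script is known to do.

```shell
# Hypothetical helper (an assumption, not existing Pig behaviour): drop
# classpath entries whose path matches an extended regular expression,
# keeping the remaining entries in their original order.
exclude_from_classpath() {
    efc_regex=$1
    efc_classpath=$2
    # One entry per line, remove the matching ones, then join with ':'.
    printf '%s\n' "$efc_classpath" | tr ':' '\n' \
        | grep -E -v -- "$efc_regex" \
        | paste -s -d ':' -
}
```

For example, exclude_from_classpath 'antlr-(2\.7\.7|3\.0\.1)' "$CLASSPATH" would remove entries named like antlr-2.7.7.jar or antlr-3.0.1.jar while leaving other antlr jars alone. The pattern has to be chosen with care, since an overly broad one would also drop the version Pig needs.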
On Wed, Oct 10, 2012 at 7:57 AM, Dhaval Shah <[EMAIL PROTECTED]> wrote:
> Alright I was eventually able to get the issue resolved.. For everyone's
> benefit, the root issue was that the pig libraries needed to be present in
Cheolsoo Park 2012-10-09, 23:31