Re: Connect to HDFS running on a different Hadoop version
Michael Segel 2012-01-25, 13:19
I said I would be nice and hold my tongue when it comes to IBM and their IM pillar products... :-)
You could write a client that talks to two different Hadoop versions, but then you would be using HFTP, which is what distcp uses under the hood anyway...
But that doesn't seem to be what he wants to do... I can only imagine why he is asking this question... ;-)
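For reference, a cross-version copy of the kind mentioned above typically reads over HFTP (which is read-only and version-independent) and writes through the destination cluster's native HDFS protocol. Hostnames, ports and paths below are placeholders, not from the original thread:

```shell
# Sketch only -- hostnames, ports and paths are placeholders.
# HFTP is read-only, so it is used on the *source* side; the write
# goes through the destination cluster's native hdfs:// protocol.
hadoop distcp \
  hftp://source-namenode:50070/user/data \
  hdfs://dest-namenode:8020/user/data
```

Run this with the destination cluster's client jars, since that is the side doing the writing.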
On Jan 25, 2012, at 7:32 AM, "alo alt" <[EMAIL PROTECTED]> wrote:
> BigInsights is an IBM product, based on a fork of Hadoop, I think. Mixing totally different stacks makes no sense and will not work, I guess.
> - Alex
> Alexander Lorenz
> On Jan 25, 2012, at 1:12 PM, Harsh J wrote:
>> Hello Romeo,
>> On Wed, Jan 25, 2012 at 4:07 PM, Romeo Kienzler <[EMAIL PROTECTED]> wrote:
>>> Dear List,
>>> we're trying to use a central HDFS store that can be accessed from
>>> various other Hadoop distributions.
>> The HDFS you've set up, what 'distribution' is it from? You will have
>> to use that particular version's jar across all the client applications
>> you use; otherwise you'll run into RPC version incompatibilities.
>>> Do you think this is possible? We're having trouble, but not related to
>>> different RPC versions.
>> It should be possible _most of the time_ by replacing the jars at the
>> client end with the ones your cluster runs, but minor API
>> incompatibilities between certain versions can still get in the way. It
>> purely depends on your client application and its implementation. If it
>> sticks to the publicly supported APIs, you are mostly fine.
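>> A minimal sketch of that jar swap (the directory layout and file names
>> below are assumptions; every distribution lays things out differently):

```shell
# Sketch, not a supported procedure: back up the vendor's Hadoop jars
# and install the jar that matches the cluster. All paths are arguments,
# nothing here is hardcoded to a particular distribution.
swap_hadoop_client_jars() {
  lib_dir="$1"      # client's lib directory, e.g. /opt/client/lib (hypothetical)
  cluster_jar="$2"  # the Hadoop jar copied from the cluster
  mkdir -p "$lib_dir/backup"
  # Move the vendor jars aside instead of deleting them, so the swap is reversible.
  for jar in "$lib_dir"/hadoop-*.jar; do
    [ -e "$jar" ] && mv "$jar" "$lib_dir/backup/"
  done
  cp "$cluster_jar" "$lib_dir/"
}
```

>> e.g. swap_hadoop_client_jars /opt/client/lib hadoop-core-0.20.2-cdh3u2.jar,
>> then rerun the client and see whether the failure mode changes.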
>>> When trying to access a Cloudera CDH3 Update 2 (cdh3u2) HDFS from
>>> BigInsights 1.3 we're getting this error:
>> BigInsights runs off IBM's own patched Hadoop sources if I am right,
>> and things can get a bit tricky there. See the following points:
>>> Bad connection to FS. Command aborted. Exception: Call to
>>> localhost.localdomain/127.0.0.1:50070 failed on local exception:
>>> java.io.IOException: Call to localhost.localdomain/127.0.0.1:50070 failed on
>>> local exception: java.io.EOFException
>> This is surely an RPC issue. The call tries to read a field but gets
>> no response, hits EOF and dies. We have more descriptive error messages
>> from the 0.23 releases onwards, but the problem here is that your IBM
>> client jar is not the same as your cluster's jar. The mixture won't
>> work.
>> This is what I am speaking of. Your client (BigInsights? I have not
>> really used it…) is using an IBM jar with their supplied
>> 'PatchDistributedFileSystem', and that is probably incompatible with
>> the cluster's HDFS RPC protocols. I do not know enough about IBM's
>> custom stuff to say for sure whether it would work if you replaced it
>> with your cluster's jar.
>>> But we've already replaced the client hadoop-common jars with the Cloudera
>> Apparently not. Your stack trace shows that com.ibm.* classes are
>> still being pulled in. My guess is that BigInsights would not work with
>> anything non-IBM, but I have not used it to know for sure.
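>> One way to check which classes are actually being loaded (treat this
>> as a sketch; HADOOP_OPTS is honored by the stock bin/hadoop script, and
>> the grep pattern is illustrative only):

```shell
# Sketch: print which Hadoop/IBM classes the JVM really loads.
# -verbose:class logs every class as it is loaded; grep narrows the
# output to the packages of interest.
HADOOP_OPTS="-verbose:class" hadoop fs -ls / 2>&1 \
  | grep -E '\[Loaded (com\.ibm|org\.apache\.hadoop)' \
  | head -n 20
```

>> If com.ibm.* classes still show up, the old jars are still on the
>> classpath somewhere.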
>> If they have a user community, you can ask there if there is a working
>> way to have BigInsights run against Apache/CDH/etc. distributions.
>> For CDH specific questions, you may ask at
>> https://groups.google.com/a/cloudera.org/group/cdh-user/topics instead
>> of the Apache lists here.
>> Harsh J
>> Customer Ops. Engineer, Cloudera