Re: Connect to HDFS running on a different Hadoop-Version
alo alt 2012-01-25, 12:32
BigInsights is an IBM product, based on a fork of Hadoop, I think. Mixing totally different stacks makes no sense and will not work, I guess.
On Jan 25, 2012, at 1:12 PM, Harsh J wrote:
> Hello Romeo,
> On Wed, Jan 25, 2012 at 4:07 PM, Romeo Kienzler <[EMAIL PROTECTED]> wrote:
>> Dear List,
>> we're trying to use a central HDFS storage that can be accessed from
>> various other Hadoop distributions.
> The HDFS you've set up, what 'distribution' is that from? You will have
> to use that particular version's jar across all client applications
> you use, else you'll run into RPC version incompatibilities.
>> Do you think this is possible? We're having trouble, but not related to
>> different RPC-Versions.
> It should be possible _most of the time_ by replacing jars at the
> client end to use the one that runs your cluster, but there may be
> minor API incompatibilities between certain versions that can get in
> the way. Purely depends on your client application and its
> implementation. If it sticks to using the publicly supported APIs, you
> are mostly fine.
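One way to sanity-check the jar-replacement advice above is to list the Hadoop jars the client JVM actually has on its classpath. The following is only a rough sketch: the class name `HadoopJarScan` is hypothetical, and the "file name contains hadoop" match is an assumed heuristic, not something BigInsights or CDH provides.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Hadoop): lists classpath entries that look like Hadoop jars.
final class HadoopJarScan {

    /** Returns the classpath entries whose file name ends in .jar and contains "hadoop". */
    static List<String> hadoopJars(String classpath) {
        List<String> hits = new ArrayList<>();
        if (classpath == null || classpath.isEmpty()) {
            return hits;
        }
        // Split on the platform separator (":" on Unix, ";" on Windows).
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            String name = new File(entry).getName().toLowerCase(Locale.ROOT);
            // Assumed heuristic: a file name match is a hint, not proof of what the jar contains.
            if (name.endsWith(".jar") && name.contains("hadoop")) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Inspect the classpath of the JVM this program runs in.
        for (String jar : hadoopJars(System.getProperty("java.class.path"))) {
            System.out.println(jar);
        }
    }
}
```

Run it with the same classpath the client uses. If it prints jars from more than one distribution, the client is mixing versions, which is the situation described above.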
>> When trying to access a Cloudera CDH3 Update 2 (cdh3u2) HDFS from
>> BigInsights 1.3 we're getting this error:
> BigInsights runs off IBM's own patched Hadoop sources if I am right,
> and things can get a bit tricky there. See the following points:
>> Bad connection to FS. Command aborted. Exception: Call to
>> localhost.localdomain/127.0.0.1:50070 failed on local exception:
>> java.io.IOException: Call to localhost.localdomain/127.0.0.1:50070 failed on
>> local exception: java.io.EOFException
> This is surely an RPC issue. The call tries to read off a field, but
> gets no response, EOFs and dies. We have more descriptive error
> messages with the 0.23 version onwards, but the problem here is that
> your IBM client jar is not the same as your cluster's jar. The mixture
> won't work.
> ^^ This is what I am speaking of. Your client (BigInsights? Have not
> used it really…) is using an IBM jar with their supplied
> 'PatchDistributedFileSystem', and that is probably incompatible with
> the cluster's HDFS RPC protocols. I do not know enough about IBM's
> custom stuff to know for sure it would work if you replace it with
> your cluster's jar.
>> But we've already replaced the client hadoop-common.jar's with the Cloudera
> Apparently not. Your stack trace shows that com.ibm.* classes are still
> being pulled in. My guess is that BigInsights would not work with
> anything non-IBM, but I have not used it to know for sure.
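To see which jar a class is really being pulled from, the JVM can be asked for the class's code source. This is a minimal sketch under assumptions: `ClassOrigin` is a hypothetical name, and the default class `org.apache.hadoop.fs.FileSystem` is simply a well-known Hadoop class picked for illustration. Pass any other class name as the first argument.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Hadoop): reports where a class was loaded from.
final class ClassOrigin {

    /** Describes the jar or directory the named class was loaded from, or why that is unknown. */
    static String locate(String className) {
        try {
            // Load without running static initializers; only the origin is of interest.
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // Classes shipped with the JDK itself usually report no code source.
                return "no code source (JDK/bootstrap class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found: " + className;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.fs.FileSystem";
        System.out.println(name + " -> " + locate(name));
    }
}
```

Run it with the client's classpath. If the printed location is a vendor jar rather than the cluster's jar, the replacement did not take effect for that class.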
> If they have a user community, you can ask there if there is a working
> way to have BigInsights run against Apache/CDH/etc. distributions.
> For CDH specific questions, you may ask at
> https://groups.google.com/a/cloudera.org/group/cdh-user/topics instead
> of the Apache lists here.
> Harsh J
> Customer Ops. Engineer, Cloudera