
HDFS, mail # dev - HDFS Blockreport question

Re: HDFS Blockreport question
Brian Bockelman 2010-04-06, 14:50
Hey Jay,

I think that, if you're experienced in implementing transfer protocols, it is not difficult to implement the HDFS wire protocol.  As you point out, the protocols are subject to change between releases (especially between 0.20, 0.21, and 0.22) and are documented only in fragments scattered through the Java source code.  At least, I looked at doing this for the read portions, and it wasn't horrible.

However, the *really hard part* is the client retry/recovery logic.  That's where a lot of the intelligence lives, spread across some very large classes, and it's not especially well documented.

I've had lots of luck with scaling libhdfs - we average >20TB and billions of I/O operations a day with it.  I'd strongly advise against re-inventing the wheel, unless it's for a research project.
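To give a feel for the retry/recovery logic Brian is pointing at: any reimplementation would have to reproduce this kind of behavior around every remote call.  The sketch below is purely illustrative (it is not Hadoop's actual code, and the class and method names are made up); Hadoop's real client logic additionally distinguishes retriable from fatal errors, fails over between datanodes, and so on.

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    // Retry a flaky operation a bounded number of times, backing off
    // between attempts.  This is the bare skeleton of the kind of
    // recovery logic an HDFS client wraps around its RPCs.
    static <T> T withRetries(Callable<T> op, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                Thread.sleep(backoffMs * attempt);  // linear backoff
            }
        }
        throw last;  // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        // A fake operation that fails twice before succeeding.
        int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient failure");
            return "block report delivered";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints: block report delivered after 3 attempts
    }
}
```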


On Apr 6, 2010, at 8:53 AM, Jay Booth wrote:

> A pure C library to communicate with HDFS?
> Certainly possible, but it would be a lot of work, and the HDFS wire
> protocols are ad hoc, only somewhat documented, and subject to change between
> releases right now, so you'd be chasing a moving target.  I'd try to think of
> another way to accomplish what you want before attempting a client
> reimplementation in C.  If you only need to talk to the namenode
> and not the datanodes it might be a little easier, but it's still a lot of work
> that will probably be obsolete after another release or two.
> On Tue, Apr 6, 2010 at 9:47 AM, Alberich de megres <[EMAIL PROTECTED]> wrote:
>> Thanks!
>> I'm already using Eclipse to browse the code.
>> In this scenario, am I right in understanding that Java serializes the
>> object and its parameters and sends them over the network?
>> For example, if I want to make a pure C library (with no JNI
>> interfaces), is it possible/feasible?  Or would it be hellishly difficult?
>> Thanks once again!!!
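On Alberich's serialization question above: Hadoop RPC in the 0.20 line does not use standard Java object serialization.  The client writes the method name and each parameter field by field using Hadoop's Writable conventions on top of DataOutput/DataInput (see Invocation and ObjectWritable in org.apache.hadoop.ipc).  Below is a JDK-only sketch of that framing style; the on-wire layout here is illustrative only and does not match Hadoop's actual format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FramingSketch {
    // Encode a call as: UTF method name, parameter count, then each
    // parameter written as an explicit primitive (here, longs only).
    static byte[] encodeCall(String method, long... params) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(method);
        out.writeInt(params.length);
        for (long p : params) {
            out.writeLong(p);
        }
        out.flush();
        return buf.toByteArray();
    }

    // The receiving side reads the same fields back in the same order.
    static String decodeCall(byte[] wire) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        StringBuilder sb = new StringBuilder(in.readUTF()).append('(');
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(", ");
            sb.append(in.readLong());
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = encodeCall("blockReport", 7L, 42L);
        System.out.println(decodeCall(wire));  // prints: blockReport(7, 42)
    }
}
```

A pure C client would have to reproduce exactly this kind of hand-rolled framing, which is why the format being undocumented and unstable matters so much.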
>> On Sat, Apr 3, 2010 at 1:54 AM, Ryan Rawson <[EMAIL PROTECTED]> wrote:
>>> If you look at the getProxy code, it passes an "Invoker" (or something
>>> like that) which the proxy code uses to delegate calls TO.  The
>>> Invoker calls another class, "Client", which has inner classes like
>>> Call and Connection that wrap the actual Java I/O.  This all lives in
>>> the org.apache.hadoop.ipc package.
>>> Be sure to use a good IDE like IJ or Eclipse to browse the code; it
>>> makes following all of this much easier.
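The mechanism Ryan describes is Java's dynamic proxy facility: RPC.getProxy hands java.lang.reflect.Proxy an InvocationHandler (Hadoop's Invoker), so what looks like a local interface call becomes a method-name-plus-arguments message the handler can ship over the wire.  Here is a minimal JDK-only sketch of the pattern; the interface and class names are illustrative stand-ins, not Hadoop's.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxySketch {
    // Stand-in for DatanodeProtocol: an interface with no implementation
    // anywhere -- just like the blockReport method Alberich couldn't find.
    interface NamenodeProtocol {
        String blockReport(String datanodeId, int blockCount);
    }

    static NamenodeProtocol makeProxy() {
        // Stand-in for Hadoop's Invoker: intercepts every call on the
        // interface.  A real Invoker would serialize method.getName() and
        // the arguments and send them over a Client.Connection.
        InvocationHandler invoker = (proxy, method, args) ->
                "sent: " + method.getName() + "(" + args[0] + ", " + args[1] + ")";

        return (NamenodeProtocol) Proxy.newProxyInstance(
                NamenodeProtocol.class.getClassLoader(),
                new Class<?>[] { NamenodeProtocol.class },
                invoker);
    }

    public static void main(String[] args) {
        NamenodeProtocol namenode = makeProxy();
        // Looks like a local call, but the handler intercepts it.
        System.out.println(namenode.blockReport("dn-1", 42));
        // prints: sent: blockReport(dn-1, 42)
    }
}
```

This is why blockReport has no visible implementation: the proxy object generated at runtime *is* the implementation, and it forwards every call to the handler.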
>>> On Fri, Apr 2, 2010 at 4:39 PM, Alberich de megres
>>> <[EMAIL PROTECTED]> wrote:
>>>> Hi again!
>>>> Could anyone help me?
>>>> I can't understand how the RPC class works.  As far as I can see, it only
>>>> instantiates an interface that has no implementation for some methods,
>>>> like blockReport.  But then it uses RPC.getProxy to get a new class which
>>>> exchanges messages with the namenode.
>>>> I'm sorry for this silly question, but I am really lost at this point.
>>>> Thanks for your patience.
>>>> On Fri, Apr 2, 2010 at 2:11 AM, Alberich de megres
>>>> <[EMAIL PROTECTED]> wrote:
>>>>> Hi Jay!
>>>>> Thanks for the answer, but what I'm asking is how it actually gets sent:
>>>>> blockReport is a method in the DatanodeProtocol interface that has no
>>>>> implementation.
>>>>> Thanks!
>>>>> On Thu, Apr 1, 2010 at 5:50 PM, Jay Booth <[EMAIL PROTECTED]> wrote:
>>>>>> In DataNode:
>>>>>> public DatanodeProtocol namenode
>>>>>> It's not a reference to an actual namenode; it's a wrapper for a network
>>>>>> protocol created by that RPC.waitForProxy call -- so when it calls
>>>>>> namenode.blockReport, it's sending that information over RPC to the
>>>>>> namenode instance over the network.
>>>>>> On Thu, Apr 1, 2010 at 5:50 AM, Alberich de megres <
>>>>>>> Hi everyone!
>>>>>>> Sailing through the HDFS source code that comes with Hadoop 0.20.2, I
>>>>>>> could not understand how HDFS sends the blockreport to the nameNode.
>>>>>>> As I can see, in
>>>>>>> src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java we