Hadoop, mail # user - RES: I want to call HDFS REST api to upload a file using httplib.


Re: RES: I want to call HDFS REST api to upload a file using httplib.
Adam Faris 2013-04-10, 22:39
Creating a file on HDFS is a multi-step process. If you allow me to generalize and skip over a lot of details, it's essentially two steps:
1) ask the namenode for a location to write the blocks;
2) connect to the datanode and write your data.
The output from your curl command is the response from the namenode, which returns a 307 and a location. Your client (curl) is supposed to say "hey, I have a new location" and connect to the datanode to write the data. If you add -L to your curl request, you'll see this happening.
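The two steps above can be sketched in Python 3, where httplib was renamed http.client. This is a minimal sketch, not a reference client. It assumes:
- an unsecured cluster that accepts the `user.name` query parameter;
- plain HTTP;
- an HDFS path that needs no URL escaping.

The function name `webhdfs_create` and the `connection_factory` parameter are hypothetical. The factory exists only so the example can be exercised without a cluster.

```python
import http.client
from urllib.parse import urlsplit


def webhdfs_create(namenode_host, namenode_port, hdfs_path, data, user,
                   connection_factory=http.client.HTTPConnection):
    """Upload `data` (bytes) to `hdfs_path` with a two-step WebHDFS CREATE (sketch)."""
    query = "op=CREATE&user.name=" + user
    # Step 1: ask the namenode where to write. No file data is sent here.
    conn = connection_factory(namenode_host, namenode_port)
    try:
        conn.request("PUT", "/webhdfs/v1" + hdfs_path + "?" + query)
        res = conn.getresponse()
        res.read()  # drain the body before closing the connection
        location = res.getheader("Location")
    finally:
        conn.close()
    if res.status != 307 or not location:
        raise RuntimeError("expected a 307 with a Location header, got %d" % res.status)

    # Step 2: PUT the bytes to the datanode URL the namenode returned,
    # instead of building a datanode URL by hand. Assumes plain HTTP (port 80 default).
    target = urlsplit(location)
    path = target.path + ("?" + target.query if target.query else "")
    conn = connection_factory(target.hostname, target.port or 80)
    try:
        # Passing body= lets http.client add the Content-Length header.
        conn.request("PUT", path, body=data)
        res = conn.getresponse()
        res.read()
        return res.status  # a successful CREATE should answer 201 Created
    finally:
        conn.close()
```

With the values from the question, a call would look like `webhdfs_create("localhost", 50070, "/levi/4", open("/home/levi/4", "rb").read(), "levi")`.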

Just as an FYI, using httplib with WebHDFS is a solved problem. On GitHub you'll find WebHDFS clients that already do this, in your pick of languages. :)

https://github.com/search?q=webhdfs&type=Repositories&s=updated    

-- Adam

On Apr 9, 2013, at 8:32 AM, Daryn Sharp <[EMAIL PROTECTED]> wrote:

> Try adding -L to your curl and see if that works.
>
> Daryn
>
> On Apr 8, 2013, at 11:05 PM, 小学园PHP wrote:
>
>> Thanks a lot.
>> But the returned URL is wrong; localhost is the real URL, as I tested successfully with curl using "localhost".
>> Can anybody help me translate this curl command to Python httplib?
>> curl -i -X PUT -T <LOCAL_FILE> "http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE"
>> I tested it using Python httplib and received the right response, but the file uploaded to HDFS is empty; no data was sent!
>> Is "conn.send(data)" the problem?
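A likely explanation for the empty file is the missing Content-Length header, not send() itself. This is an assumption that the thread does not confirm. putrequest() followed by endheaders() writes only the request line and headers. Unless a Content-Length (or chunked Transfer-Encoding) header announces a body, the server has no reason to read the bytes that send() writes afterwards. curl -T, by contrast, sets Content-Length from the file size.

Below is a rough Python 3 equivalent of that curl command, without the -i header printing. It keeps the low-level putrequest()/send() style of the original code. The function name `put_file` and its parameters are hypothetical.

```python
import os


def put_file(conn, url_path, local_file):
    """Roughly `curl -X PUT -T local_file <url>`: send a file as the PUT body (sketch)."""
    size = os.path.getsize(local_file)
    conn.putrequest("PUT", url_path)
    # Announce the body length; without it the datanode may read zero bytes.
    conn.putheader("Content-Length", str(size))
    conn.endheaders()
    with open(local_file, "rb") as f:
        # Send in 64 KiB chunks so large files are not loaded into memory at once.
        while True:
            chunk = f.read(64 * 1024)
            if not chunk:
                break
            conn.send(chunk)
    res = conn.getresponse()
    res.read()
    return res.status, res.reason
```

Here `conn` would be an `http.client.HTTPConnection` opened to the datanode host and port taken from the namenode's Location header. `url_path` is that URL's path plus query string.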
>>
>> ------------------ Original ------------------
>> From:  "MARCOS MEDRADO RUBINELLI"<[EMAIL PROTECTED]>;
>> Date:  Mon, Apr 8, 2013 04:22 PM
>> To:  "[EMAIL PROTECTED]"<[EMAIL PROTECTED]>;
>> Subject:  RES: I want to call HDFS REST api to upload a file using httplib.
>>
>> On your first call, Hadoop will return a URL pointing to a datanode in the Location header of the 307 response. On your second call, you have to use that URL instead of constructing your own. You can see the specific documentation here:
>> http://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE
>>
>> Regards,
>> Marcos
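Marcos's advice (connect to the URL in the Location header rather than constructing your own) boils down to splitting that URL into the host, port and path that http.client expects. Here is a small sketch. The helper name `redirect_target` is hypothetical, and the example URLs in the usage below are made up.

```python
from urllib.parse import urlsplit


def redirect_target(location):
    """Split a 307 Location URL into (host, port, path_and_query) for http.client."""
    parts = urlsplit(location)
    if parts.scheme not in ("http", "https"):
        raise ValueError("unexpected Location URL: %r" % location)
    default_port = 443 if parts.scheme == "https" else 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query  # keep op=CREATE and the other parameters
    return parts.hostname, parts.port or default_port, path
```

For example, `redirect_target("http://dn1.example.com:50075/webhdfs/v1/levi/4?op=CREATE")` gives `("dn1.example.com", 50075, "/webhdfs/v1/levi/4?op=CREATE")`. You can then pass the host and port to `HTTPConnection` and use the path for the second PUT.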
>>
>> I want to call the HDFS REST API to upload a file using httplib.
>>
>> My program created the file, but no content is in it.
>>
>> ====================================================>>
>> Here is my code:
>>
>> import httplib
>>
>> # Step 1: PUT to the namenode, which replies with a 307 redirect.
>> conn = httplib.HTTPConnection("localhost:50070")
>> conn.request("PUT", "/webhdfs/v1/levi/4?op=CREATE")
>> res = conn.getresponse()
>> print res.status, res.reason
>> conn.close()
>>
>> # Step 2: PUT the file contents to a datanode (URL built by hand).
>> conn = httplib.HTTPConnection("localhost:50075")
>> conn.connect()
>> conn.putrequest("PUT", "/webhdfs/v1/levi/4?op=CREATE&user.name=levi")
>> conn.endheaders()
>>
>> a_file = open("/home/levi/4", "rb")
>> a_file.seek(0)
>> data = a_file.read()
>>
>> conn.send(data)
>>
>> res = conn.getresponse()
>> print res.status, res.reason
>> conn.close()
>> =================================================>>
>> Here is the return:
>>
>> 307 TEMPORARY_REDIRECT
>> 201 Created
>>
>> ========================================================>>
>> OK, the file was created, but no content was sent.
>>
>> When I comment out conn.send(data), the result is the same: still no content.
>>
>> Maybe the file read or the send is wrong; I'm not sure.
>>
>> Do you know why this happens?
>>
>