Hadoop, mail # user - Loading Data to HDFS


Loading Data to HDFS
sumit ghosh 2012-10-30, 10:07
Hi,

I have data on a remote machine that is accessible over ssh. I have Hadoop CDH4 installed on RHEL, and I am planning to load several petabytes of data onto HDFS.
 
Which method would be the fastest, and are there any projects around Hadoop that could help with this?

 
I cannot install the Hadoop client on the remote machine.
 
Have a great Day Ahead!
Sumit.
 
 
---------------
Here I am attaching my previous discussion on CDH-user to avoid duplication.
---------------
On Wed, Oct 24, 2012 at 9:29 PM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
In addition to Jarcec's suggestions, you could use HttpFS. Then you'd only need to open a single host:port in your firewall, as all the traffic goes through it.
thx
Alejandro
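
[Editor's sketch] The HttpFS suggestion above could look roughly like the following from the remote machine, using only the Python standard library so that no Hadoop client is needed there. This is a minimal sketch under several assumptions not stated in the thread: the gateway host name is made up, port 14000 is assumed to be the HttpFS port in use, authentication is plain `user.name` (no Kerberos), and the server is assumed to answer the first PUT with a 307 redirect as the WebHDFS REST API describes. The helper names `create_url` and `upload` are hypothetical.

```python
import http.client
import os
from urllib.parse import quote, urlencode, urlsplit


def create_url(host, port, hdfs_path, user, overwrite=False):
    """Build a WebHDFS-style CREATE URL for an absolute HDFS path."""
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute")
    query = urlencode({
        "op": "CREATE",
        "user.name": user,  # simple (non-Kerberos) authentication, an assumption
        "overwrite": "true" if overwrite else "false",
    })
    return "http://{}:{}/webhdfs/v1{}?{}".format(host, port, quote(hdfs_path), query)


def _put(url, body=None, headers=None):
    """Send one PUT request; return (status, Location header or None)."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=60)
    try:
        conn.request("PUT", parts.path + "?" + parts.query,
                     body=body, headers=headers or {})
        resp = conn.getresponse()
        resp.read()  # drain the response so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def upload(local_path, url):
    """Two-step upload: request a write location, then send the file there."""
    # Step 1: PUT without data; the server is expected to redirect (307).
    status, location = _put(url)
    if status != 307 or not location:
        raise RuntimeError("expected a 307 redirect, got HTTP %d" % status)
    # Step 2: stream the file body to the redirect target.
    headers = {
        "Content-Type": "application/octet-stream",
        "Content-Length": str(os.path.getsize(local_path)),
    }
    with open(local_path, "rb") as f:
        status, _ = _put(location, body=f, headers=headers)
    if status != 201:
        raise RuntimeError("upload failed with HTTP %d" % status)


if __name__ == "__main__":
    # Hypothetical gateway host, HDFS path, and user name.
    print(create_url("gateway.example.com", 14000, "/data/in/part-0001.csv", "sumit"))
```

Calling `upload("/local/part-0001.csv", create_url(...))` would then push one file. Since all traffic goes through the single HttpFS host, that host is likely to bound throughput, which matters at petabyte scale; running several uploads in parallel only helps up to that limit.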

On Oct 24, 2012, at 8:28 AM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
> Hi Sumit,
> there are plenty of ways to achieve that. Please find my feedback below:
>
>> Does Sqoop support loading flat files to HDFS?
>
> No, Sqoop only supports moving data from external databases and warehouse systems. Copying files is not supported at the moment.
>
>> Can we use distcp?
>
> No. Distcp can be used only to copy data between HDFS filesystems.
>
>> How do we use the core-site.xml file on the remote machine to use
>> copyFromLocal?
>
> Yes, you can install the Hadoop binaries on your machine (with no Hadoop services running) and use the hadoop binary to upload data. The installation procedure is described in the CDH4 installation guide [1] (follow the "client" installation).
>
> Another option I can think of is WebHDFS [2], or maybe hdfs-fuse [3].
>
> Jarcec
>
> Links:
> 1: https://ccp.cloudera.com/display/CDH4DOC/CDH4+Installation
> 2: https://ccp.cloudera.com/display/CDH4DOC/Deploying+HDFS+on+a+Cluster#DeployingHDFSonaCluster-EnablingWebHDFS
> 3: https://ccp.cloudera.com/display/CDH4DOC/Mountable+HDFS
>
> On Wed, Oct 24, 2012 at 01:33:29AM -0700, Sumit Ghosh wrote:
>>
>>
>> Hi,
>>
>> I have data on a remote machine accessible over ssh. What is the fastest
>> way to load data onto HDFS?
>>
>> Does Sqoop support loading flat files to HDFS?
>> Can we use distcp?
>> How do we use the core-site.xml file on the remote machine to use
>> copyFromLocal?
>>
>> Which will be the best to use and are there any other open source projects
>> around Hadoop which can be used as well?
>> Have a great Day Ahead!
>> Sumit