Re: Loading Data to HDFS
It might sound like an old-fashioned way, but can't you move the data
physically? From what I understand, it is a one-shot load and not "streaming",
so it could be a good method, if you have the access of course.

Regards

Bertrand

On Tue, Oct 30, 2012 at 11:07 AM, sumit ghosh <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have data on a remote machine accessible over SSH. I have Hadoop CDH4
> installed on RHEL. I am planning to load quite a few petabytes of data onto
> HDFS.
>
> Which will be the fastest method to use and are there any projects around
> Hadoop which can be used as well?
>
>
> I cannot install Hadoop-Client on the remote machine.
>
> Have a great Day Ahead!
> Sumit.
>
>
> ---------------
> Here I am attaching my previous discussion on CDH-user to avoid
> duplication.
> ---------------
> On Wed, Oct 24, 2012 at 9:29 PM, Alejandro Abdelnur <[EMAIL PROTECTED]>
> wrote:
> In addition to Jarcec's suggestions, you could use HttpFS. Then you'd only
> need to open a single host:port in your firewall, as all the traffic goes
> through it.
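> A rough, untested sketch of what that push could look like in Python from
> the remote box, using the WebHDFS REST API that HttpFS speaks (the host,
> port, user name and paths below are placeholders, and it uses the
> third-party requests library):
>
> import requests
>
> HTTPFS = "http://httpfs-host.example.com:14000"  # placeholder; 14000 is the HttpFS default port
> USER = "sumit"                                   # placeholder HDFS user
> LOCAL_FILE = "/data/part-00000"                  # placeholder local path
> HDFS_PATH = "/user/sumit/part-00000"             # placeholder HDFS path
>
> # Step 1: CREATE call. HttpFS answers with a redirect Location that points
> # back at the same host:port, so only that one port needs to be reachable.
> r = requests.put(
>     HTTPFS + "/webhdfs/v1" + HDFS_PATH,
>     params={"op": "CREATE", "user.name": USER, "overwrite": "true"},
>     allow_redirects=False,
> )
> location = r.headers["Location"]
>
> # Step 2: stream the file body to the returned location.
> with open(LOCAL_FILE, "rb") as f:
>     resp = requests.put(location, data=f,
>                         headers={"Content-Type": "application/octet-stream"})
> resp.raise_for_status()  # expect 201 Created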
> thx
> Alejandro
>
> On Oct 24, 2012, at 8:28 AM, Jarek Jarcec Cecho <[EMAIL PROTECTED]>
> wrote:
> > Hi Sumit,
> > there are plenty of ways to achieve that. Please find my feedback below:
> >
> >> Does Sqoop support loading flat files to HDFS?
> >
> > No, Sqoop only supports moving data from external database and warehouse
> > systems. Copying files is not supported at the moment.
> >
> >> Can I use distcp?
> >
> > No. Distcp can be used only to copy data between HDFS filesystems.
> >
> >> How do we use the core-site.xml file on the remote machine to use
> >> copyFromLocal?
> >
> > Yes, you can install the Hadoop binaries on your machine (with no Hadoop
> > services running) and use the hadoop binary to upload data. The
> > installation procedure is described in the CDH4 installation guide [1]
> > (follow the "client" installation).
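> >
> > If you go that route and have many files, a small wrapper around the
> > client works, e.g. (untested sketch; the directories are placeholders and
> > it assumes /etc/hadoop/conf on that box points at the cluster):
> >
> > import subprocess
> > from pathlib import Path
> >
> > SRC_DIR = Path("/data/export")       # placeholder: local directory to ship
> > DEST_DIR = "/user/sumit/incoming"    # placeholder: target HDFS directory
> >
> > for f in sorted(SRC_DIR.iterdir()):
> >     # "hadoop fs -put" streams each local file to HDFS using the client config.
> >     subprocess.run(["hadoop", "fs", "-put", str(f), DEST_DIR + "/" + f.name],
> >                    check=True)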
> >
> > Another way that I can think of is leveraging WebHDFS [2] or maybe
> > hdfs-fuse [3]?
> >
> > Jarcec
> >
> > Links:
> > 1: https://ccp.cloudera.com/display/CDH4DOC/CDH4+Installation
> > 2: https://ccp.cloudera.com/display/CDH4DOC/Deploying+HDFS+on+a+Cluster#DeployingHDFSonaCluster-EnablingWebHDFS
> > 3: https://ccp.cloudera.com/display/CDH4DOC/Mountable+HDFS
> >
> > On Wed, Oct 24, 2012 at 01:33:29AM -0700, Sumit Ghosh wrote:
> >>
> >>
> >> Hi,
> >>
> >> I have data on a remote machine accessible over SSH. What is the fastest
> >> way to load the data onto HDFS?
> >>
> >> Does Sqoop support loading flat files to HDFS?
> >> Can I use distcp?
> >> How do we use the core-site.xml file on the remote machine to use
> >> copyFromLocal?
> >>
> >> Which will be the best to use, and are there any other open source
> >> projects around Hadoop which can be used as well?
> >> Have a great Day Ahead!
> >> Sumit
--
Bertrand Dechoux