Hadoop, mail # user - Loading Data to HDFS


sumit ghosh 2012-10-30, 10:07
M. C. Srivas 2012-10-30, 14:24
Ranjith 2012-10-31, 04:42
Bertrand Dechoux 2012-10-30, 10:16
sumit ghosh 2012-10-30, 10:39
Re: Loading Data to HDFS
Bertrand Dechoux 2012-10-30, 11:10
I don't know what you mean by gateway, but to get a rough idea of the
time needed you need three values:
* the amount of data you want to put on Hadoop
* Hadoop's bandwidth with regard to local storage (read/write)
* the bandwidth between where your data is stored and where the Hadoop
cluster is

For the latter, with big volumes, physically moving the volumes is a viable
solution.
It will depend on your constraints, of course: budget, speed...
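
As a rough illustration of the estimate described above, here is a minimal
Python sketch. It assumes the transfer is pipelined, so the slowest of the
two bandwidths sets the pace, and it ignores protocol and replication
overhead. The function name and all the numbers (1 PB, a 10 Gbit/s link,
2 GB/s of cluster write bandwidth) are hypothetical, not figures from this
thread.

```python
def transfer_time_seconds(data_bytes, link_bytes_per_s, hdfs_write_bytes_per_s):
    """Rough lower bound on load time: the slowest stage limits throughput.

    Assumes reading, network transfer and HDFS writes overlap (pipelined),
    and ignores protocol and replication overhead.
    """
    if min(data_bytes, link_bytes_per_s, hdfs_write_bytes_per_s) <= 0:
        raise ValueError("all three values must be positive")
    bottleneck = min(link_bytes_per_s, hdfs_write_bytes_per_s)
    return data_bytes / bottleneck


PB = 10**15                   # bytes in one petabyte (decimal)
GBIT_PER_S = 10**9 // 8       # bytes per second carried by a 1 Gbit/s link

# Hypothetical inputs: 1 PB of data, a 10 Gbit/s link to the cluster,
# and 2 GB/s of sustained HDFS write bandwidth.
seconds = transfer_time_seconds(1 * PB, 10 * GBIT_PER_S, 2 * 10**9)
days = seconds / 86400

print(f"about {days:.1f} days")
```

With these made-up inputs the network link is the bottleneck, and each
additional petabyte adds a bit more than nine days. That kind of figure is
why shipping the disks can be competitive.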

Bertrand

On Tue, Oct 30, 2012 at 11:39 AM, sumit ghosh <[EMAIL PROTECTED]> wrote:

> Hi Bertrand,
>
> By physically moving the data, do you mean that the data volume is
> connected to the gateway machine and the data is loaded from the local copy
> using copyFromLocal?
>
> Thanks,
> Sumit
>
>
> ________________________________
> From: Bertrand Dechoux <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; sumit ghosh <[EMAIL PROTECTED]>
> Sent: Tuesday, 30 October 2012 3:46 PM
> Subject: Re: Loading Data to HDFS
>
> It might sound like a deprecated way, but can't you move the data
> physically?
> From what I understand, it is one-shot and not "streaming", so it could be a
> good method if you have the access, of course.
>
> Regards
>
> Bertrand
>
> On Tue, Oct 30, 2012 at 11:07 AM, sumit ghosh <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I have data on a remote machine accessible over ssh. I have Hadoop CDH4
> > installed on RHEL. I am planning to load quite a few petabytes of data onto
> > HDFS.
> >
> > Which will be the fastest method to use and are there any projects around
> > Hadoop which can be used as well?
> >
> >
> > I cannot install Hadoop-Client on the remote machine.
> >
> > Have a great Day Ahead!
> > Sumit.
> >
> >
> > ---------------
> > Here I am attaching my previous discussion on CDH-user to avoid
> > duplication.
> > ---------------
> > On Wed, Oct 24, 2012 at 9:29 PM, Alejandro Abdelnur <[EMAIL PROTECTED]>
> > wrote:
> > In addition to jarcec's suggestions, you could use httpfs. Then you'd only
> > need to poke a single host:port in your firewall, as all the traffic goes
> > through it.
> > thx
> > Alejandro
> >
> > On Oct 24, 2012, at 8:28 AM, Jarek Jarcec Cecho <[EMAIL PROTECTED]>
> > wrote:
> > > Hi Sumit,
> > > there are plenty of ways to achieve that. Please find my feedback below:
> > >
> > >> Does Sqoop support loading flat files to HDFS?
> > >
> > > No, Sqoop only supports moving data from external databases and
> > > warehouse systems. Copying files is not supported at the moment.
> > >
> > >> Can use distcp?
> > >
> > > No. Distcp can be used only to copy data between HDFS filesystems.
> > >
> > >> How do we use the core-site.xml file on the remote machine to use
> > >> copyFromLocal?
> > >
> > > Yes, you can install the Hadoop binaries on your machine (with no Hadoop
> > > services running) and use the hadoop binary to upload data. The installation
> > > procedure is described in the CDH4 installation guide [1] (follow the "client"
> > > installation).
> > >
> > > Another way that I can think of is leveraging WebHDFS [2] or maybe
> > > hdfs-fuse [3]?
> > >
> > > Jarcec
> > >
> > > Links:
> > > 1: https://ccp.cloudera.com/display/CDH4DOC/CDH4+Installation
> > > 2: https://ccp.cloudera.com/display/CDH4DOC/Deploying+HDFS+on+a+Cluster#DeployingHDFSonaCluster-EnablingWebHDFS
> > > 3: https://ccp.cloudera.com/display/CDH4DOC/Mountable+HDFS
> > >
> > > On Wed, Oct 24, 2012 at 01:33:29AM -0700, Sumit Ghosh wrote:
> > >>
> > >>
> > >> Hi,
> > >>
> > >> I have data on a remote machine accessible over ssh. What is the fastest
> > >> way to load data onto HDFS?
> > >>
> > >> Does Sqoop support loading flat files to HDFS?
> > >> Can use distcp?
> > >> How do we use the core-site.xml file on the remote machine to use
> > >> copyFromLocal?
> > >>
> > >> Which will be the best to use and are there any other open source projects
> > >> around Hadoop which can be used as well?
> > >> Have a great Day Ahead!
> > >> Sumit
>
>
>
>
> --
> Bertrand Dechoux
>

--
Bertrand Dechoux
sumit ghosh 2012-10-30, 13:25
Alejandro Abdelnur 2012-10-30, 13:12