Urckle 2010-07-16, 11:51
Re: Importing Data to HDFS
Jeff Hammerbacher 2010-07-20, 07:27
Hey Urckle,
I'm biased, but I'd recommend checking out Sqoop (http://github.com/cloudera/sqoop) for moving data from RDBMS systems into HDFS/Hive/HBase and Flume (http://github.com/cloudera/flume) for moving log files into HDFS/Hive/HBase. For moving large sets of files into HDFS, I think distcp (http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320/distcp.html) is your best bet.

Thanks,
Jeff

On Fri, Jul 16, 2010 at 4:51 AM, Urckle <[EMAIL PROTECTED]> wrote:
> Scenario:
> Hadoop version: 0.20.2
> MR coding will be done in java.
>
> Just starting out with my first Hadoop setup. I would like to know are
> there any best practice ways to load data into the dfs? I have (obviously)
> manually put data files into hdfs using the shell commands while playing
> with it at setup but going forward I will want to be retrieving large
> numbers of data feeds from remote, 3rd party locations and throwing them
> into hadoop for analysis later. What is the best way to automate this? Is it
> to gather the retrieved files into known locations to be mounted and then
> automate via script etc. to put the files into hdfs? Or is there some other
> practice? I've not been able to find specific use case yet... all docs cover
> the basic fs command without giving much details about more advanced setups.
>
> thanks for any info
>
> regards
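
For the "gather files into a known location, then script the put" approach raised in the question, a minimal sketch might look like the following. It is only an illustration, not a recommended production setup: the function names and the staging, done, and HDFS directories are hypothetical, and it assumes the `hadoop` command is on the PATH of the machine running the script. It wraps the same `hadoop fs -put` shell command mentioned in the question, and defaults to a dry run that only reports what it would execute.

```python
import os
import shutil
import subprocess


def build_put_command(local_path, hdfs_dir):
    """Return the argv list that copies one local file into an HDFS directory."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def ingest(staging_dir, done_dir, hdfs_dir, dry_run=True):
    """Upload each regular file in staging_dir (hypothetical layout).

    With dry_run=True nothing is executed; the commands are only returned.
    With dry_run=False each file is uploaded and then moved to done_dir,
    so a later run does not upload it again.
    """
    commands = []
    for name in sorted(os.listdir(staging_dir)):
        local_path = os.path.join(staging_dir, name)
        if not os.path.isfile(local_path):
            continue  # skip subdirectories and other non-regular entries
        cmd = build_put_command(local_path, hdfs_dir)
        commands.append(cmd)
        if not dry_run:
            subprocess.check_call(cmd)  # raises CalledProcessError on failure
            shutil.move(local_path, os.path.join(done_dir, name))
    return commands
```

Calling `ingest("/data/incoming", "/data/done", "/feeds/raw")` (all three paths are placeholders) returns the list of commands without running them; passing `dry_run=False` performs the uploads. A script like this could be run from cron once the remote feeds have been fetched into the staging directory. For relational sources, log streams, or very large file sets, the tools Jeff names above are likely a better fit than a hand-rolled loop.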