NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Hadoop >> mail # user >> Importing Data to HDFS


Re: Importing Data to HDFS
Hey Urckle,

I'm biased, but I'd recommend checking out Sqoop
(http://github.com/cloudera/sqoop) for moving data from RDBMS systems into
HDFS/Hive/HBase, and Flume (http://github.com/cloudera/flume) for moving log
files into HDFS/Hive/HBase.
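
If you end up scripting Sqoop imports for several tables, one option is to generate the command lines from Java. This is a minimal sketch, assuming the Sqoop 1 `import` tool with its `--connect`, `--username` and `--table` options; the class `SqoopImportPlan`, the JDBC URL, the user and the table names are all hypothetical placeholders. It only builds argument lists, so it can be checked without a database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds (but does not run) one `sqoop import`
// argument list per table. Only --connect, --username and --table are
// used; verify these flags against the Sqoop version you install.
class SqoopImportPlan {

    // Builds a single import command; user may be null to omit --username.
    static List<String> importCommand(String jdbcUrl, String user, String table) {
        List<String> cmd = new ArrayList<>();
        cmd.add("sqoop");
        cmd.add("import");
        cmd.add("--connect");
        cmd.add(jdbcUrl);
        if (user != null) {
            cmd.add("--username");
            cmd.add(user);
        }
        cmd.add("--table");
        cmd.add(table);
        return cmd;
    }

    // One import command per table, in the order given.
    static List<List<String>> importAll(String jdbcUrl, String user, List<String> tables) {
        List<List<String>> plan = new ArrayList<>();
        for (String table : tables) {
            plan.add(importCommand(jdbcUrl, user, table));
        }
        return plan;
    }
}
```

Each list can be handed to `new ProcessBuilder(cmd).inheritIO().start()` to run it; keeping command construction separate from execution makes the plan easy to log or test before anything touches the database.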

For moving large sets of files into HDFS, I think distcp
(http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320/distcp.html) is your
best bet.
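
distcp takes one or more source URIs followed by a destination and performs the copy as a MapReduce job. The hypothetical `DistcpPlan` helper below is a sketch that only assembles that command line; the `-update` flag is documented in the distcp guide linked above, and the namenode URIs you would pass in are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: builds a `hadoop distcp` invocation that copies several source
// URIs into one destination. It does not execute anything.
class DistcpPlan {

    static List<String> distcpCommand(List<String> sources, String destination, boolean update) {
        if (sources.isEmpty()) {
            throw new IllegalArgumentException("distcp needs at least one source");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("hadoop");
        cmd.add("distcp");
        if (update) {
            // -update: copy only files missing at the destination or differing in size
            cmd.add("-update");
        }
        cmd.addAll(sources);   // every source URI, in order
        cmd.add(destination);  // the destination always comes last
        return cmd;
    }
}
```

Because map tasks do the copying, every source path must be readable from every node: another HDFS cluster works, while a `file://` path only works if it is mounted at the same location on all nodes. The distcp guide also explains how `-update` changes where source directory contents land at the destination, so check it before combining multiple sources with `-update`.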

Thanks,
Jeff

On Fri, Jul 16, 2010 at 4:51 AM, Urckle <[EMAIL PROTECTED]> wrote:

> Scenario:
> Hadoop version: 0.20.2
> MR coding will be done in java.
>
>
> Just starting out with my first Hadoop setup. I would like to know whether
> there are any best-practice ways to load data into the DFS. I have (obviously)
> manually put data files into HDFS using the shell commands while playing
> with it at setup, but going forward I will be retrieving large
> numbers of data feeds from remote, third-party locations and throwing them
> into Hadoop for analysis later. What is the best way to automate this? Is it
> to gather the retrieved files into known locations to be mounted and then
> automate via a script etc. to put the files into HDFS? Or is there some other
> practice? I've not been able to find a specific use case yet; all the docs
> cover the basic fs commands without giving much detail about more advanced
> setups.
>
> thanks for any info
>
> regards
>
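
The gather-then-script approach described in the question can be sketched in Java. This assumes a hypothetical layout: fetch jobs write into a local staging directory using a `.tmp` suffix and rename each file once it is complete, uploaded files are moved to an existing local `done` directory on the same filesystem, and the HDFS target directory is a placeholder. Only the standard `hadoop fs -put` command is used; `StagingUploader` and its methods are illustrative names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of "collect feeds in a known local directory, then push to HDFS".
class StagingUploader {

    // Regular files that are finished. Assumption: in-progress downloads
    // end in ".tmp" and are renamed by the fetch job when complete.
    static List<File> readyFiles(File stagingDir) {
        List<File> ready = new ArrayList<>();
        File[] entries = stagingDir.listFiles();
        if (entries == null) {
            return ready; // directory missing or unreadable
        }
        Arrays.sort(entries); // deterministic upload order by path name
        for (File f : entries) {
            if (f.isFile() && !f.getName().endsWith(".tmp")) {
                ready.add(f);
            }
        }
        return ready;
    }

    // Builds `hadoop fs -put <local> <hdfsDir>/<name>` for one file.
    static List<String> putCommand(File local, String hdfsDir) {
        return Arrays.asList("hadoop", "fs", "-put",
                local.getAbsolutePath(), hdfsDir + "/" + local.getName());
    }

    // Uploads each ready file; a file is moved into doneDir only when the
    // put exits with status 0, so failed files stay staged for the next run.
    static void uploadAll(File stagingDir, File doneDir, String hdfsDir) throws Exception {
        for (File f : readyFiles(stagingDir)) {
            Process p = new ProcessBuilder(putCommand(f, hdfsDir)).inheritIO().start();
            if (p.waitFor() == 0 && !f.renameTo(new File(doneDir, f.getName()))) {
                throw new IllegalStateException("uploaded but could not move " + f);
            }
        }
    }
}
```

`hadoop fs -put` refuses to overwrite an existing HDFS file, so re-running after a partial failure will not silently replace data that already made it in. Launching one `hadoop` JVM per file is slow when there are many small feeds; `-put` also accepts several local sources followed by a destination directory, which reduces that overhead.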