NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
MapReduce >> mail # user >> RE: Fastest way to transfer files


RE: Fastest way to transfer files
Here’s an example of running distcp from Java (in this case it’s actually s3distcp, but running DistCp itself works in much the same way):

 

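// getConf() assumes this code sits in a class extending Configured (e.g. your own Tool).
// ToolRunner parses generic Hadoop options, then calls the tool's run(args) in-process
// and returns the tool's exit code.
ToolRunner.run(getConf(), new S3DistCp(), new String[] {
       "--src",             "/src/dir/",
       "--srcPattern",      ".*(itemtable)-r-[0-9]*.*",
       "--dest",            "s3://yourbucket/results/",
       "--s3Endpoint",      "s3.amazonaws.com" });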

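If you would rather not put the DistCp classes on your application's classpath, a different route is to launch the `hadoop distcp` command from Java as a child process. This is a swapped-in technique, not the in-process ToolRunner call shown in the example: it assumes the `hadoop` launcher script is on the PATH of the JVM's environment, and the names DistCpLauncher, buildCommand and run are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch (hypothetical helper): run the `hadoop distcp` CLI as a child process.
// Assumes the `hadoop` script is on the PATH of this JVM's environment.
final class DistCpLauncher {

    private DistCpLauncher() {}

    /** Builds the argument list for `hadoop distcp [-m maxMaps] src dest`. */
    static List<String> buildCommand(String src, String dest, int maxMaps) {
        List<String> cmd = new ArrayList<>();
        cmd.add("hadoop");
        cmd.add("distcp");
        if (maxMaps > 0) {
            // -m caps the number of simultaneous copy (map) tasks.
            cmd.add("-m");
            cmd.add(Integer.toString(maxMaps));
        }
        cmd.add(src);   // options must come before the source and destination paths
        cmd.add(dest);
        return cmd;
    }

    /** Starts the command, sends its output to this JVM's console, and returns its exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }
}
```

For example, DistCpLauncher.run(DistCpLauncher.buildCommand("hdfs://nn1:8020/src/dir", "hdfs://nn2:8020/dest/dir", 20)) would ask distcp to copy between two clusters with at most 20 simultaneous maps; the NameNode addresses are placeholders. Keeping buildCommand separate from run makes the argument list easy to check without a cluster.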
From: Joep Rottinghuis [mailto:[EMAIL PROTECTED]]
Sent: Saturday, December 29, 2012 2:51 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: Fastest way to transfer files

 

Not sure why you are implying a contradiction when you say: "... distcp is useful _but_ you want to do 'it' in java..."

First of all, distcp _is_ written in Java.

You can call distcp or any other MR job from Java just fine.

 

Cheers,

Joep

Sent from my iPhone
On Dec 28, 2012, at 12:01 PM, burakkk <[EMAIL PROTECTED]> wrote:

Hi,

I have two different HDFS clusters and I need to transfer files between them. What's the fastest way to transfer files in this situation?

I've researched it and found the distcp command. It's useful, but I want to do this in Java. Is there any way to do that?

 

Is there any way to transfer files chunk by chunk from one HDFS cluster to another, or to implement a process that works on chunks rather than on the whole file?
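On the chunk-by-chunk question: in Java, HDFS files are read and written as ordinary streams, so one way to move a file (or a single byte range of it) without holding the whole file in memory is a buffered, fixed-size chunk copy. The sketch uses only java.io so it stands on its own; ChunkedCopy, copy and skipFully are hypothetical names, not part of Hadoop.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch (hypothetical helper): copy data chunk by chunk through a fixed-size
// buffer, so a whole file is never held in memory at once.
final class ChunkedCopy {

    private ChunkedCopy() {}

    /**
     * Copies up to {@code length} bytes from {@code in} to {@code out}
     * ({@code length < 0} means "until end of stream"), reading at most
     * {@code chunkSize} bytes per call. Returns the number of bytes copied.
     */
    static long copy(InputStream in, OutputStream out, int chunkSize, long length)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buf = new byte[chunkSize];
        long copied = 0;
        while (length < 0 || copied < length) {
            // Never read past the end of the requested range.
            int want = (length < 0) ? chunkSize : (int) Math.min(chunkSize, length - copied);
            int n = in.read(buf, 0, want);
            if (n < 0) {
                break; // end of stream reached first
            }
            out.write(buf, 0, n);
            copied += n;
        }
        out.flush();
        return copied;
    }

    /** Skips exactly {@code n} bytes, e.g. to start copying at a chunk boundary. */
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                // skip() may return 0 without being at EOF; probe with a single read.
                if (in.read() < 0) {
                    throw new EOFException("stream ended before the requested offset");
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }
}
```

To use it between clusters (an assumption about your setup), you would get a FileSystem for each cluster with FileSystem.get(URI, Configuration), pass srcFs.open(path) as the input and dstFs.create(path) as the output, and close both streams afterwards; when starting mid-file, FSDataInputStream.seek(long) can take the place of skipFully. For the whole-stream case, Hadoop's org.apache.hadoop.io.IOUtils.copyBytes performs a similar buffered copy.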

 

Thanks

Best Regards...

--
BURAK ISIKLI | http://burakisikli.wordpress.com

 
