Re: Copy files from remote folder to HDFS
Hi Panshul,

             I am also working on a similar requirement. One approach is to
mount your remote folder on your Hadoop master node and then write a
simple shell script, scheduled with crontab, that copies the files to
HDFS, as in the sketch below.
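
             A minimal sketch of what I mean (all names and paths below
are just placeholders, adjust them to your environment):

    #!/bin/bash
    # copy_to_hdfs.sh - copy the day's files from the mounted remote
    # folder into a dated HDFS directory.
    SRC=/mnt/remote_json                     # mount point of the remote folder
    DEST=/user/panshul/incoming/$(date +%F)  # one HDFS directory per day

    # -put copies the whole local directory tree into HDFS
    hadoop fs -put "$SRC" "$DEST"

Then schedule it with crontab -e, for example to run every day at 02:00:

    0 2 * * * /home/hadoop/copy_to_hdfs.sh >> /tmp/copy_to_hdfs.log 2>&1

Writing into a dated directory keeps each day's batch separate and avoids
-put failing because yesterday's files already exist at the destination.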

             I believe Flume is the wrong choice here: Flume is a data
collection and aggregation framework, NOT a file transfer tool, so it
may NOT be a good fit when you actually want to copy the files as-is
onto your cluster (NOT 100% sure, as I am still working on that myself).

Thanks,
Mahesh Balija,
CalsoftLabs.

On Fri, Jan 25, 2013 at 6:39 AM, Panshul Whisper <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I am trying to copy JSON files from a remote folder (a folder on my
> local system, a Cloud Files folder, or a folder on an S3 server) to the
> HDFS of a cluster running at a remote location.
> The job-submitting application is based on Spring Hadoop.
>
> Can someone please suggest the best option, or point me in the right
> direction, for achieving the above task:
> 1. Use Spring Integration data pipelines to poll the folders for files
> and copy them to HDFS as they arrive in the source folder. I have tried
> to implement the solution from the Spring Data book, but it does not
> run, and I have no idea what is wrong as it produces no logs.
>
> 2. Use some other scripted method to transfer the files.
>
> The main requirement: I need to transfer files from a remote folder to
> HDFS every day at a fixed time, for processing in the Hadoop cluster.
> These files are collected from various sources into the remote folders.
>
> Please suggest an efficient approach. I have been searching and have
> found a lot of approaches, but I am unable to decide which will work
> best, as this transfer needs to be as fast as possible.
> The files to be transferred total almost 10 GB of JSON files, with no
> single file larger than 6 KB.
>
> Thank you,
>
>
> --
> Regards,
> Ouch Whisper
> 010101010101
>
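
For the S3 case you mention, one more option (distcp is not something I
have tried for your exact setup, so treat this as a sketch) is to pull
the files from the cluster side instead of pushing them. Assuming the
s3n filesystem is configured with your AWS credentials, and with a
placeholder bucket and paths:

    hadoop distcp s3n://your-bucket/json-input hdfs://namenode:8020/user/panshul/incoming

distcp runs the copy as a MapReduce job, so it is parallelized across
the cluster rather than funneled through a single machine.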