

Re: Automatically upload files into HDFS
If you just need to copy the files without any processing or changes, you can
use something like this:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyData {

    public static void main(String[] args) throws IOException {

        // Load the cluster settings from the Hadoop configuration files.
        Configuration configuration = new Configuration();
        configuration.addResource(new Path("/home/mohammad/hadoop-0.20.205/conf/core-site.xml"));
        configuration.addResource(new Path("/home/mohammad/hadoop-0.20.205/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(configuration);

        // Copy a single local file into HDFS.
        Path inputFile = new Path("/home/mohammad/pc/work/FFT.java");
        Path outputFile = new Path("/mapout/FFT.java");
        fs.copyFromLocalFile(inputFile, outputFile);
        fs.close();
    }
}

Obviously you'll have to adapt it to your requirements, for example by
continuously polling the target directory for new files, as sketched below.
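
A minimal sketch of that polling idea, assuming new files simply appear in a
local directory (the class name, the watched directory, the HDFS target path,
and the 5-second interval are illustrative assumptions, not part of the code
above):

import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirectoryPoller {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/mohammad/hadoop-0.20.205/conf/core-site.xml"));
        conf.addResource(new Path("/home/mohammad/hadoop-0.20.205/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);

        File watchDir = new File("/home/mohammad/incoming"); // hypothetical local directory
        Path hdfsDir = new Path("/mapout/");                  // hypothetical HDFS target

        while (true) {
            File[] files = watchDir.listFiles();
            if (files != null) {
                for (File f : files) {
                    if (f.isFile()) {
                        // delSrc=true deletes the local copy after a successful upload,
                        // so the same file is not picked up again on the next pass.
                        fs.copyFromLocalFile(true, new Path(f.getAbsolutePath()),
                                new Path(hdfsDir, f.getName()));
                    }
                }
            }
            Thread.sleep(5000); // poll every 5 seconds
        }
    }
}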

Regards,
    Mohammad Tariq

On Mon, Nov 19, 2012 at 6:23 PM, kashif khan <[EMAIL PROTECTED]> wrote:

> Thanks M  Tariq
>
> As I am new to Java and Hadoop and don't have much experience, I am trying
> to first write a simple program to upload data into HDFS and gradually move
> forward. I have written the following simple program to upload a file
> into HDFS, but I don't know why it isn't working. Could you please check it
> if you have time?
>
> import java.io.BufferedInputStream;
> import java.io.BufferedOutputStream;
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.FileOutputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.OutputStream;
> import java.nio.*;
> //import java.nio.file.Path;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class hdfsdata {
>
>     public static void main(String[] args) throws IOException {
>         try {
>             Configuration conf = new Configuration();
>             conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
>             conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
>             FileSystem fileSystem = FileSystem.get(conf);
>
>             String source = "/usr/Eclipse/Output.csv";
>             String dest = "/user/hduser/input/";
>
>             //String fileName = source.substring(source.lastIndexOf('/') + source.length());
>             String fileName = "Output1.csv";
>
>             if (dest.charAt(dest.length() - 1) != '/') {
>                 dest = dest + "/" + fileName;
>             } else {
>                 dest = dest + fileName;
>             }
>             Path path = new Path(dest);
>
>             if (fileSystem.exists(path)) {
>                 System.out.println("File" + dest + " already exists");
>             }
>
>             FSDataOutputStream out = fileSystem.create(path);
>             InputStream in = new BufferedInputStream(new FileInputStream(new File(source)));
>             File myfile = new File(source);
>             byte[] b = new byte[(int) myfile.length()];
>             int numbytes = 0;
>             while ((numbytes = in.read(b)) >= 0) {
>                 out.write(b, 0, numbytes);
>             }
>             in.close();
>             out.close();
>             //bos.close();
>             fileSystem.close();
>         } catch (Exception e) {
>             System.out.println(e.toString());
>         }
>     }
> }
>
>
> Thanks again,
>
> Best regards,
>
> KK
>
>
>
> On Mon, Nov 19, 2012 at 12:41 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>
>> You can set up a cron job to execute the program every 5 seconds or so (see the sketch below).
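>>
>> A minimal sketch of that idea. Standard cron only fires once per minute, so
>> a common workaround is a per-minute entry whose command loops with a short
>> sleep; the jar path and use of CopyData as the main class below are
>> illustrative assumptions, not from this thread:
>>
>> # crontab entry: every minute, run the uploader 12 times, roughly 5 seconds apart
>> * * * * * for i in $(seq 1 12); do hadoop jar /path/to/uploader.jar CopyData; sleep 5; done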
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>>
>> On Mon, Nov 19, 2012 at 6:05 PM, kashif khan <[EMAIL PROTECTED]> wrote:
>>
>>> Well, I want to upload the files automatically, as the files are
>>> generated about every 3-5 seconds and each file is about 3 MB.
>>>
>>> Is it possible to automate this using the put or cp command?
>>>
>>> I have read about Flume and WebHDFS, but I am not sure whether they will work.
>>>
>>> Many thanks
>>>
>>> Best regards
>>>
>>>
>>>
>>>
>>> On Mon, Nov 19, 2012 at 12:26 PM, Alexander Alten-Lorenz <[EMAIL PROTECTED]> wrote: