FileSystem Error

I am running a small Java program that writes a small input dataset to
the Hadoop FileSystem, runs Mahout Canopy and KMeans clustering, and
then prints the contents of the resulting data.

In my hadoop.properties I have included the definition from core-site.xml
(fs.default.name) so that the Java program connects to my single-node
setup and uses Hadoop rather than the Java project's file system
(basically, all reads and writes should happen on Hadoop and not in the
project's class path).

When I run the program, as soon as the Canopy step starts (the same
happens with KMeans), the configuration tries to look up the file in the
class path instead of the Hadoop FileSystem path where the files are
actually located.
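
One thing worth ruling out with a setup like this is that the property never reaches the program with the expected value. The following is a minimal sketch, using only the JDK, that loads a properties source and fails fast if fs.default.name is missing or does not use the hdfs scheme. The class name DefaultFsPropertySketch, the method readDefaultFs, and the address hdfs://localhost:9000 are assumptions made for illustration; they do not come from the original program.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.util.Properties;

// Hypothetical helper, not part of the original program.
final class DefaultFsPropertySketch {

    /** Reads fs.default.name and fails fast if it is missing or not an hdfs:// URI. */
    static URI readDefaultFs(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        String value = props.getProperty("fs.default.name");
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("fs.default.name is not set");
        }
        // Properties.load keeps trailing whitespace in values, so trim it.
        URI uri = URI.create(value.trim());
        if (!"hdfs".equals(uri.getScheme())) {
            throw new IllegalStateException("expected an hdfs:// URI but got " + uri);
        }
        return uri;
    }

    public static void main(String[] args) throws IOException {
        // hdfs://localhost:9000 is an assumed single-node address.
        Reader example = new StringReader("fs.default.name=hdfs://localhost:9000\n");
        System.out.println(readDefaultFs(example));
    }
}
```

If the check passes, the returned value is what would be handed to conf.set("fs.default.name", ...) in the program below.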

Is there a problem with the way I have my conf defined?



public class DataFileWriter {

    private static Properties props = new Properties();
    private static Configuration conf = new Configuration();

    /**
     * @param args
     * @throws ClassNotFoundException
     * @throws InterruptedException
     * @throws IOException
     */
    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {

        props.load(new FileReader(new File(

        // TODO Auto-generated method stub
        FileSystem fs = null;
        SequenceFile.Writer writer;
        SequenceFile.Reader reader;

        conf.set("fs.default.name", props.getProperty("fs.default.name"));

        List<NamedVector> vectors = new LinkedList<NamedVector>();
        NamedVector v1 = new NamedVector(new DenseVector(new double[] { 0.1,
                0.2, 0.5 }), "Hello");
        v1 = new NamedVector(new DenseVector(new double[] { 0.5, 0.1, 0.2
        v1 = new NamedVector(new DenseVector(new double[] { 0.2, 0.5, 0.1
        // Write the data to SequenceFile
        try {
            fs = FileSystem.get(conf);

            Path path = new Path("testdata_seq/data");
            writer = new SequenceFile.Writer(fs, conf, path, Text.class,

            VectorWritable vec = new VectorWritable();
            for (NamedVector vector : vectors) {
                writer.append(new Text(vector.getName()), vec);
            }
        } catch (Exception e) {
            System.out.println("ERROR: " + e);
        }

        Path input = new Path("testdata_seq/data");
        boolean runSequential = false;
        Path clustersOut = new Path("testdata_seq/clusters");
        Path clustersIn = new
        double convergenceDelta = 0;
        double clusterClassificationThreshold = 0;
        boolean runClustering = true;
        Path output = new Path("testdata_seq/output");
        int maxIterations = 12;
        CanopyDriver.run(conf, input, clustersOut,
                new EuclideanDistanceMeasure(), 1, 1, 1, 1, 0, runClustering,
                clusterClassificationThreshold, runSequential);
        KMeansDriver.run(conf, input, clustersIn, output,
                new EuclideanDistanceMeasure(), convergenceDelta,
                maxIterations, runClustering,
                clusterClassificationThreshold, runSequential);

        reader = new SequenceFile.Reader(fs,
                new Path("testdata_seq/clusteredPoints/part-m-00000"),

        IntWritable key = new IntWritable();
        WeightedVectorWritable value = new WeightedVectorWritable();
        while (reader.next(key, value)) {
            System.out.println(value.toString() + " belongs to cluster "
                    + key.toString());
        }
        reader.close();
    }
}


Error Output:

13/03/29 11:47:15 ERROR security.UserGroupInformation: PriviledgedActionException as:cyril cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/cyril/workspace/Newer/testdata_seq/data
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/cyril/workspace/Newer/testdata_seq/data
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at DataFileWriter.main(DataFileWriter.java:85)
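
The file: prefix in the error shows that the relative path testdata_seq/data was resolved against the local filesystem and the project's working directory, which is what happens when the configuration a job sees still has the default file:/// filesystem. The sketch below is a simplified model of that resolution written with java.net.URI only; it is not Hadoop's own code. The class PathQualifierSketch, the method qualify, and the HDFS address and home directory in the second call are assumptions for illustration.

```java
import java.net.URI;

// Simplified model of path qualification; not Hadoop's actual implementation.
final class PathQualifierSketch {

    /**
     * An absolute path keeps its own location; a relative path is appended
     * to the working directory of the default filesystem.
     */
    static String qualify(String defaultFs, String workingDir, String path) {
        URI fs = URI.create(defaultFs);
        String absolute = path.startsWith("/") ? path : workingDir + "/" + path;
        // file:/// has no authority, hdfs://host:port does.
        String authority = fs.getAuthority() == null ? "" : "//" + fs.getAuthority();
        return fs.getScheme() + ":" + authority + absolute;
    }

    public static void main(String[] args) {
        // Local default: reproduces the path reported in the error output.
        System.out.println(qualify("file:///", "/home/cyril/workspace/Newer",
                "testdata_seq/data"));
        // HDFS default; host, port, and home directory are assumed values.
        System.out.println(qualify("hdfs://localhost:9000", "/user/cyril",
                "testdata_seq/data"));
    }
}
```

The first call yields file:/home/cyril/workspace/Newer/testdata_seq/data, the same path as in the exception. Passing fully qualified paths (scheme and authority included) to the drivers is one way to make the result independent of whichever default filesystem the job configuration ends up with.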

On another note, is there a command that would allow the program to
overwrite existing files in the filesystem? (I get errors if I don't
delete the files before running the program again.)
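
A common pattern for this is to delete the output directory recursively before each run. The sketch below shows the pattern with java.nio.file on the local filesystem as a stand-in, so that it runs without Hadoop on the class path; on HDFS the corresponding call would be FileSystem.delete(path, true). The class OutputCleanerSketch and the method deleteRecursively are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-filesystem stand-in for "clear the output before re-running".
final class OutputCleanerSketch {

    /** Deletes dir and everything under it; returns false if it did not exist. */
    static boolean deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("testdata_seq");
        Files.createFile(out.resolve("part-m-00000"));
        System.out.println(deleteRecursively(out));
    }
}
```

Calling such a helper on the clusters and output directories before CanopyDriver.run and KMeansDriver.run would let the program be re-run without manual cleanup.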

Azuryy Yu 2013-03-29, 23:56