NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
MapReduce >> mail # user >> MultipleInputs.addInputPath


jamal sasha 2013-11-21, 18:10
Re: MultipleInputs.addInputPath
Can't you pass the specific file you want to process as the Path in
MultipleInputs.addInputPath?

1) MultipleInputs.addInputPath(job, new Path(args[0] + "/part-00000"),
       TextInputFormat.class, Data1.class);
or
2) MultipleInputs.addInputPath(job, new Path(args[0] + "/part-0000[1-2589]"),
       TextInputFormat.class, Data1.class);
   // I have not tested that, but I guess that it should work. Hadoop globs
   // express ranges with [...]; {a,b} only lists alternatives, and a comma
   // inside the path may clash with the comma-separated list that
   // MultipleInputs keeps in the job configuration.
or
3) MultipleInputs.addInputPath(job, new Path(args[0] + "/part-0000*"),
       TextInputFormat.class, Data1.class);
   // I have not tested that, but I guess that it should work.
or
4)
    String[] paths = {"path1", "pathA", "path-to-process"};
    for (String path : paths) {
        MultipleInputs.addInputPath(job, new Path(path),
                TextInputFormat.class, Data1.class);
    }
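
Option 4 needs the list of file paths from somewhere. Below is a minimal sketch of one way to build it in plain Java, assuming the inputs follow the part-NNNNN naming used above. The class PartFiles and the method partFiles are hypothetical names for illustration only, not part of Hadoop, and the MultipleInputs call appears only as a comment so the sketch runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PartFiles {
    // Hypothetical helper: builds "dir/part-NNNNN" paths for the chosen part numbers.
    static List<String> partFiles(String dir, int... indices) {
        List<String> paths = new ArrayList<>();
        for (int i : indices) {
            // Zero-pad to five digits, matching names such as part-00000.
            paths.add(String.format(Locale.ROOT, "%s/part-%05d", dir, i));
        }
        return paths;
    }

    public static void main(String[] args) {
        for (String p : partFiles("Data1", 0, 1)) {
            // In the driver, each path would be registered with its mapper:
            // MultipleInputs.addInputPath(job, new Path(p),
            //         TextInputFormat.class, Data1.class);
            System.out.println(p);
        }
    }
}
```

Registering each file with its own addInputPath call avoids glob syntax entirely, at the cost of one call per file.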

2013/11/21 jamal sasha <[EMAIL PROTECTED]>

> Hi,
>
>   So, I have two different directories which I want to process
> differently, and I have two mappers in the job for them:
>
> Data1
> Data2
>
> In my driver I add the following:
> MultipleInputs.addInputPath(job, new Path( args[0]),
>      TextInputFormat.class,
>      Data1.class);
>
>
>     MultipleInputs.addInputPath(job, new Path(args[1]),
>      TextInputFormat.class,
>      Data2.class);
>
>
> But what I now want is to select just two files from each directory.
>
> Usually this is how we would do it:
> FileInputFormat.addInputPaths(job,"Data1/part-00000,Data1/part-00000");
>
> But how do I specify individual files with MultipleInputs?
>
> Basically, I have two mappers processing two different inputs, but I want to
> specify which files in those two directories the mappers read.
> How do I do this in Hadoop?
>