MapReduce, mail # user - MultipleInputs.addInputPath


Re: MultipleInputs.addInputPath
Adam Kawa 2013-11-30, 23:22
Can't you pass the specific file you want to process as the Path in
MultipleInputs.addInputPath?

1) MultipleInputs.addInputPath(job, new Path(args[0] + "/part-00000"),
       TextInputFormat.class, Data1.class);
or
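2) MultipleInputs.addInputPath(job, new Path(args[0] + "/part-0000[1-258-9]"),
       TextInputFormat.class, Data1.class);
   // I have not tested this, but I guess it should work. A character class
   // is used here because, as far as I know, braces in Hadoop globs list
   // alternatives ({a,b}) and do not expand ranges such as 1-2.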
or
3) MultipleInputs.addInputPath(job, new Path(args[0] + "/part-0000*"),
       TextInputFormat.class, Data1.class);
   // I have not tested this, but I guess it should work.
or
4)
        String[] paths = {"path1", "pathA", "path-to-process"};
        for (String path : paths) {
            MultipleInputs.addInputPath(job, new Path(path),
                    TextInputFormat.class, Data1.class);
        }
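
If the files to process are known by part number, the list of paths for option 4 could be generated rather than typed out. The following is only a sketch using plain Java: the class name PartPaths and the helper partPaths are hypothetical, and it assumes the files follow the usual part-NNNNN naming with five digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PartPaths {
    // Hypothetical helper: builds "<dir>/part-NNNNN" strings for the given
    // part numbers, assuming five-digit, zero-padded part file names.
    static List<String> partPaths(String dir, int... parts) {
        List<String> paths = new ArrayList<>();
        for (int part : parts) {
            paths.add(String.format(Locale.ROOT, "%s/part-%05d", dir, part));
        }
        return paths;
    }

    public static void main(String[] args) {
        // Print the generated paths for parts 0 and 3 of directory "Data1".
        for (String p : partPaths("Data1", 0, 3)) {
            System.out.println(p);
        }
    }
}
```

In the driver, each generated string would then be wrapped in new Path(...) and passed to MultipleInputs.addInputPath exactly as in the loop of option 4.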

2013/11/21 jamal sasha <[EMAIL PROTECTED]>

> Hi,
>
>   So, I have two different directories, which I want to process
> differently.
> For that I have two mappers for the job:
>
> Data1
> Data2
>
> and in my driver I add the following:
> MultipleInputs.addInputPath(job, new Path( args[0]),
>      TextInputFormat.class,
>      Data1.class);
>
>
>     MultipleInputs.addInputPath(job, new Path(args[1]),
>      TextInputFormat.class,
>      Data2.class);
>
>
> But what I now want is to select just two files from these directories.
>
> Usually this is how we would do it:
> FileInputFormat.addInputPaths(job,"Data1/part-00000,Data1/part-00000");
>
> But how do I specify specific files with MultipleInputs?
>
> Basically: two mappers processing two different inputs, but I want to
> specify which files in those two directories are read for processing by
> the mappers.
> How do I do this in Hadoop?
>