Hadoop, mail # general - can one map instance handle many data of input paths at the same time


Re: can one map instance handle many data of input paths at the same time
lei liu 2011-01-21, 05:57
Thanks everyone,

Let me describe my question in more detail. There are two input
directories, /user/test1/ and /user/test2/, and I want to join their
contents. To do the join, I need to identify which directory each record
comes from, so I use the code below in the mapper:

    private int tag = -1;

    @Override
    public void configure(JobConf conf) {
        try {
            this.conf = conf;
            // Example: conf.set("paths.to.alias", "0=/user/test1/,1=/user/test2/");
            String pathsToAliasStr = conf.get("paths.to.alias");
            String[] pathsToAlias = pathsToAliasStr.split(",");

            // Path of the file this map task reads, without scheme and authority.
            Path fpath = new Path((new Path(conf.get("map.input.file"))).toUri().getPath());
            String path = fpath.toUri().toString();

            for (int i = 0; i < pathsToAlias.length; i++) {
                String[] pathToAlias = pathsToAlias[i].split("=");
                if (path.startsWith(pathToAlias[1])) {
                    // Identify which directory the current map instance is handling.
                    tag = Integer.parseInt(pathToAlias[0].trim());
                }
            }
        } catch (Throwable e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        }
    }

So when the map method runs, every record handled by the mapper is
tagged as coming from the same directory.
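
The tag lookup can also be tried outside Hadoop. The following is only a sketch of the same prefix-matching idea, not the mapper itself: the class name PathTagResolver and the method resolveTag are hypothetical, and java.net.URI stands in for Hadoop's Path to strip the scheme and authority. It also forces a trailing slash on each directory, an addition not in the original code, so that /user/test1 cannot match /user/test10/.

```java
import java.net.URI;

public class PathTagResolver {

    /**
     * Returns the tag whose directory contains inputFile, or -1 if none matches.
     * pathsToAlias has the form "0=/user/test1/,1=/user/test2/".
     */
    public static int resolveTag(String pathsToAlias, String inputFile) {
        // Drop scheme and authority, e.g. "hdfs://nn:8020/user/test1/f" -> "/user/test1/f".
        String path = URI.create(inputFile).getPath();

        for (String entry : pathsToAlias.split(",")) {
            String[] tagAndDir = entry.split("=", 2);
            if (tagAndDir.length != 2) {
                continue; // skip malformed entries
            }
            String dir = tagAndDir[1].trim();
            // Force a trailing slash so "/user/test1" does not match "/user/test10/...".
            if (!dir.endsWith("/")) {
                dir = dir + "/";
            }
            if (path.startsWith(dir)) {
                return Integer.parseInt(tagAndDir[0].trim());
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String aliases = "0=/user/test1/,1=/user/test2/";
        System.out.println(resolveTag(aliases, "hdfs://namenode:8020/user/test1/part-00000")); // 0
        System.out.println(resolveTag(aliases, "/user/test2/part-00000"));                     // 1
        System.out.println(resolveTag(aliases, "/user/other/part-00000"));                     // -1
    }
}
```

The host name and file names in main are made up for illustration. URI.create rejects paths containing spaces or other illegal URI characters, so real input file names may need escaping first.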

I want to know whether one mapper instance only handles the content of one
directory at a time.
Thanks

LiuLei
2011/1/21 Eric Sammer <[EMAIL PROTECTED]>

> LiuLei:
>
> Yes. What you're looking for is FileInputFormat.addInputPath() (inherited by
> TextInputFormat, assuming you're talking about text). You can call this
> multiple times and add multiple input paths if they are all of the same data
> format (i.e. text). If you have multiple paths that contain different format
> data, you'll need to use MultipleInputs. See the javadoc for details on usage.
>
> On Thu, Jan 20, 2011 at 1:52 AM, lei liu <[EMAIL PROTECTED]> wrote:
>
> > There are two input paths, for example /user/test1/ and /user/test2/. Can
> > one map instance handle data from several input paths at the same time?
> >
> >
> > Thanks,
> >
> > LiuLei
> >
>
>
>
> --
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com
>