Re: can one map instance handle many data of input paths at the same time
Thanks everyone,

Let me describe my question in more detail. There are two input
directories, /user/test1/ and /user/test2/, and I want to join their
contents. To do the join, I need to identify which directory each record
comes from, so I use the code below in the mapper:

    private int tag = -1;

    @Override
    public void configure(JobConf conf) {
        try {
            this.conf = conf;
            // Example: conf.set("paths.to.alias", "0=/user/test1/,1=/user/test2/");
            String pathsToAliasStr = conf.get("paths.to.alias");
            String[] pathsToAlias = pathsToAliasStr.split(",");

            // map.input.file is the file behind the current split; keep only
            // the path component (drop scheme and authority).
            String path = new Path(conf.get("map.input.file")).toUri().getPath();

            for (int i = 0; i < pathsToAlias.length; i++) {
                String[] pathToAlias = pathsToAlias[i].split("=");
                if (path.startsWith(pathToAlias[1])) {
                    // Identify which directory the current map instance is handling.
                    tag = Integer.parseInt(pathToAlias[0].trim());
                }
            }
        } catch (Throwable e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        }
    }

So when the map method runs, every record handled by that mapper instance
is tagged as coming from the same directory.
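
The tag lookup above can also be tried outside Hadoop. The following is only a minimal sketch: the class name PathTagResolver, its methods, and the host name in the usage example are hypothetical and not part of any Hadoop API. It assumes the same "tag=prefix" format as paths.to.alias and, unlike the loop above, returns the first matching prefix rather than the last.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: resolves the directory tag for an input file. */
class PathTagResolver {

    /** Parses a spec such as "0=/user/test1/,1=/user/test2/" into prefix -> tag. */
    static Map<String, Integer> parseAliases(String spec) {
        Map<String, Integer> prefixToTag = new LinkedHashMap<>();
        for (String entry : spec.split(",")) {
            String[] parts = entry.split("=", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed alias entry: " + entry);
            }
            prefixToTag.put(parts[1].trim(), Integer.parseInt(parts[0].trim()));
        }
        return prefixToTag;
    }

    /**
     * Drops scheme and authority from the file URI, then returns the tag of
     * the first prefix the path starts with, or -1 if no prefix matches.
     */
    static int resolveTag(String spec, String inputFile) {
        String path = URI.create(inputFile).getPath();
        for (Map.Entry<String, Integer> e : parseAliases(spec).entrySet()) {
            if (path.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return -1;
    }
}
```

For example, PathTagResolver.resolveTag("0=/user/test1/,1=/user/test2/", "hdfs://namenode:8020/user/test2/part-00000") returns 1, and a file outside both directories yields -1.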

I want to know whether one mapper instance only handles the content of one
directory at a time.
Thanks

LiuLei
2011/1/21 Eric Sammer <[EMAIL PROTECTED]>

> LiuLei:
>
> Yes. What you're looking for is FileInputFormat.addInputPath() (which
> TextInputFormat inherits, assuming you're talking about text). You can
> call this multiple times to add multiple input paths if they are all of
> the same data format (i.e. text). If you have multiple paths that contain
> data in different formats, you'll need to use MultipleInputs. See the
> javadoc for details on usage.
>
> On Thu, Jan 20, 2011 at 1:52 AM, lei liu <[EMAIL PROTECTED]> wrote:
>
> > There are two input paths, for example /user/test1/ and /user/test2/.
> > Can one map instance handle data from several input paths at the same
> > time?
> >
> >
> > Thanks,
> >
> > LiuLei
> >
>
>
>
> --
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com
>