Hadoop >> mail # user >> How to mapreduce in the scenario


Re: How to mapreduce in the scenario
Yes, it is possible by using MultipleInputs to send each input file to its
own mapper (basically two different mappers).

Step 1:

MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class, Mapper1.class);
MultipleInputs.addInputPath(conf, new Path(args[1]), TextInputFormat.class, Mapper2.class);

When defining the two mappers, prefix each output value with an identifier
(output.collect(new Text(key), new Text(identifier + "~" + value));)
for a.txt or b.txt, so that the reducer can easily tell which file each
mapper output came from.
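
The tagging described above can be sketched in plain Java, without the Hadoop classes, to show what each mapper would emit. The class name TagSketch, the helper tag, and the choice of the first comma-separated field as the key are assumptions made for illustration, not part of the original message; in a real job the resulting pair would be passed to output.collect.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class TagSketch {
    // Hypothetical helper: splits "id,rest..." into key = id and
    // value = identifier + "~" + rest, mirroring what each mapper would emit.
    static Map.Entry<String, String> tag(String identifier, String line) {
        int comma = line.indexOf(',');
        // Assumption: the join key is everything before the first comma.
        String key = comma < 0 ? line : line.substring(0, comma);
        String rest = comma < 0 ? "" : line.substring(comma + 1);
        return new SimpleEntry<>(key, identifier + "~" + rest);
    }

    public static void main(String[] args) {
        // Mapper1 would tag records from a.txt with "A",
        // Mapper2 would tag records from b.txt with "B".
        System.out.println(tag("A", "id1,name1,age1"));
        System.out.println(tag("B", "id1,address1"));
    }
}
```

Because both mappers emit the same key for the same id, the records from a.txt and b.txt meet in the same reduce call.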
Step 2:
  Put b.txt in the DistributedCache and compare the reducer values against
the b.txt list. In the reducer, split each value on "~" and branch on the
identifier:
            while (values.hasNext()) {
                String currValue = values.next().toString();
                // limit 2 keeps any "~" inside the record itself intact
                String[] valueSplitted = currValue.split("~", 2);
                if (valueSplitted[0].equals("A")) { // "A": identifier from the A mapper
                    // process the record from file A here
                } else if (valueSplitted[0].equals("B")) { // "B": identifier from the B mapper
                    // process the record from file B here
                }
            }

            output.collect(new Text(key), new Text("value formatted the way you want to display it"));
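
A rough, Hadoop-free sketch of the reducer logic above, written as an ordinary method so it can be run on its own. It assumes at most one A record and one B record per key, and it assumes an inner join (ids missing from either file are dropped), which the original message does not spell out; JoinSketch and joinValues are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class JoinSketch {
    // Hypothetical stand-in for the reduce() body: receives all tagged values
    // for one key and returns the joined line, or null if either side is missing.
    static String joinValues(String key, List<String> values) {
        String fromA = null;
        String fromB = null;
        for (String currValue : values) {
            // Limit 2 keeps any "~" inside the payload intact.
            String[] valueSplitted = currValue.split("~", 2);
            if (valueSplitted.length < 2) {
                continue; // malformed value without an identifier
            }
            if (valueSplitted[0].equals("A")) {
                fromA = valueSplitted[1];
            } else if (valueSplitted[0].equals("B")) {
                fromB = valueSplitted[1];
            }
        }
        if (fromA == null || fromB == null) {
            return null; // assumption: inner join, so unmatched ids are skipped
        }
        return key + "," + fromA + "," + fromB;
    }

    public static void main(String[] args) {
        // Values arrive in no guaranteed order, so B may come before A.
        System.out.println(joinValues("id1", Arrays.asList("B~address1", "A~name1,age1")));
        System.out.println(joinValues("id4", Arrays.asList("A~name4,age4")));
    }
}
```

With the sample data from the question, id1 yields id1,name1,age1,address1, while id4 (present only in a.txt) yields null and would simply not be collected.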

Choose the key according to the result you want to produce.

After that, you have to use a single reducer to produce the output.

thanks
samir

On Tue, May 29, 2012 at 3:45 PM, liuzhg <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I wonder whether Hadoop can effectively solve the following problem:
>
> =========================================
> input file: a.txt, b.txt
> result: c.txt
>
> a.txt:
> id1,name1,age1,...
> id2,name2,age2,...
> id3,name3,age3,...
> id4,name4,age4,...
>
> b.txt:
> id1,address1,...
> id2,address2,...
> id3,address3,...
>
> c.txt
> id1,name1,age1,address1,...
> id2,name2,age2,address2,...
> =========================================
>
> I know that this can be done easily with a database.
> But I want to handle it with Hadoop if possible.
> Can Hadoop meet the requirement?
>
> Any suggestions would help me. Thank you very much!
>
> Best Regards,
>
> Gump
>
>
>