RE: How to mapreduce in the scenario
Hi Gump,

   MapReduce is a good fit for this type of problem (joins).

I hope the following helps you solve the problem you described.

1. Map output key and value classes: Use Text.class as the map output key class and write a value class (CombinedValue.class) that can hold the values from both files (a.txt and b.txt), as shown below.

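import java.io.*;
import org.apache.hadoop.io.Writable;

class CombinedValue implements Writable // a value class implements Writable; WritableComparator is a comparator class, not an interface
{
   String name = "";
   int age;
   String address = "";
   boolean isLeft; // flag to identify from which file

   public void write(DataOutput out) throws IOException { out.writeUTF(name); out.writeInt(age); out.writeUTF(address); out.writeBoolean(isLeft); }
   public void readFields(DataInput in) throws IOException { name = in.readUTF(); age = in.readInt(); address = in.readUTF(); isLeft = in.readBoolean(); }
}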

2. Mapper: Write a map() function that parses records from both files (a.txt and b.txt) and emits the common output key and value classes, with the id as the key.

3. Partitioner: Write the partitioner so that all (key, value) pairs with the same key are sent to the same reducer.

4. Reducer: In the reduce() function, you will receive the records from both files for each key, and you can combine them easily.
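
The four steps above can be tried out without a cluster. What follows is only a rough sketch in plain Java, not Hadoop API code: it simulates the map, shuffle and reduce phases in memory to show the join logic. The names ReduceSideJoinSketch, Tagged, map, reduce, partition and join are made up for illustration, and Tagged is a simplified stand-in for CombinedValue. It assumes comma-separated lines with the id in the first column and produces an inner join, so ids present in only one file are dropped. The partition() method mirrors the hash-modulo scheme that Hadoop's default HashPartitioner uses, which already sends equal keys to the same reducer, so a custom partitioner may not be needed when the key is a plain Text id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a reduce-side join; no Hadoop classes are used.
class ReduceSideJoinSketch {

    // Hypothetical, simplified stand-in for CombinedValue: one tagged record.
    static final class Tagged {
        final boolean isLeft; // true: line came from a.txt, false: from b.txt
        final String rest;    // everything after the id column
        Tagged(boolean isLeft, String rest) { this.isLeft = isLeft; this.rest = rest; }
    }

    // "Map" step: use the id as the key, tag the remaining columns with their
    // source file, and add them to that key's group (this stands in for the shuffle).
    static void map(String line, boolean isLeft, Map<String, List<Tagged>> groups) {
        int comma = line.indexOf(',');
        if (comma < 0) return; // skip malformed lines
        String id = line.substring(0, comma);
        groups.computeIfAbsent(id, k -> new ArrayList<>())
              .add(new Tagged(isLeft, line.substring(comma + 1)));
    }

    // Hash-modulo partitioning: equal keys always map to the same reducer index.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // "Reduce" step: for one id, pair every a.txt record with every b.txt record.
    static void reduce(String id, List<Tagged> values, List<String> out) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (Tagged v : values) {
            (v.isLeft ? left : right).add(v.rest);
        }
        for (String l : left) {
            for (String r : right) {
                out.add(id + "," + l + "," + r);
            }
        }
    }

    // Runs the simulated job: map both inputs, then reduce each key group in key order.
    static List<String> join(List<String> a, List<String> b) {
        Map<String, List<Tagged>> groups = new TreeMap<>();
        for (String line : a) map(line, true, groups);
        for (String line : b) map(line, false, groups);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : groups.entrySet()) {
            reduce(e.getKey(), e.getValue(), out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("id1,name1,age1", "id2,name2,age2",
                                       "id3,name3,age3", "id4,name4,age4");
        List<String> b = Arrays.asList("id1,address1", "id2,address2", "id3,address3");
        // Prints one joined line per id found in both inputs; id4 has no address and is dropped.
        for (String line : join(a, b)) {
            System.out.println(line);
        }
    }
}
```

In a real job the grouping done here by the TreeMap is performed by the framework's shuffle, and the reducer only sees the values for one key at a time, which is why each value has to carry the isLeft flag.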
Thanks
Devaraj
________________________________________
From: liuzhg [[EMAIL PROTECTED]]
Sent: Tuesday, May 29, 2012 3:45 PM
To: [EMAIL PROTECTED]
Subject: How to mapreduce in the scenario

Hi,

I wonder whether Hadoop can effectively solve the following problem:

=========================================
input files: a.txt, b.txt
result: c.txt

a.txt:
id1,name1,age1,...
id2,name2,age2,...
id3,name3,age3,...
id4,name4,age4,...

b.txt:
id1,address1,...
id2,address2,...
id3,address3,...

c.txt:
id1,name1,age1,address1,...
id2,name2,age2,address2,...
=======================================
I know that this can be done easily with a database.
But I want to handle it with Hadoop if possible.
Can Hadoop meet this requirement?

Any suggestions would help. Thank you very much!

Best Regards,

Gump