Hadoop, mail # user - How to mapreduce in the scenario


RE: How to mapreduce in the scenario
Devaraj k 2012-05-29, 10:40
Hi Gump,

MapReduce is well suited to this type of problem (a join).

I hope the following steps help you solve the problem you described.

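1. Map output key and value classes: Use Text.class as the map output key class (holding the id) and CombinedValue.class as the map output value class. The value class must be able to hold the values from either file (a.txt or b.txt), as shown below. A value class only needs to implement Writable; WritableComparable is required only for keys, and WritableComparator is a class, not an interface.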

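import java.io.*;
import org.apache.hadoop.io.Writable;
class CombinedValue implements Writable { // map output value holding fields from either file
   String name = "";    // from a.txt
   int age;             // from a.txt
   String address = ""; // from b.txt
   boolean isLeft;      // flag to identify from which file (true = a.txt)
   public void write(DataOutput out) throws IOException {
      out.writeBoolean(isLeft); out.writeUTF(name); out.writeInt(age); out.writeUTF(address);
   }
   public void readFields(DataInput in) throws IOException { // must read in the same order as write()
      isLeft = in.readBoolean(); name = in.readUTF(); age = in.readInt(); address = in.readUTF();
   }
}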

2. Mapper: Write a map() function that can parse lines from both files (a.txt and b.txt), emits the id as the key, and produces the common output key and value classes, setting isLeft according to the file the line came from.

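3. Partitioner: Write the partitioner so that all (key, value) pairs with the same key are sent to the same reducer. Hadoop's default HashPartitioner already does this, so a custom partitioner is only needed if you want a different distribution.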

4. Reducer: In the reduce() function, you will receive the records from both files for a given id and can combine them into one output line.
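To make the four steps concrete, here is a minimal, Hadoop-free Java sketch that simulates them in memory on the sample data from the question. It is not a Hadoop job: the names ReduceSideJoin, Tagged, map, partition, reduce and join are hypothetical, a Java record stands in for CombinedValue, and the caller passes a boolean telling map() which file a line came from. In a real job the mapper could derive that flag, for example, from the input split's file path, or you could register one mapper per file with MultipleInputs.

```java
import java.util.*;

// Hypothetical in-memory simulation of a reduce-side join (not the Hadoop API).
public class ReduceSideJoin {

    // Step 1: a value that can hold a record from either file, tagged by source.
    record Tagged(boolean isLeft, String fields) {}

    // Step 2: map one line to (id, tagged value); fromA = true for a.txt lines.
    static Map.Entry<String, Tagged> map(String line, boolean fromA) {
        int comma = line.indexOf(',');
        String id = line.substring(0, comma);
        return Map.entry(id, new Tagged(fromA, line.substring(comma + 1)));
    }

    // Step 3: mirrors the formula of Hadoop's default HashPartitioner,
    // so equal keys always go to the same reducer.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Step 4: combine the a.txt and b.txt values for one id (inner join).
    static List<String> reduce(String id, List<Tagged> values) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (Tagged v : values) {
            if (v.isLeft()) left.add(v.fields()); else right.add(v.fields());
        }
        List<String> out = new ArrayList<>();
        for (String l : left)
            for (String r : right) out.add(id + "," + l + "," + r);
        return out;
    }

    // Drives the steps: map both inputs, shuffle by partition and key, then reduce.
    static List<String> join(List<String> a, List<String> b, int numReducers) {
        List<TreeMap<String, List<Tagged>>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) reducers.add(new TreeMap<>());
        List<Map.Entry<String, Tagged>> mapped = new ArrayList<>();
        for (String line : a) mapped.add(map(line, true));
        for (String line : b) mapped.add(map(line, false));
        for (Map.Entry<String, Tagged> e : mapped) {
            reducers.get(partition(e.getKey(), numReducers))
                    .computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                    .add(e.getValue());
        }
        List<String> result = new ArrayList<>();
        for (TreeMap<String, List<Tagged>> groups : reducers)
            groups.forEach((id, vals) -> result.addAll(reduce(id, vals)));
        return result;
    }

    public static void main(String[] args) {
        List<String> a = List.of("id1,name1,age1", "id2,name2,age2", "id3,name3,age3", "id4,name4,age4");
        List<String> b = List.of("id1,address1", "id2,address2", "id3,address3");
        join(a, b, 2).forEach(System.out::println);
    }
}
```

Running main prints one joined line each for id1, id2 and id3 (the order depends on partitioning); id4 is dropped because it has no address in b.txt, i.e. this behaves as an inner join. Buffering both sides of a key in reduce() is the simplest approach, but it holds all values for one key in memory; for very large groups, a secondary sort that delivers the a.txt record first avoids that.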
Thanks
Devaraj
________________________________________
From: liuzhg [[EMAIL PROTECTED]]
Sent: Tuesday, May 29, 2012 3:45 PM
To: [EMAIL PROTECTED]
Subject: How to mapreduce in the scenario

Hi,

I wonder whether Hadoop can solve the following problem effectively:

=========================================
input files: a.txt, b.txt
result: c.txt

a.txt:
id1,name1,age1,...
id2,name2,age2,...
id3,name3,age3,...
id4,name4,age4,...

b.txt:
id1,address1,...
id2,address2,...
id3,address3,...

c.txt:
id1,name1,age1,address1,...
id2,name2,age2,address2,...
=======================================
I know this can be done well with a database,
but I would like to handle it with Hadoop if possible.
Can Hadoop meet this requirement?

Any suggestions would help. Thank you very much!

Best Regards,

Gump