Hadoop user mailing list: How to mapreduce in the scenario


lzg 2012-05-29, 09:08
Re: How to mapreduce in the scenario
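Yes, you can do it. In Pig you would write something like:

-- the sample data is comma-separated, so override PigStorage's default tab delimiter
A = LOAD 'a.txt' USING PigStorage(',') AS (id, name, age, ...);
B = LOAD 'b.txt' USING PigStorage(',') AS (id, address, ...);
C = JOIN A BY id, B BY id;
STORE C INTO 'c.txt' USING PigStorage(',');

Hive can do it similarly. Or you could write the join yourself directly in map/reduce, or use the data_join jar.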
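
To make the hand-written map/reduce option a bit more concrete, here is a minimal sketch of a reduce-side join. It is plain Python that imitates the map, shuffle, and reduce phases in memory. It is not Hadoop API code, and the function names (map_record, reduce_join, join_files) are made up for illustration. It assumes the first comma-separated field is the join key and that an inner join is wanted.

```python
from collections import defaultdict


def map_record(source, line):
    """Map phase: emit (join key, (source tag, remaining fields))."""
    key, *rest = line.strip().split(",")
    return key, (source, rest)


def reduce_join(key, tagged_values):
    """Reduce phase: pair every a.txt record with every b.txt record for one key."""
    left = [fields for tag, fields in tagged_values if tag == "a"]
    right = [fields for tag, fields in tagged_values if tag == "b"]
    # Inner join: a key missing from either side produces no output.
    return [",".join([key] + l + r) for l in left for r in right]


def join_files(a_lines, b_lines):
    """Stand-in for the whole job: map, group by key (shuffle), then reduce."""
    groups = defaultdict(list)
    for source, lines in (("a", a_lines), ("b", b_lines)):
        for line in lines:
            if line.strip():
                key, value = map_record(source, line)
                groups[key].append(value)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(key, groups[key]))
    return joined


if __name__ == "__main__":
    a_lines = ["id1,name1,age1", "id2,name2,age2", "id3,name3,age3", "id4,name4,age4"]
    b_lines = ["id1,address1", "id2,address2", "id3,address3"]
    for row in join_files(a_lines, b_lines):
        print(row)
```

The key idea is the source tag: the mapper marks each record with the file it came from, so the reducer, which sees all records sharing one id, can tell the two sides apart. With the sample data, id4 has no match in b.txt and is dropped.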

--Bobby Evans

On 5/29/12 4:08 AM, "lzg" <[EMAIL PROTECTED]> wrote:

Hi,

I wonder whether Hadoop can effectively solve the following problem:

=======================================
input files: a.txt, b.txt
result: c.txt

a.txt:
id1,name1,age1,...
id2,name2,age2,...
id3,name3,age3,...
id4,name4,age4,...

b.txt:
id1,address1,...
id2,address2,...
id3,address3,...

c.txt:
id1,name1,age1,address1,...
id2,name2,age2,address2,...
=======================================
I know this can be done easily with a database,
but I would like to handle it with Hadoop if possible.
Can Hadoop meet this requirement?

Any suggestions would help. Thank you very much!

Best Regards,

Gump
Other replies in this thread:
Michel Segel 2012-05-29, 10:32
Nitin Pawar 2012-05-29, 10:36
liuzhg 2012-05-29, 10:15
samir das mohapatra 2012-05-29, 11:33
Devaraj k 2012-05-29, 10:40
Soumya Banerjee 2012-05-29, 10:53
liuzhg 2012-05-30, 01:23
Nitin Pawar 2012-05-30, 03:49
samir das mohapatra 2012-05-30, 13:32
Wilson Wayne - wwilso 2012-05-30, 13:56