Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hadoop >> mail # user >> How to mapreduce in the scenario


Copy link to this message
-
Re: How to mapreduce in the scenario
Yes you can do it.  In pig you would write something like

A = load ‘a.txt’ as (id, name, age, ...)
B = load ‘b.txt’ as (id, address, ...)
C = JOIN A BY id, B BY id;
STORE C into ‘c.txt’

Hive can do it similarly too.  Or you could write your own directly in map/redcue or using the data_join jar.

--Bobby Evans

On 5/29/12 4:08 AM, "lzg" <[EMAIL PROTECTED]> wrote:

Hi,

I wonder that if Hadoop can solve effectively the question as following:

=========================================input file: a.txt, b.txt
result: c.txt

a.txt:
id1,name1,age1,...
id2,name2,age2,...
id3,name3,age3,...
id4,name4,age4,...

b.txt:
id1,address1,...
id2,address2,...
id3,address3,...

c.txt
id1,name1,age1,address1,...
id2,name2,age2,address2,...
=======================================
I know that it can be done well by database.
But I want to handle it with hadoop if possible.
Can hadoop meet the requirement?

Any suggestion can help me. Thank you very much!

Best Regards,

Gump
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB