Hadoop >> mail # user >> What is the output format of org.apache.hadoop.examples.Join?


What is the output format of org.apache.hadoop.examples.Join?
I am reading the following mail:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg04066.html

I ran the following command (I am using Hadoop 1.0.4):

bin/hadoop jar hadoop-examples-1.0.4.jar join \
   -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
   -outKey org.apache.hadoop.io.Text \
   -joinOp outer \
   join/a.txt join/b.txt join/c.txt joinout

Then I ran "bin/hadoop fs -text joinout/part-00000" and got the following
result:

AAAAAAAA        a0      [,,]
AAAAAAAA        b0      [,,]
AAAAAAAA        c0      [,,]
BBBBBBBB        a1      [,,]
BBBBBBBB        b1      [,,]
BBBBBBBB        b2      [,,]
BBBBBBBB        b3      [,,]
BBBBBBBB        c1      [,,]
CCCCCCCC        a2      [,,]
CCCCCCCC        a3      [,,]
DDDDDDDD        c2      [,,]
DDDDDDDD        c3      [,,]

But Chris said that the result should be:

AAAAAAAA        [a0,b0,c0]
BBBBBBBB        [a1,b1,c1]
BBBBBBBB        [a1,b2,c1]
BBBBBBBB        [a1,b3,c1]
CCCCCCCC        [a2,,]
CCCCCCCC        [a3,,]
DDDDDDDD        [,,c2]
DDDDDDDD        [,,c3]
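For comparison, here is a minimal Python sketch (not Hadoop code) of the full outer join semantics that Chris's expected result implies. For each key, it takes the cross product of that key's values across the three sources, and any source with no value for the key contributes an empty slot. The input contents are hypothetical, reconstructed from the expected output above rather than taken from the actual join/a.txt, join/b.txt and join/c.txt. The "[x,y,z]" rendering only mimics the text shown above.

```python
from itertools import product

# HYPOTHETICAL contents of join/a.txt, join/b.txt and join/c.txt, inferred
# from the expected output above: (key, value) pairs, sorted by key.
SOURCES = [
    [("AAAAAAAA", "a0"), ("BBBBBBBB", "a1"), ("CCCCCCCC", "a2"), ("CCCCCCCC", "a3")],
    [("AAAAAAAA", "b0"), ("BBBBBBBB", "b1"), ("BBBBBBBB", "b2"), ("BBBBBBBB", "b3")],
    [("AAAAAAAA", "c0"), ("BBBBBBBB", "c1"), ("DDDDDDDD", "c2"), ("DDDDDDDD", "c3")],
]


def outer_join(sources):
    """Full outer join of several keyed sources (illustrative only).

    For every key present in any source, emit the cross product of that
    key's values across all sources; a source with no value for the key
    contributes a single empty slot.
    """
    # Group each source's values by key, preserving input order.
    grouped = []
    for source in sources:
        by_key = {}
        for key, value in source:
            by_key.setdefault(key, []).append(value)
        grouped.append(by_key)

    rows = []
    # Union of all keys seen in any source, in sorted order.
    for key in sorted(set().union(*grouped)):
        # A missing source yields [""] so the cross product is never empty.
        per_source = [g.get(key, [""]) for g in grouped]
        for combo in product(*per_source):
            rows.append(f"{key}\t[{','.join(combo)}]")
    return rows


if __name__ == "__main__":
    for row in outer_join(SOURCES):
        print(row)
```

Running the sketch prints eight rows in the same shape as Chris's expected result. BBBBBBBB, for example, yields three tuples because b.txt holds three values for that key.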

Has Join's output format changed in Hadoop 1.0.4?

--
Jingguo