Hadoop user mailing list
What is the output format of org.apache.hadoop.examples.Join?
I am reading the following mail:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg04066.html

I ran the following command (I am using Hadoop 1.0.4):

bin/hadoop jar hadoop-examples-1.0.4.jar join \
   -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
   -outKey org.apache.hadoop.io.Text \
   -joinOp outer \
   join/a.txt join/b.txt join/c.txt joinout
Then I ran "bin/hadoop fs -text joinout/part-00000" and got the following
output:

AAAAAAAA        a0      [,,]
AAAAAAAA        b0      [,,]
AAAAAAAA        c0      [,,]
BBBBBBBB        a1      [,,]
BBBBBBBB        b1      [,,]
BBBBBBBB        b2      [,,]
BBBBBBBB        b3      [,,]
BBBBBBBB        c1      [,,]
CCCCCCCC        a2      [,,]
CCCCCCCC        a3      [,,]
DDDDDDDD        c2      [,,]
DDDDDDDD        c3      [,,]
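
One possible reading of this output, offered as an unverified assumption: KeyValueTextInputFormat splits each line at the first tab (the default separator in Hadoop 1.x, which as far as I know is configurable through key.value.separator.in.input.line). If a.txt, b.txt and c.txt separate the key from the value with spaces instead of a tab, the whole line becomes the key and the value is empty. Every such key would then occur in only one input, which would give one output row per input line with a tuple of empty slots, "[,,]". The Python sketch below only mimics that splitting rule; it is not Hadoop code, and split_key_value is a hypothetical name.

```python
def split_key_value(line, sep="\t"):
    """Split one input line the way KeyValueTextInputFormat is assumed to.

    The line is cut at the first occurrence of sep: text before it is the
    key, text after it is the value. If sep does not occur, the whole line
    becomes the key and the value is the empty string.
    """
    pos = line.find(sep)
    if pos == -1:
        return line, ""
    return line[:pos], line[pos + len(sep):]


# Hypothetical input lines: the same record separated by a tab and by spaces.
tab_line = "AAAAAAAA\ta0"
space_line = "AAAAAAAA        a0"

print(split_key_value(tab_line))    # ('AAAAAAAA', 'a0')
print(split_key_value(space_line))  # ('AAAAAAAA        a0', '')
```

If this is the cause, rewriting the input files with a single tab between key and value should make the keys AAAAAAAA, BBBBBBBB, and so on line up across the three inputs.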

But Chris said that the result should be:

AAAAAAAA        [a0,b0,c0]
BBBBBBBB        [a1,b1,c1]
BBBBBBBB        [a1,b2,c1]
BBBBBBBB        [a1,b3,c1]
CCCCCCCC        [a2,,]
CCCCCCCC        [a3,,]
DDDDDDDD        [,,c2]
DDDDDDDD        [,,c3]
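
Chris's expected result matches full outer join semantics. For each key, the output contains the cross product of that key's values across the three inputs, with an empty slot for any input that lacks the key. BBBBBBBB therefore yields three rows, because b.txt has three values for it. The Python sketch below reproduces that behaviour in memory. It is an illustration of the semantics only, not Hadoop's CompositeInputFormat implementation. The input lists are an assumption, reconstructed from the key/value pairs in the first output above.

```python
from collections import defaultdict
from itertools import product


def outer_join(*sources):
    """Full outer join of several lists of (key, value) records.

    For each key found in any source, emit one row per combination of that
    key's values across the sources (their cross product). A source that
    lacks the key contributes a single empty slot.
    """
    groups = []
    all_keys = set()
    for records in sources:
        by_key = defaultdict(list)
        for key, value in records:
            by_key[key].append(value)
        groups.append(by_key)
        all_keys.update(by_key)

    rows = []
    for key in sorted(all_keys):
        # dict.get does not invoke the defaultdict factory, so a missing
        # key falls back to [""] here without being inserted.
        columns = [g.get(key, [""]) for g in groups]
        for combo in product(*columns):
            rows.append((key, "[" + ",".join(combo) + "]"))
    return rows


# Assumed contents of join/a.txt, join/b.txt and join/c.txt, reconstructed
# from the key/value pairs in the observed output.
a = [("AAAAAAAA", "a0"), ("BBBBBBBB", "a1"), ("CCCCCCCC", "a2"), ("CCCCCCCC", "a3")]
b = [("AAAAAAAA", "b0"), ("BBBBBBBB", "b1"), ("BBBBBBBB", "b2"), ("BBBBBBBB", "b3")]
c = [("AAAAAAAA", "c0"), ("BBBBBBBB", "c1"), ("DDDDDDDD", "c2"), ("DDDDDDDD", "c3")]

for key, value in outer_join(a, b, c):
    print(f"{key}\t{value}")
```

Run on these assumed inputs, the sketch prints the same eight rows as Chris's expected result.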

Has Join's output format changed in Hadoop 1.0.4?

--
Jingguo