Hadoop, mail # user - Hadoop's datajoin


Re: Hadoop's datajoin
Ted Yu 2010-07-14, 16:37
Please read the source code of DataJoinJob.java.
It shows that the last parameter should be the number of reducers.
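
A note on the quoted output below: the job prints datajoinOut as the input
format and then fails to parse "org.apache.hadoop.io.Text" as a number. Both
facts fit the arguments being read one position later than the readme
expects. One possible cause, not confirmed anywhere in this thread, is that
the jar names its main class in its manifest, so the class name given on the
command line is handed to the job as its first argument. The sketch below
only illustrates that shift. The class DataJoinArgsSketch and its two helper
methods are hypothetical, and the positions (third argument is the input
format, fourth is the reducer count) are taken from the readme's usage line,
not from DataJoinJob.java.

```java
// Hypothetical sketch: NOT the real DataJoinJob code. It only mirrors the
// argument order shown in the readme usage line (an assumption).
class DataJoinArgsSketch {

    // Assumed position of the input format argument (third argument).
    static String inputFormatArg(String[] args) {
        return args[2];
    }

    // Assumed position of the reducer count (fourth argument).
    // Integer.parseInt throws NumberFormatException for non-numeric text.
    static int reducerCount(String[] args) {
        return Integer.parseInt(args[3]);
    }

    public static void main(String[] unused) {
        // Arguments as the readme lays them out.
        String[] expected = {
            "datajoinIn", "datajoinOut", "org.apache.hadoop.io.Text", "1",
            "org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper",
            "org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer",
            "org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput",
            "org.apache.hadoop.io.Text"
        };

        // Same arguments with the class name left in front of them.
        String[] shifted = new String[expected.length + 1];
        shifted[0] = "org.apache.hadoop.contrib.utils.join.DataJoinJob";
        System.arraycopy(expected, 0, shifted, 1, expected.length);

        System.out.println("expected layout, input format: " + inputFormatArg(expected));
        System.out.println("expected layout, reducers: " + reducerCount(expected));

        System.out.println("shifted layout, input format: " + inputFormatArg(shifted));
        try {
            reducerCount(shifted);
        } catch (NumberFormatException e) {
            // The fourth argument is now the input format class name.
            System.out.println("shifted layout, reducers: " + e.getMessage());
        }
    }
}
```

If that is what is happening, running the same command without the
DataJoinJob class name after the jar would be a cheap thing to try.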

On Wed, Jul 14, 2010 at 2:33 AM, Denim Live <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Thanks. I have located the datajoin jar. Now I execute the program the same
> way as specified in the readme file of the datajoin. I have two text files,
> A and B, with the same content as mentioned in the
>
> $Hadoop_Home/src/contrib/data_join/src/examples/org/apache/hadoop/contrib/utils/join/readme.txt
> file. The command line I use is:
>
> bin/hadoop jar hadoop-0.19.2-datajoin.jar
> org.apache.hadoop.contrib.utils.join.DataJoinJob datajoinIn datajoinOut
> org.apache.hadoop.io.Text 1
> org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput
> org.apache.hadoop.io.Text
>
> But I get the following error:
>
> Using SequenceFileInputFormat: datajoinOut
> java.lang.NumberFormatException: For input string: "org.apache.hadoop.io.Text"
>        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>        at java.lang.Integer.parseInt(Integer.java:449)
>        at java.lang.Integer.parseInt(Integer.java:499)
>        at org.apache.hadoop.contrib.utils.join.DataJoinJob.createDataJoinJob(DataJoinJob.java:70)
>        at org.apache.hadoop.contrib.utils.join.DataJoinJob.main(DataJoinJob.java:165)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
>        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>
> I don't understand why it throws a NumberFormatException, or why the first
> line of the output mentions datajoinOut. My input files contain records
> with tab-separated fields, as described in the readme file. Should I use
> sequence files for input? I have tried that as well, but I get the same
> error.
>
> Any help in this regard is highly appreciated. I have been trying this for
> a long time in vain.
>
> Thanks in advance
>
>
>
> ________________________________
> From: Hemanth Yamijala <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Mon, July 12, 2010 9:21:31 AM
> Subject: Re: Hadoop's datajoin
>
> Hi,
>
> > I am trying to use Hadoop's datajoin to join two relations. According to
> > the readme file of datajoin, it gives the following syntax:
> >
> > $HADOOP_HOME/bin/hadoop jar hadoop-datajoin-examples.jar
> > org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input
> > datajoin/output
> > Text 1  org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> >
> >
> > But I do not find hadoop-datajoin-examples.jar anywhere in my
> > Hadoop_home. Can anyone tell me how to produce it or where to find it?
>
> Datajoin is a contrib module. So, you will typically find it under
> contrib/datajoin/. The name could be something slightly different - it
> could have a version number and other things.
>
> Thanks
> Hemanth
> >
> > Thanks in advance.