Re: Hadoop's datajoin
Please read the source code of DataJoinJob.java.
Then you will see that the last parameter should be the number of
reducers.
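
To make the failure in the quoted trace concrete: the log line "Using SequenceFileInputFormat: datajoinOut" and the parseInt failure on "org.apache.hadoop.io.Text" both suggest that every argument was read one position later than the order given in the readme. The sketch below is not the DataJoinJob source. The class DataJoinArgsSketch and its methods are hypothetical, and the positional order (input dir, output dir, input format, number of reducers) is assumed from the readme syntax quoted further down. Why the arguments shifted is also an assumption: one possibility, not verified for this jar, is that the jar's manifest already names a main class, so the class name on the command line is passed through as the first program argument.

```java
/**
 * Hypothetical sketch of positional argument handling. NOT the DataJoinJob source.
 * Assumed order (taken from the readme syntax):
 *   inputDir, outputDir, inputFormat, numReducers, ...
 */
class DataJoinArgsSketch {

    /** Parses the reducer count from the fourth positional argument. */
    static int parseNumReducers(String[] args) {
        // Integer.parseInt throws NumberFormatException on non-numeric input.
        return Integer.parseInt(args[3]);
    }

    /** Shows how the first four positions would be interpreted. */
    static String describe(String[] args) {
        return "inputDir=" + args[0] + ", outputDir=" + args[1]
                + ", inputFormat=" + args[2] + ", numReducers=" + args[3];
    }

    public static void main(String[] unused) {
        // Arguments in the order the readme describes.
        String[] readmeOrder = {"datajoinIn", "datajoinOut", "Text", "1"};

        // The same kind of arguments shifted right by one slot, as would happen
        // if the class name reached the program as its first argument (assumption).
        String[] shifted = {
            "org.apache.hadoop.contrib.utils.join.DataJoinJob",
            "datajoinIn", "datajoinOut", "org.apache.hadoop.io.Text", "1"
        };

        for (String[] args : new String[][] {readmeOrder, shifted}) {
            System.out.println(describe(args));
            try {
                System.out.println("reducers: " + parseNumReducers(args));
            } catch (NumberFormatException e) {
                System.out.println("NumberFormatException: " + e.getMessage());
            }
        }
    }
}
```

In the shifted case the slot read as the input format holds datajoinOut and the slot read as the reducer count holds org.apache.hadoop.io.Text, which matches both lines of the reported output. If that is what happened, running the command without the class name would be worth trying, though that is untested here.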

On Wed, Jul 14, 2010 at 2:33 AM, Denim Live <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Thanks. I have located the datajoin jar. Now I execute the program the same
> way as specified in the readme file of the datajoin. I have two text files A
> and B with the same content as mentioned in the
>
> $Hadoop_Home/src/contrib/data_join/src/examples/org/apache/hadoop/contrib/utils/join/readme.txt
> file. The command line I use is:
>
> bin/hadoop jar hadoop-0.19.2-datajoin.jar
> org.apache.hadoop.contrib.utils.join.DataJoinJob datajoinIn datajoinOut
> org.apache.hadoop.io.Text 1
> org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput
> org.apache.hadoop.io.Text
>
> But I get the following error:
>
> Using SequenceFileInputFormat: datajoinOut
> java.lang.NumberFormatException: For input string: "org.apache.hadoop.io.Text"
>        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>        at java.lang.Integer.parseInt(Integer.java:449)
>        at java.lang.Integer.parseInt(Integer.java:499)
>        at org.apache.hadoop.contrib.utils.join.DataJoinJob.createDataJoinJob(DataJoinJob.java:70)
>        at org.apache.hadoop.contrib.utils.join.DataJoinJob.main(DataJoinJob.java:165)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
>        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>
> I don't understand why it is giving the number format exception, and why
> it mentions datajoinOut. My input files contain the records with
> tab-separated fields as described in the readme file. Should I use sequence
> files for input? I have tried that as well but I get the same error.
>
> Any help in this regard is highly appreciated. I have tried this for so
> long in vain.
>
> Thanks in advance
>
>
>
> ________________________________
> From: Hemanth Yamijala <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Mon, July 12, 2010 9:21:31 AM
> Subject: Re: Hadoop's datajoin
>
> Hi,
>
> > I am trying to use Hadoop's datajoin for joining two relations. According to
> > the Readme file of datajoin, it gives the following syntax:
> >
> > $HADOOP_HOME/bin/hadoop jar hadoop-datajoin-examples.jar
> > org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input
> > datajoin/output
> > Text 1  org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> >
> >
> > But I do not find hadoop-datajoin-examples.jar anywhere in my Hadoop_home.
> > Can anyone tell me how to produce it or where to find it?
>
> Datajoin is a contrib module. So, you will typically find it under
> contrib/datajoin/. The name could be something slightly different - it
> could have a version number and other things.
>
> Thanks
> Hemanth
> >
> > Thanks in advance.