MapReduce >> mail # user >> Streaming question.


Re: Streaming question.
Dan,

It is a known bug (https://issues.apache.org/jira/browse/MAPREDUCE-1888),
identified in the 0.21.0 release. Which Hadoop release are you
using?

Thanks,
Praveen
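
For context on the stack trace itself: the type mismatch happens because the Java IdentityMapper re-emits the LongWritable byte-offset keys that TextInputFormat produces, while the map output collector is configured to expect Text. A commonly suggested workaround (a sketch only, untested here, reusing the paths and output directory from the command below and assuming the 0.20.2 streaming jar, which uses the 0.20-era property names rather than the mapreduce.* names) is to use cat as the mapper so the map output stays plain text:

```shell
# Sketch only: assumes the dataset and paths from the original command, and that
# the map output keys are "."-separated as in the Wiki example. On 0.20.x the
# -D properties use the old names (map.output.key.field.separator,
# num.key.fields.for.partition) and must precede the other streaming options.
$HADOOP_HOME/bin/hadoop jar \
  $HADOOP_HOME/contrib/streaming/hadoop-0.20.2-streaming.jar \
  -D stream.map.output.field.separator=. \
  -D stream.num.map.output.key.fields=2 \
  -D map.output.key.field.separator=. \
  -D num.key.fields.for.partition=2 \
  -D mapred.reduce.tasks=1 \
  -input $HOME/temp/foo \
  -output dank_phase0 \
  -mapper cat \
  -reducer cat \
  -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
```

Using cat sidesteps the Text/LongWritable clash because streaming wraps external commands in PipeMapper, which always emits Text keys; plugging in the Java IdentityMapper bypasses that wrapping.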

On Thu, Nov 3, 2011 at 10:22 AM, Dan Young <[EMAIL PROTECTED]> wrote:

> I'm a total newbie @ Hadoop and am trying to follow an example (A Useful
> Partitioner Class) on the Hadoop Streaming Wiki, but with my data. So I
> have data like this:
>
> 520460379 1 14067 759015 1142 3 1 8.8
> 520460380 1 120543 2759354 1142 0 0 0
> 520460381 3 120543 2759352 1142 0 0 0
> 520460382 3 12660 679569 1142 0 0 0
> 520460383 1 120543 2759355 1142 0 0 0
> 520460384 3 120543 2759353 1142 0 0 0
> 520460385 1 120575 2759568 1142 0 0 0
> 520460386 3 120575 2759570 1142 0 0 0
> 520460387 1 120575 2759569 1142 0 0 0
>
> and I'm trying to run a streaming job that partitions all the keys
> together based on field 2 and field 3. So, for example, 1 120543
> 2759354 and 1 120543 2759355 would go to the same partition, and the
> output key(s) would be something like 1.120543. I'm trying the following
> command but get an error:
>
> $HADOOP_HOME/bin/hadoop  jar
> $HADOOP_HOME/contrib/streaming/hadoop-0.20.2-streaming.jar \
> -D stream.map.output.field.separator=. \
> -D stream.num.map.output.key.fields=2 \
> -D mapreduce.map.output.key.field.separator=. \
> -D mapreduce.partition.keypartitioner.options=-k1,2 \
> -D mapreduce.job.reduces=1 \
> -input $HOME/temp/foo \
> -output dank_phase0 \
> -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
> -reducer org.apache.hadoop.mapred.lib.IdentityReducer \
> -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
>
>
> 11/11/02 22:45:05 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId=
> 11/11/02 22:45:05 WARN mapred.JobClient: No job jar file set.  User
> classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> 11/11/02 22:45:05 INFO mapred.FileInputFormat: Total input paths to
> process : 1
> 11/11/02 22:45:06 INFO streaming.StreamJob: getLocalDirs():
> [/tmp/hadoop-dyoung/mapred/local]
> 11/11/02 22:45:06 INFO streaming.StreamJob: Running job: job_local_0001
> 11/11/02 22:45:06 INFO streaming.StreamJob: Job running in-process (local
> Hadoop)
> 11/11/02 22:45:06 INFO mapred.FileInputFormat: Total input paths to
> process : 1
> 11/11/02 22:45:07 INFO mapred.MapTask: numReduceTasks: 1
> 11/11/02 22:45:07 INFO mapred.MapTask: io.sort.mb = 200
> 11/11/02 22:45:07 INFO mapred.MapTask: data buffer = 159383552/199229440
> 11/11/02 22:45:07 INFO mapred.MapTask: record buffer = 524288/655360
> 11/11/02 22:45:07 WARN mapred.LocalJobRunner: job_local_0001
> java.io.IOException: Type mismatch in key from map: expected
> org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
> at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:845)
>  at
> org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
> at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:40)
>  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> 11/11/02 22:45:07 INFO streaming.StreamJob:  map 0%  reduce 0%
> 11/11/02 22:45:07 INFO streaming.StreamJob: Job running in-process (local
> Hadoop)
> 11/11/02 22:45:07 ERROR streaming.StreamJob: Job not Successful!
> 11/11/02 22:45:07 INFO streaming.StreamJob: killJob...
> Streaming Job Failed!
>
> I've tried a number of permutations of what's on the Hadoop Wiki, but I
> still get the same error. Does anyone have any insight into what I'm
> doing wrong?
>
> Regards,
>
> Dan
>
>
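
One more point worth noting: a composite key like 1.120543 has to be built by the mapper itself. A pass-through mapper leaves the space-separated records unchanged, so there are no "."-separated fields for the partitioner to split on. A minimal, hypothetical mapper sketch using awk (field positions taken from the sample records above):

```shell
# Emit "field2.field3" as a tab-separated key, keeping the full record as the
# value. The "." inside the key is what the key-field separator settings and
# KeyFieldBasedPartitioner then act on.
printf '520460380 1 120543 2759354 1142 0 0 0\n520460383 1 120543 2759355 1142 0 0 0\n' |
awk '{ print $2 "." $3 "\t" $0 }'
```

Both sample lines above get the key 1.120543, so they land in the same partition. In a streaming job this would run as -mapper with the awk one-liner (or a script shipped via -file) in place of IdentityMapper.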