MapReduce >> mail # user >> streaming secondary sort not working?


streaming secondary sort not working?
hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.*.jar \
-input /export/home/phubenig/fileDataInput \
-output /export/home/phubenig/fileDataOutput \
-mapper /export/home/phubenig/fileDataJob/non_map.py \
-reducer org.apache.hadoop.mapred.lib.IdentityReducer \
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
num.key.fields.for.partition=1 \
stream.num.map.output.key.fields=7 \
mapred.text.key.comparator.options="-k1,1 -k2,7" \
mapred.text.key.partitioner.options="-k1,1" \
-file /export/home/phubenig/fileDataInput/fileData.txt

~~~~~~~~~~~~
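
To make the intent of the settings above easier to check, here is a minimal Python sketch of how the key is expected to be formed. It is a simplified model, not Hadoop's actual code: it assumes tab is the field separator, that stream.num.map.output.key.fields=7 makes the first seven fields the key, and that "-k1,1" selects only the first key field for partitioning. The function names split_key_value and partition_field are hypothetical.

```python
def split_key_value(line, num_key_fields, sep="\t"):
    """Model of key/value splitting: the first num_key_fields fields form
    the key, and whatever follows them forms the value (assumption)."""
    fields = line.rstrip("\n").split(sep)
    key = sep.join(fields[:num_key_fields])
    value = sep.join(fields[num_key_fields:])
    return key, value


def partition_field(key, sep="\t"):
    """Model of a "-k1,1" partitioner option: only the first key field
    decides which reducer receives the record (assumption)."""
    return key.split(sep)[0]


if __name__ == "__main__":
    # First row of the sample input, with real tab characters.
    line = "C\tk\td\tm\tn\th\tb\n"
    key, value = split_key_value(line, 7)
    print(repr(key), repr(value))
    print(partition_field(key))
```

With seven fields per row and seven key fields, the whole line ends up in the key and the value is empty, so any secondary ordering has to come from the key comparator alone.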

Input file (tab separated):

C k d m n h b
A w g i w t l
A w f y m y h
C u r d h c b
A y q w m g k
B w b s d q g
C q j j d f b
C l n x a g f
C o r m a g p
C v w l a t f
B c l f n t u
B x t o e x p
A q m r d q i
C e i o u g l
A x m w u o i
A j p m d k r
C s t m r m t
B s w l f k y
B a f r v f x
A s z d v s h
C o x j c w r

The output is sorted on the first field (the capital letters), but the
secondary sort on the remaining fields is not applied.  Does anyone see the
problem?  What am I missing?  It seems like it should work.
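
For comparison with the job output, the ordering that "-k1,1 -k2,7" is meant to produce can be sketched locally in Python. This is only an approximation under stated assumptions: plain lexicographic string comparison, tab-separated fields, and no numeric or reverse modifiers. The helper names sort_key and expected_order are hypothetical, and the rows are a few lines taken from the input file above.

```python
def sort_key(line, sep="\t"):
    """Model of "-k1,1 -k2,7": compare field 1 first, then fields 2
    through 7 together as one string (assumption: lexicographic order)."""
    fields = line.split(sep)
    return (fields[0], sep.join(fields[1:7]))


def expected_order(lines):
    """Return the lines in the order a primary plus secondary sort
    would be expected to give."""
    return sorted(lines, key=sort_key)


if __name__ == "__main__":
    # A handful of rows from the sample input, with real tab characters.
    rows = [
        "C\tk\td\tm\tn\th\tb",
        "A\tw\tg\ti\tw\tt\tl",
        "A\tw\tf\ty\tm\ty\th",
        "B\tw\tb\ts\td\tq\tg",
        "A\tq\tm\tr\td\tq\ti",
    ]
    for row in expected_order(rows):
        print(row)
```

If the reducer output groups the capital letters correctly but the rows inside each group do not follow this order, the comparator options are probably not reaching the job.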

Thanks for your time.

Paul

non_map.py:

#!/usr/bin/env python

import sys

# Identity mapper: echo each input line with trailing whitespace removed.
for line in sys.stdin:
    stripped = line.rstrip()
    print(stripped)