MapReduce >> mail # dev >> problem about use mapreduce get big file


Problem using MapReduce to generate a big file
Hi, mailing list:
     I want to produce a big input file (or a collection of big files) using
an MR job. My output file is very simple, like the following:

1,text
2,text
3,text
......

The problem is that the first column is a running count across the whole
output, but if I have 16 map tasks, the output will be 16 files, like
xxx_m1_xxx
......
xxx_m15_xxx
I don't know how to guarantee that the first output file contains (if each
file has 2 records)

1,text
2,text

and the second contains
3,text
4,text

so that I can combine them into
1,text
2,text
3,text
4,text
What I am thinking is: can I find out the current map task's position when
the map is set up? (That is, treat the map tasks as an array and get the
index of the current map task through the map task context.)

Can anyone help?
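
A minimal sketch of the index idea, under one assumption: every map task writes the same, known number of records. In that case a task's first count follows from its 0-based task index. In Hadoop's new API that index should be readable in `setup()` via `context.getTaskAttemptID().getTaskID().getId()`. The names `SequentialIds`, `firstId` and `linesForTask` are hypothetical, and the code is plain Java that simulates the tasks instead of running inside a real job.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no Hadoop dependency) of the numbering arithmetic.
class SequentialIds {

    // First count owned by the map task with the given 0-based index,
    // assuming every task writes exactly recordsPerTask records.
    static long firstId(int taskIndex, long recordsPerTask) {
        return (long) taskIndex * recordsPerTask + 1;
    }

    // Lines one task would write, in the "count,text" layout from the post.
    static List<String> linesForTask(int taskIndex, long recordsPerTask) {
        List<String> lines = new ArrayList<>();
        long start = firstId(taskIndex, recordsPerTask);
        for (long i = 0; i < recordsPerTask; i++) {
            lines.add((start + i) + ",text");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Simulate 2 tasks with 2 records each, concatenated in task order.
        for (int task = 0; task < 2; task++) {
            for (String line : linesForTask(task, 2)) {
                System.out.println(line);
            }
        }
    }
}
```

With 2 tasks of 2 records each, the first task gets 1 and 2 and the second gets 3 and 4, so concatenating the files in task order gives one continuous sequence. If tasks write different numbers of records, this arithmetic does not apply, and the per-task counts would have to be known first.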