MapReduce >> mail # user >> Custom InputFormat error


Custom InputFormat error
Hi guys

I ran into an interesting problem while implementing my own custom
InputFormat, which extends FileInputFormat. (I rewrote the RecordReader
class but not the InputSplit class.)

My RecordReader extends LineRecordReader and treats the block below as one
basic record: it returns a record when it reaches #Trailer# and the lines
collected so far contain #Header#. I have only one input file, and it is
composed of many such basic records:

#Header#
.....(many lines, may be 0 lines or 1000 lines, it varies)
#Trailer#
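
To make the record format concrete, here is a minimal sketch in plain Java of the grouping rule described above. It is not the actual RecordReader from this post: the class name `HeaderTrailerGrouper` and the method `group` are hypothetical, it works on an in-memory list of lines instead of a Hadoop split, and it assumes that lines appearing before any #Header# are simply ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the poster's RecordReader: groups plain lines
// into records that start at #Header# and end at #Trailer#.
class HeaderTrailerGrouper {
    static final String HEADER = "#Header#";
    static final String TRAILER = "#Trailer#";

    static List<String> group(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null; // null means "not inside a record"
        for (String line : lines) {
            if (line.equals(HEADER)) {
                // A header always starts a fresh record.
                current = new StringBuilder();
                current.append(line).append('\n');
            } else if (current != null) {
                current.append(line).append('\n');
                if (line.equals(TRAILER)) {
                    // The trailer closes the record and emits it.
                    records.add(current.toString());
                    current = null;
                }
            }
            // Assumption: lines seen outside a record are ignored.
        }
        return records;
    }
}
```

A record with zero body lines (a #Header# immediately followed by #Trailer#) is still emitted, which matches the "may be 0 lines" case.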

Everything works fine if the number of basic records in the file is an
integer multiple of the number of mappers. For example, I use 2 mappers
and there are 2 basic records in my input file, or I use 3 mappers and
there are 6 basic records in the input file.

However, if I use 4 mappers but there are only 3 basic records in the
input file (not an integer multiple), the final output is incorrect. The
"Map Input Bytes" job counter is also less than the input file size. How
can I fix it? Do I need to rewrite the InputSplit?
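
One possible explanation, which this post does not confirm, is that FileInputFormat splits are byte ranges and can therefore start or end in the middle of a #Header#...#Trailer# block. The sketch below simulates one common way to handle that without changing the InputSplit: a record belongs to the split in which its #Header# line starts, and the reader keeps reading past the split end until it sees the matching #Trailer#. The class `SplitAwareReader` and the method `readSplit` are hypothetical, the input is an in-memory ASCII string, and the scan starts at offset 0 only to keep line alignment simple (a real reader would seek to the split start).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of split-boundary handling; not Hadoop API code.
class SplitAwareReader {
    static final String HEADER = "#Header#";
    static final String TRAILER = "#Trailer#";

    // Returns the records whose #Header# line starts in [start, end).
    static List<String> readSplit(String data, int start, int end) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null; // null means "not inside an owned record"
        int pos = 0;                  // offset of the current line's first char
        while (pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int lineEnd = (nl == -1) ? data.length() : nl;
            String line = data.substring(pos, lineEnd);
            if (current == null) {
                if (pos >= end) {
                    break; // no new record may start beyond this split
                }
                if (line.equals(HEADER) && pos >= start) {
                    current = new StringBuilder(line).append('\n');
                }
                // Otherwise the line belongs to another split's record.
            } else {
                // Inside an owned record: keep reading, even past 'end'.
                current.append(line).append('\n');
                if (line.equals(TRAILER)) {
                    records.add(current.toString());
                    current = null;
                }
            }
            pos = lineEnd + 1;
        }
        return records;
    }
}
```

Under this ownership rule, a file with 3 records cut into 4 byte ranges yields each record exactly once, and any split that contains no #Header# line simply produces no records.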

Any reply will be appreciated!

Regards!

Chen
Harsh J 2012-08-29, 07:46
Harsh J 2012-08-30, 02:49
Chen He 2012-08-30, 02:55