
MapReduce >> mail # user >> Non utf-8 chars in input


Non utf-8 chars in input
Hi,

I am using the default InputFormat class to read input from text files, but the input files contain some non-UTF-8 characters.
I believe TextInputFormat is the default InputFormat class, and it replaces these non-UTF-8 characters with "\uFFFD". If I do not want this behavior and need the actual characters in my mapper, which InputFormat class should I use?

Regards,
Ajay Srivastava
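The behavior described in the question can be reproduced with the JDK alone. The sketch below assumes that the replacement happens when the line's bytes are decoded as UTF-8 with a replacing decoder (which is how Hadoop's `Text.toString()` is generally understood to behave), and that the input file is actually ISO-8859-1 encoded. That encoding is an assumption for illustration; the message does not say which charset the file uses. The class and method names (`Utf8ReplacementDemo`, `decodeUtf8WithReplacement`, `decodeWithCharset`) are hypothetical. In a mapper, the raw bytes would come from `value.getBytes()`, which is valid only up to `value.getLength()`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares lossy UTF-8 decoding with decoding
// the same bytes using the file's real (assumed) charset.
public class Utf8ReplacementDemo {

    // Decodes bytes as UTF-8, replacing malformed sequences with U+FFFD.
    public static String decodeUtf8WithReplacement(byte[] bytes, int offset, int length) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes, offset, length)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE, but decode() declares the exception.
            throw new IllegalStateException(e);
        }
    }

    // Decodes the same bytes with the charset the file was written in.
    public static String decodeWithCharset(byte[] bytes, int offset, int length, Charset charset) {
        return new String(bytes, offset, length, charset);
    }

    public static void main(String[] args) {
        // "cafe" with an accented e, encoded as ISO-8859-1: the last byte is 0xE9,
        // which is not a valid UTF-8 sequence on its own.
        byte[] latin1Line = {'c', 'a', 'f', (byte) 0xE9};

        String viaUtf8 = decodeUtf8WithReplacement(latin1Line, 0, latin1Line.length);
        String viaLatin1 = decodeWithCharset(latin1Line, 0, latin1Line.length,
                StandardCharsets.ISO_8859_1);

        System.out.println(viaUtf8.equals("caf\uFFFD"));   // true: byte was replaced
        System.out.println(viaLatin1.equals("caf\u00E9")); // true: original character kept
    }
}
```

The design point is that the bytes themselves are not lost until they are decoded as UTF-8; decoding them with the correct charset preserves the original characters.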