Hadoop, mail # user - Corrupted input data to map


Re: Corrupted input data to map
Jeff Zhang 2010-10-16, 01:21
You can read the input as plain text and do the type conversion in the
mapper. If a NumberFormatException is thrown, you can decide how to
handle it, for example by incrementing a custom Counter to record the
bad line, or by substituting a default value.
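
A minimal sketch of that idea, written as plain Java so it runs without a Hadoop classpath. The class name SafeFloatParser, the enum RecordCounter, and the in-memory counter map are all hypothetical stand-ins for illustration. In a real mapper the line would arrive as text, the catch block would typically call something like context.getCounter(RecordCounter.MALFORMED_FLOAT).increment(1), and whether to substitute a default or skip the record is a choice for the job.

```java
import java.util.EnumMap;
import java.util.Map;

public class SafeFloatParser {
    /** Hypothetical counter names; a Hadoop job would pass these to context.getCounter(...). */
    public enum RecordCounter { MALFORMED_FLOAT }

    // Stand-in for Hadoop counters so the sketch runs on its own.
    private final Map<RecordCounter, Long> counters = new EnumMap<>(RecordCounter.class);

    /**
     * Parses a field that was read as plain text. On a malformed value such as
     * "3.4.5.6", records the failure and returns the supplied default instead of
     * letting the exception propagate and fail the task.
     */
    public float parseOrDefault(String field, float defaultValue) {
        try {
            return Float.parseFloat(field);
        } catch (NumberFormatException e) {
            // In a mapper: context.getCounter(RecordCounter.MALFORMED_FLOAT).increment(1);
            counters.merge(RecordCounter.MALFORMED_FLOAT, 1L, Long::sum);
            return defaultValue;
        }
    }

    /** Returns how many times the given counter was incremented. */
    public long count(RecordCounter counter) {
        return counters.getOrDefault(counter, 0L);
    }

    public static void main(String[] args) {
        SafeFloatParser parser = new SafeFloatParser();
        String[] fields = { "3.4", "3.4.5.6", "7.25" };
        for (String field : fields) {
            System.out.println(field + " -> " + parser.parseOrDefault(field, 0.0f));
        }
        System.out.println("malformed fields: " + parser.count(RecordCounter.MALFORMED_FLOAT));
    }
}
```

With this approach the few corrupted lines are counted and the task keeps running; the counter total then shows how many lines were affected.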

On Sat, Oct 16, 2010 at 5:02 AM, Boyu Zhang <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I am running a program whose input is 1 million lines of data. Of those 1
> million lines, 5 or 6 are corrupted. The way they are corrupted is: in
> the position where a float number is expected, like 3.4, there is instead
> something like 3.4.5.6. So when the map runs, it
> throws a "multiple points" exception.
>
> My question is: the map tasks that hit the exception are marked as failed,
> but what about the data processed by the same map before the exception? Does
> it reach the reduce task, or is it treated as garbage? Thank you very much;
> any help is appreciated.
>
> Boyu
>

--
Best Regards

Jeff Zhang