MapReduce, mail # user - Missing records from HDFS


Re: Missing records from HDFS
Azuryy Yu 2013-11-22, 11:19
I think this is caused by your RecordReader. Can you paste your code
here, along with a small sample of the data?

Please use pastebin if you prefer.
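
For reference, here is a minimal sketch of the split-boundary convention that line-oriented readers usually follow: a reader whose split does not start at byte 0 discards the first (possibly partial) line, and every reader keeps reading as long as the next line starts at or before the end of its split. This is plain Java with no Hadoop classes; the names SplitBoundarySketch, readSplit, skipLine and countAcrossSplits are hypothetical, and the inclusiveEnd flag exists only to show how a `<` instead of `<=` drops records. I have not seen your code, so this is only one way records can go missing, not a diagnosis.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitBoundarySketch {

    // Returns the index just past the next '\n' at or after 'from',
    // or data.length if there is no further newline.
    static int skipLine(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    // Reads the records owned by the split [start, start + length).
    // inclusiveEnd = true follows the convention described above;
    // inclusiveEnd = false is a deliberately wrong variant for comparison.
    static List<String> readSplit(byte[] data, int start, int length, boolean inclusiveEnd) {
        int end = start + length;
        int pos = start;
        // A split that is not the first one always discards its first line:
        // the reader of the previous split is responsible for it.
        if (start != 0) {
            pos = skipLine(data, pos);
        }
        List<String> records = new ArrayList<>();
        while (pos < data.length && (inclusiveEnd ? pos <= end : pos < end)) {
            int next = skipLine(data, pos);
            // Exclude the trailing newline, if the line has one.
            int lineEnd = (next > pos && data[next - 1] == '\n') ? next - 1 : next;
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return records;
    }

    // Cuts the data into fixed-size splits and sums the records read per split.
    static int countAcrossSplits(byte[] data, int splitSize, boolean inclusiveEnd) {
        int total = 0;
        for (int start = 0; start < data.length; start += splitSize) {
            int length = Math.min(splitSize, data.length - start);
            total += readSplit(data, start, length, inclusiveEnd).size();
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = "aa\nbb\ncc\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(countAcrossSplits(data, 3, true));  // prints 3
        System.out.println(countAcrossSplits(data, 3, false)); // prints 1
    }
}
```

With 3-byte splits every record starts exactly on a split boundary, so the wrong variant loses two of the three records, while with 4-byte splits both variants return all three. A bug of this kind only shows up when a file is read by more than one split, and only for records that touch a boundary.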
On Fri, Nov 22, 2013 at 7:16 PM, ZORAIDA HIDALGO SANCHEZ <[EMAIL PROTECTED]> wrote:

>  One more thing:
>
>  if we split the files, then all the records are processed. The files are
> 70.5 MB.
>
>  Thanks,
>
>  Zoraida.-
>
>   From: zoraida <[EMAIL PROTECTED]>
> Date: Friday, November 22, 2013 08:59
>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: Re: Missing records from HDFS
>
>   Thanks for your response Azuryy.
>
>  My Hadoop version: 2.0.0-cdh4.3.0
> InputFormat: a custom class that extends FileInputFormat (a CSV input
> format).
> These files are all in the same directory, as separate files.
> My input path is configured through Oozie, using the property
> mapred.input.dir.
>
>
>  The same code and input run fine on Hadoop 2.0.0-cdh4.2.1, without
> discarding any records.
>
>  Thanks.
>
>   From: Azuryy Yu <[EMAIL PROTECTED]>
> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Date: Thursday, November 21, 2013 07:31
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: Re: Missing records from HDFS
>
>   What's your Hadoop version, and which InputFormat are you using?
>
>  Are these files under one directory, or are there lots of subdirectories? How
> did you configure the input path in your main?
>
>
>
> On Thu, Nov 21, 2013 at 12:25 AM, ZORAIDA HIDALGO SANCHEZ <[EMAIL PROTECTED]> wrote:
>
>>  Hi all,
>>
>>  my job is not reading all the input records. In the input directory I
>> have a set of files containing a total of 6000000 records, but only 5997000
>> are processed. The Map Input Records counter says 5997000.
>> I have tried downloading the files with getmerge to check how many
>> records it would return, and the correct number (6000000) is returned.
>>
>>  Do you have any suggestions?
>>
>>  Thanks.
>>
>> ------------------------------
>>
>> This message is intended exclusively for its addressee. We only send and
>> receive email on the basis of the terms set out at:
>> http://www.tid.es/ES/PAGINAS/disclaimer.aspx
>>
>