

Re: Missing records from HDFS
I think this is caused by your RecordReader. Can you paste your code
here, along with a sample of the data?

Feel free to use pastebin if you prefer.
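
To illustrate why a RecordReader is a plausible suspect, here is a minimal sketch in plain Java with no Hadoop classes. It simulates the usual convention for line-oriented input: every split except the first skips its first line, and each split keeps reading until it has finished the line that straddles its end. The class and method names (SplitBoundaryDemo, countRecords, countAll) and the sample data are hypothetical, and nothing in this thread confirms that this is the actual bug in the custom reader; it only shows how a reader that stops at the split end can lose one record per split boundary while still working when each file fits in a single split.

```java
import java.nio.charset.StandardCharsets;

// Plain-Java simulation only; all names here are hypothetical, not Hadoop APIs.
final class SplitBoundaryDemo {

    // Returns the offset just past the next '\n' at or after pos, or data.length if none.
    static int nextLineStart(byte[] data, int pos) {
        while (pos < data.length) {
            if (data[pos++] == '\n') {
                return pos;
            }
        }
        return data.length;
    }

    // Counts the records read for the split [start, start + length).
    // finishStraddlingLine = true : read while the line STARTS at or before the split end.
    // finishStraddlingLine = false: (buggy) read only lines that END inside the split.
    static int countRecords(byte[] data, int start, int length, boolean finishStraddlingLine) {
        int end = Math.min(start + length, data.length);
        int pos = start;
        if (start != 0) {
            // The previous split is responsible for the line that crosses into this one.
            pos = nextLineStart(data, pos);
        }
        int count = 0;
        while (pos < data.length) {
            int next = nextLineStart(data, pos);
            boolean owned = finishStraddlingLine ? pos <= end : next <= end;
            if (!owned) {
                break;
            }
            count++;
            pos = next;
        }
        return count;
    }

    // Sums the per-split counts over consecutive splits of splitSize bytes.
    static int countAll(byte[] data, int splitSize, boolean finishStraddlingLine) {
        int total = 0;
        for (int start = 0; start < data.length; start += splitSize) {
            total += countRecords(data, start, splitSize, finishStraddlingLine);
        }
        return total;
    }

    public static void main(String[] args) {
        // Five 4-byte records, cut into splits of 7 bytes so lines straddle boundaries.
        byte[] data = "a,1\nb,2\nc,3\nd,4\ne,5\n".getBytes(StandardCharsets.UTF_8);
        System.out.println("finishing straddling lines: " + countAll(data, 7, true));
        System.out.println("stopping at split end:      " + countAll(data, 7, false));
    }
}
```

In this toy input the first mode counts all five records, while the second drops the record that crosses each of the two split boundaries. With a single split covering the whole input, both modes agree, which would be consistent with small files processing cleanly.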
On Fri, Nov 22, 2013 at 7:16 PM, ZORAIDA HIDALGO SANCHEZ <[EMAIL PROTECTED]> wrote:

>  One more thing:
>
>  if we split the files, all the records are processed. The files are
> 70.5 MB.
>
>  Thanks,
>
>  Zoraida.-
>
>   From: zoraida <[EMAIL PROTECTED]>
> Date: Friday, 22 November 2013 08:59
>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: Re: Missing records from HDFS
>
>   Thanks for your response, Azuryy.
>
>  My Hadoop version: 2.0.0-cdh4.3.0
> InputFormat: a custom class that extends FileInputFormat (a CSV input
> format)
> The input consists of several files, all under the same directory.
> My input path is configured through Oozie, using the property
> mapred.input.dir.
>
>
>  The same code and input work fine on Hadoop 2.0.0-cdh4.2.1; no
> records are discarded.
>
>  Thanks.
>
>   From: Azuryy Yu <[EMAIL PROTECTED]>
> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Date: Thursday, 21 November 2013 07:31
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: Re: Missing records from HDFS
>
>   What is your Hadoop version, and which InputFormat are you using?
>
>  Are these files under one directory, or are there many subdirectories?
> How did you configure the input path in your main?
>
>
>
> On Thu, Nov 21, 2013 at 12:25 AM, ZORAIDA HIDALGO SANCHEZ <[EMAIL PROTECTED]> wrote:
>
>>  Hi all,
>>
>>  My job is not reading all the input records. The input directory
>> holds a set of files containing a total of 6000000 records, but only
>> 5997000 are processed. The Map Input Records counter says 5997000.
>> I tried downloading the files with getmerge to check how many records
>> it would return, and it returns the correct number (6000000).
>>
>>  Do you have any suggestions?
>>
>>  Thanks.
>>
>> ------------------------------
>>
>> This message is intended exclusively for its addressee. We only send and
>> receive email on the basis of the terms set out at:
>> http://www.tid.es/ES/PAGINAS/disclaimer.aspx
>>
>
>