Jean-Marc Spaggiari 2012-12-16, 12:52
Kevin Odell 2012-12-16, 14:05
Asaf Mesika 2012-12-16, 16:28
Jean-Marc Spaggiari 2012-12-17, 00:20
Jean-Marc Spaggiari 2012-12-17, 12:15
Re: MR missing lines
Jean-Marc Spaggiari 2012-12-18, 13:37
I faced the issue again today...
RowCounter gave me 104313 lines
Here is the output of the job counters:
12/12/17 22:32:52 INFO mapred.JobClient: ENTRY_ADDED=81594
12/12/17 22:32:52 INFO mapred.JobClient: ENTRY_SIMILAR=434
12/12/17 22:32:52 INFO mapred.JobClient: ENTRY_NO_CHANGES=14250
12/12/17 22:32:52 INFO mapred.JobClient: ENTRY_DUPLICATE=428
12/12/17 22:32:52 INFO mapred.JobClient: NON_DELETED_ROWS=0
12/12/17 22:32:52 INFO mapred.JobClient: ENTRY_EXISTING=7605
12/12/17 22:32:52 INFO mapred.JobClient: ROWS_PARSED=104311
There is a 2-line difference between ROWS_PARSED and the RowCounter result.
ENTRY_ADDED, ENTRY_SIMILAR, ENTRY_NO_CHANGES, ENTRY_DUPLICATE and
ENTRY_EXISTING are the 5 states an entry can have. The total of all those
counters is equal to the ROWS_PARSED value, so they are aligned. The code is
handling all the possibilities.
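The arithmetic behind that statement can be checked with a small standalone sketch. The class and method names below are hypothetical; only the counter names and values are taken from the job output above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: CounterCheck and its methods are hypothetical helpers,
// not part of the job. The values are copied from the JobClient log.
class CounterCheck {

    /** Adds up the per-state counters. */
    static long sumStates(Map<String, Long> states) {
        long total = 0;
        for (long value : states.values()) {
            total += value;
        }
        return total;
    }

    /** The five entry states reported by the job. */
    static Map<String, Long> statesFromLog() {
        Map<String, Long> states = new LinkedHashMap<>();
        states.put("ENTRY_ADDED", 81594L);
        states.put("ENTRY_SIMILAR", 434L);
        states.put("ENTRY_NO_CHANGES", 14250L);
        states.put("ENTRY_DUPLICATE", 428L);
        states.put("ENTRY_EXISTING", 7605L);
        return states;
    }

    public static void main(String[] args) {
        long rowsParsed = 104311L;   // ROWS_PARSED from the job
        long rowCounter = 104313L;   // result of RowCounter before the job
        long total = sumStates(statesFromLog());
        System.out.println("sum of states      = " + total);
        System.out.println("matches parsed     = " + (total == rowsParsed));
        System.out.println("rows never mapped  = " + (rowCounter - rowsParsed));
    }
}
```

The five states add up to 104311, which matches ROWS_PARSED, so the 2 missing rows were never handed to map() rather than being lost between the counter and the state handling.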
The ROWS_PARSED counter is incremented right at the beginning, like
this (I removed the comments and javadoc for readability):
  /* The comments ... */
  public void map(ImmutableBytesWritable row__, Result values, Context context)
      throws IOException {
    List<KeyValue> KVs = values.list();
    // Get the current row (getRow() returns a byte array).
    byte[] key = values.getRow();
    // First thing we do, we mark this line to be deleted.
    Delete delete_entry_proposed = new Delete(key);
    ...
  }
deletes_entry_proposed is the list of rows to delete. After each
call to the delete method, the number of entries remaining in this
list is added to NON_DELETED_ROWS, which is 0 at the end, so all lines
should have been deleted correctly.
I re-ran the RowCounter after the job, and I still have ROWS=5971
lines in the table. I checked all my "feeding processes" and they are
My table has only one CF with one C and one version.
I can guess that the remaining 5971 lines in the table are an error
on my side, but I'm not able to find where, since all the counters are
matching. I will add one counter which will sum the number of entries in the
delete list before each call to the delete method. This should match the
number of rows.
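As a rough illustration of that bookkeeping, here is a self-contained sketch of a delete buffer that keeps both counters. It does not use the HBase client: the table is replaced by a hypothetical predicate that returns true when a delete is applied, and DeleteBuffer, submitted and nonDeleted are made-up names. It assumes, as described above, that entries still in the list after the delete call are the ones that were not deleted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch only: a stand-in for the mapper's delete list, with no HBase dependency.
class DeleteBuffer {
    static final int BATCH_SIZE = 10;           // flush threshold used by the job

    private final List<String> pending = new ArrayList<>();
    private final Predicate<String> table;      // hypothetical: true = delete applied

    long submitted = 0;    // proposed counter: entries handed to the delete call
    long nonDeleted = 0;   // plays the role of NON_DELETED_ROWS

    DeleteBuffer(Predicate<String> table) {
        this.table = table;
    }

    /** Called once per row from map(). */
    void add(String rowKey) {
        pending.add(rowKey);
        if (pending.size() >= BATCH_SIZE) {
            flush();
        }
    }

    /** Called when the batch is full, and once more from cleanup(). */
    void flush() {
        submitted += pending.size();            // count before the delete call
        pending.removeIf(table);                // applied deletes leave the list
        nonDeleted += pending.size();           // whatever is left was not deleted
        pending.clear();                        // assumption: leftovers are not retried
    }

    public static void main(String[] args) {
        DeleteBuffer buffer = new DeleteBuffer(rowKey -> true);
        for (int i = 0; i < 25; i++) {
            buffer.add("row-" + i);
        }
        buffer.flush();                         // what cleanup() would do
        System.out.println("submitted=" + buffer.submitted
                + " nonDeleted=" + buffer.nonDeleted);
    }
}
```

If the submitted total ends up equal to ROWS_PARSED while rows remain in the table, the gap is more likely on the read side (rows never passed to map()) than in the delete batching.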
Again, I will re-feed the table today with fresh data and re-run the job...
2012/12/17, Jean-Marc Spaggiari <[EMAIL PROTECTED]>:
> The job ran this morning, and of course, this time, all the rows got
> processed ;)
> So I will give it a few more tries and will keep you posted if I'm able
> to reproduce that again.
> 2012/12/16, Jean-Marc Spaggiari <[EMAIL PROTECTED]>:
>> Thanks for the suggestions.
>> I already have logs to display all the exceptions and there is
>> nothing. I can't display the work done, there is too much :(
>> I have counters "counting" the rows processed and they match what is
>> done, minus what is not processed. I have just added a few other
>> counters: one right at the beginning, and one to count the
>> records remaining on the delete list, as suggested.
>> I will run the job again tomorrow, see the result and keep you posted.
>> 2012/12/16, Asaf Mesika <[EMAIL PROTECTED]>:
>>> Did you check the returned array of the delete method to make sure all
>>> records sent for delete have been deleted?
>>> Sent from my iPhone
>>> On 16 Dec 2012, at 14:52, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>>>> I have a table on which I run an MR job each time it exceeds 100 000 rows.
>>>> When the target is reached, all the feeding processes are stopped.
>>>> Yesterday it reached 123608 rows. So I stopped the feeding process,
>>>> and ran the MR.
>>>> For each line, the MR creates a delete. The delete is placed on a
>>>> list, and when the list reaches 10 elements, it's sent to the table.
>>>> In the clean method, the list is sent to the table if there is any
Anoop Sam John 2012-12-19, 05:11
Jean-Marc Spaggiari 2012-12-20, 00:39
Anoop Sam John 2012-12-20, 04:24
Harsh J 2012-12-16, 14:41