Re: MR missing lines
Jean-Marc Spaggiari 2012-12-20, 00:39
Thanks for the hint! Even if it's not fixing my issue, at least my
tests are going to be faster.
I will take a look at the documentation to understand what
deleteColumn was doing.
2012/12/19, Anoop Sam John <[EMAIL PROTECTED]>:
> Jean: just one thought after seeing the description and the code. It is not
> related to the missing lines as such.
> You want to delete the row fully, right?
>>My table is only one CF with one C with one version
> And your code is like
>> Delete delete_entry_proposed = new Delete(key);
> deleteColumn() is useful when you want to delete a specific version of a
> specific column in a row. In your case it is probably not needed. Just Delete
> delete_entry_proposed = new Delete(key); may be enough, so that the delete
> type is a ROW delete.
> You can see the javadoc of the deleteColumn() API, which clearly says
> it is an expensive op: at the server side there will be a need to do a Get.
> In your case this is really unwanted overhead, I think.
> From: Jean-Marc Spaggiari [[EMAIL PROTECTED]]
> Sent: Tuesday, December 18, 2012 7:07 PM
> To: [EMAIL PROTECTED]
> Subject: Re: MR missing lines
> I faced the issue again today...
> RowCounter gave me 104313 lines
> Here is the output of the job counters:
> 12/12/17 22:32:52 INFO mapred.JobClient: ENTRY_ADDED=81594
> 12/12/17 22:32:52 INFO mapred.JobClient: ENTRY_SIMILAR=434
> 12/12/17 22:32:52 INFO mapred.JobClient: ENTRY_NO_CHANGES=14250
> 12/12/17 22:32:52 INFO mapred.JobClient: ENTRY_DUPLICATE=428
> 12/12/17 22:32:52 INFO mapred.JobClient: NON_DELETED_ROWS=0
> 12/12/17 22:32:52 INFO mapred.JobClient: ENTRY_EXISTING=7605
> 12/12/17 22:32:52 INFO mapred.JobClient: ROWS_PARSED=104311
> There is a 2-line difference between ROWS_PARSED and the RowCounter result.
> ENTRY_ADDED, ENTRY_SIMILAR, ENTRY_NO_CHANGES, ENTRY_DUPLICATE and
> ENTRY_EXISTING are the 5 states an entry can have. The total of all those
> counters is equal to the ROWS_PARSED value, so it's aligned. The code is
> handling all the possibilities.
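The arithmetic in the paragraph above can be checked with a small standalone sketch. It only re-adds the counter values quoted in the job output; the class and method names (`CounterCheck`, `sumStates`, `reportedStates`) are made up for illustration and are not part of the original job.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: re-adds the counters quoted in the job output.
final class CounterCheck {

    // Sum of the per-entry state counters.
    static long sumStates(Map<String, Long> states) {
        long total = 0;
        for (long value : states.values()) {
            total += value;
        }
        return total;
    }

    // The five state counters, copied from the JobClient log lines.
    static Map<String, Long> reportedStates() {
        Map<String, Long> states = new LinkedHashMap<>();
        states.put("ENTRY_ADDED", 81594L);
        states.put("ENTRY_SIMILAR", 434L);
        states.put("ENTRY_NO_CHANGES", 14250L);
        states.put("ENTRY_DUPLICATE", 428L);
        states.put("ENTRY_EXISTING", 7605L);
        return states;
    }

    public static void main(String[] args) {
        long rowsParsed = 104311L;  // ROWS_PARSED from the job output
        long rowCounter = 104313L;  // value reported by RowCounter
        long total = sumStates(reportedStates());
        System.out.println("sum of states = " + total);
        System.out.println("matches ROWS_PARSED: " + (total == rowsParsed));
        System.out.println("RowCounter - ROWS_PARSED = " + (rowCounter - rowsParsed));
    }
}
```

The five states add up to 104311, which equals ROWS_PARSED, so the only unexplained gap is the 2 rows between RowCounter and ROWS_PARSED.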
> The ROWS_PARSED counter is incremented right at the beginning, like
> this (I removed the comments and javadoc for readability):
> * The comments ...
> public void map(ImmutableBytesWritable row__, Result values,
>     Context context) throws IOException
> {
>   List<KeyValue> KVs = values.list();
>   // Get the current row.
>   byte[] key = values.getRow();
>   // First thing we do, we mark this line to be deleted.
>   Delete delete_entry_proposed = new Delete(key);
>   [...]
> The deletes_entry_proposed is a list of rows to delete. After each
> call to the delete method, the number of lines remaining in this
> list is added to NON_DELETED_ROWS, which is 0 at the end, so all lines
> should be deleted correctly.
> I re-ran the rowcounter after the job, and I still have ROWS=5971
> lines in the table. I checked all my "feeding processes" and they are
> all closed.
> My table is only one CF with one C with one version.
> I can guess that the remaining 5971 lines in the table are an error
> on my side, but I'm not able to find where, since all the counters
> match. I will add one counter which will count all the entries in the
> delete list before calling the delete method. This should match the
> number of rows.
> Again, I will re-feed the table today with fresh data and re-run the job...
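
The extra counter proposed above (count the entries in the delete list before calling the delete method, then count what is left afterwards) could look roughly like the following. This is a minimal sketch in plain Java, not the actual job code: `DeleteBatchAudit`, `flush`, `queued` and `nonDeleted` are hypothetical names, and the delete call is passed in as a callback that is assumed to leave only the entries it failed to apply in the list, as described in the message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical bookkeeping around a batched delete call.
final class DeleteBatchAudit {
    private long queuedForDelete = 0; // entries handed to the delete call
    private long nonDeletedRows = 0;  // entries still in the list afterwards

    // Counts the batch before and after the delete call.
    // Assumption: deleteCall removes every entry it applied successfully.
    <T> void flush(List<T> pending, Consumer<List<T>> deleteCall) {
        queuedForDelete += pending.size();
        deleteCall.accept(pending);
        nonDeletedRows += pending.size();
        pending.clear(); // a real job might retry the leftovers instead
    }

    long queued() { return queuedForDelete; }

    long nonDeleted() { return nonDeletedRows; }

    public static void main(String[] args) {
        DeleteBatchAudit audit = new DeleteBatchAudit();
        List<String> pending = new ArrayList<>(Arrays.asList("row1", "row2", "row3"));
        // Stand-in for the real delete call: pretend every delete succeeded.
        audit.flush(pending, list -> list.clear());
        System.out.println("queued=" + audit.queued() + " nonDeleted=" + audit.nonDeleted());
    }
}
```

If the queued total matches ROWS_PARSED and the non-deleted total stays at 0, but rows are still left in the table, the discrepancy is more likely outside the delete bookkeeping itself.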