
Jean-Marc Spaggiari 2012-12-16, 12:52
Kevin Odell 2012-12-16, 14:05
Asaf Mesika 2012-12-16, 16:28
Jean-Marc Spaggiari 2012-12-17, 00:20
Jean-Marc Spaggiari 2012-12-17, 12:15
Re: MR missing lines
I faced the issue again today...

RowCounter gave me 104313 lines
Here is the output of the job counters:
12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_ADDED=81594
12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_SIMILAR=434
12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_NO_CHANGES=14250
12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_DUPLICATE=428
12/12/17 22:32:52 INFO mapred.JobClient:     NON_DELETED_ROWS=0
12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_EXISTING=7605
12/12/17 22:32:52 INFO mapred.JobClient:     ROWS_PARSED=104311

There is a 2-row difference between ROWS_PARSED and the RowCounter result.
ENTRY_ADDED, ENTRY_SIMILAR, ENTRY_NO_CHANGES, ENTRY_DUPLICATE and
ENTRY_EXISTING are the 5 states an entry can have. The total of those
counters is equal to the ROWS_PARSED value, so they are aligned: the code
is handling all the possibilities.
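
A minimal sketch of that reconciliation, using only the counter values printed above. The class and method names (CounterCheck, sumOfStates, reportedCounters) are hypothetical and are not part of the original job; the sketch only redoes the arithmetic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original MR job.
final class CounterCheck {

    // The five per-entry state counters named in the job output.
    private static final String[] STATES = {
        "ENTRY_ADDED", "ENTRY_SIMILAR", "ENTRY_NO_CHANGES",
        "ENTRY_DUPLICATE", "ENTRY_EXISTING"
    };

    // Adds up the five state counters; a missing counter counts as 0.
    static long sumOfStates(Map<String, Long> counters) {
        long total = 0;
        for (String name : STATES) {
            total += counters.getOrDefault(name, 0L);
        }
        return total;
    }

    // Counter values copied from the JobClient output in this message.
    static Map<String, Long> reportedCounters() {
        Map<String, Long> c = new LinkedHashMap<>();
        c.put("ENTRY_ADDED", 81594L);
        c.put("ENTRY_SIMILAR", 434L);
        c.put("ENTRY_NO_CHANGES", 14250L);
        c.put("ENTRY_DUPLICATE", 428L);
        c.put("ENTRY_EXISTING", 7605L);
        c.put("NON_DELETED_ROWS", 0L);
        c.put("ROWS_PARSED", 104311L);
        return c;
    }

    public static void main(String[] args) {
        Map<String, Long> c = reportedCounters();
        long rowsParsed = c.get("ROWS_PARSED");
        long rowCounter = 104313L; // RowCounter result before the job
        System.out.println("sum of states     = " + sumOfStates(c));
        System.out.println("ROWS_PARSED       = " + rowsParsed);
        System.out.println("rows never parsed = " + (rowCounter - rowsParsed));
    }
}
```

With these values the five states add up to 104311, which is ROWS_PARSED, and 2 rows seen by RowCounter never reached the mapper.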

The ROWS_PARSED counter is incremented right at the beginning, like
this (I removed the comments and javadoc for readability):
* The comments ...
public void map(ImmutableBytesWritable row__, Result values, Context context) throws IOException

List<KeyValue> KVs = values.list();

// Get the current row.
byte[] key = values.getRow();

// First thing we do, we mark this line to be deleted.
Delete delete_entry_proposed = new Delete(key);
deletes_entry_proposed is the list of rows to delete. After each
call to the delete method, the number of entries remaining in this
list is added to NON_DELETED_ROWS, which is 0 at the end, so all rows
should have been deleted correctly.
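
The batching described above can be sketched roughly as follows. This is not the job's actual code: DeleteBuffer and BulkDeleter are hypothetical names, row keys are plain Strings to keep the example free of HBase classes, and BulkDeleter only stands in for the table's bulk delete call. The sketch assumes, as the NON_DELETED_ROWS counter does, that the bulk delete removes every applied entry from the list it is given and leaves only the failed ones behind.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the delete buffering, not the original mapper.
final class DeleteBuffer {

    // Stand-in for the table's bulk delete. Assumption: the implementation
    // removes each successfully applied entry from the list, so whatever
    // is still in the list afterwards was NOT deleted.
    interface BulkDeleter {
        void delete(List<String> rowKeys);
    }

    private final BulkDeleter table;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private long nonDeletedRows;

    DeleteBuffer(BulkDeleter table, int batchSize) {
        this.table = table;
        this.batchSize = batchSize;
    }

    // Called once per row from map(): queue the delete, flush when full.
    void add(String rowKey) {
        pending.add(rowKey);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Called from the cleanup step: send whatever is still queued.
    void close() {
        flush();
    }

    // Equivalent of the NON_DELETED_ROWS counter.
    long nonDeletedRows() {
        return nonDeletedRows;
    }

    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        table.delete(pending);
        // Anything left in the list was not applied by the table.
        nonDeletedRows += pending.size();
        // Assumption for this sketch: failures are counted and dropped;
        // a real job might retry them instead.
        pending.clear();
    }
}
```

In this model a row can only survive if it never reaches add(), if close() is never called for a task, or if the table reports a delete as applied when it was not; the first two cases do not show up in the non-deleted count.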

I re-ran RowCounter after the job, and I still have ROWS=5971
rows in the table. I checked all my "feeding processes" and they are
all stopped.

My table has only one column family (CF) with one column and one version.

I can guess that the 5971 rows remaining in the table are an error
on my side, but I'm not able to find where, since all the counters
match. I will add one counter that totals the entries in the
delete list before each call to the delete method. This should match the
number of rows.
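
One way to read the extra counter once it exists, as a rough sketch: RowReconciliation and unexplainedSurvivors are hypothetical names, and the submitted-deletes value is an input that has not been measured yet. The idea is to subtract every loss the counters can explain from the rows still in the table.

```java
// Hypothetical helper: attributes surviving rows to each stage of the job.
final class RowReconciliation {

    // Returns the number of surviving rows that no counter explains.
    // rowsBefore       RowCounter result before the job
    // rowsParsed       ROWS_PARSED
    // deletesSubmitted the proposed new counter (entries handed to delete)
    // nonDeleted       NON_DELETED_ROWS
    // rowsAfter        RowCounter result after the job
    static long unexplainedSurvivors(long rowsBefore, long rowsParsed,
                                     long deletesSubmitted, long nonDeleted,
                                     long rowsAfter) {
        long neverScanned = rowsBefore - rowsParsed;         // not seen by map()
        long neverSubmitted = rowsParsed - deletesSubmitted; // parsed, never sent
        return rowsAfter - neverScanned - neverSubmitted - nonDeleted;
    }
}
```

For example, if the new counter turned out to equal ROWS_PARSED (an assumption, not a measured value), unexplainedSurvivors(104313, 104311, 104311, 0, 5971) would return 5969: only 2 of the surviving rows are accounted for by the scan, and the rest were submitted and reported as deleted.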

Again, I will re-feed the table today with fresh data and re-run the job...


2012/12/17, Jean-Marc Spaggiari <[EMAIL PROTECTED]>:
> The job ran this morning, and of course, this time, all the rows got
> processed ;)
> So I will give it a few more tries and will keep you posted if I'm able
> to reproduce it.
> Thanks,
> JM
> 2012/12/16, Jean-Marc Spaggiari <[EMAIL PROTECTED]>:
>> Thanks for the suggestions.
>> I already have logs to display all the exceptions and there is
>> nothing. I can't display the work done, there is too much :(
>> I have counters "counting" the rows processed and they match what is
>> done, minus what is not processed. I have just added a few other
>> counters. One right at the beginning, and one to count the
>> records remaining on the delete list, as suggested.
>> I will run the job again tomorrow, see the result and keep you posted.
>> JM
>> 2012/12/16, Asaf Mesika <[EMAIL PROTECTED]>:
>>> Did you check the returned array of the delete method to make sure all
>>> records sent for delete have been deleted?
>>> Sent from my iPhone
>>> On 16 Dec 2012, at 14:52, Jean-Marc Spaggiari <[EMAIL PROTECTED]>
>>> wrote:
>>>> Hi,
>>>> I have a table on which I run an MR job each time it exceeds 100 000 rows.
>>>> When the target is reached, all the feeding processes are stopped.
>>>> Yesterday it reached 123608 rows. So I stopped the feeding process,
>>>> and ran the MR.
>>>> For each row, the MR is creating a delete. The delete is placed on a
>>>> list, and when the list reaches 10 elements, it's sent to the table.
>>>> In the clean method, the list is sent to the table if there is any
Anoop Sam John 2012-12-19, 05:11
Jean-Marc Spaggiari 2012-12-20, 00:39
Anoop Sam John 2012-12-20, 04:24
Harsh J 2012-12-16, 14:41