Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
HBase >> mail # user >> MR missing lines


+
Jean-Marc Spaggiari 2012-12-16, 12:52
+
Kevin Odell 2012-12-16, 14:05
+
Asaf Mesika 2012-12-16, 16:28
+
Jean-Marc Spaggiari 2012-12-17, 00:20
+
Jean-Marc Spaggiari 2012-12-17, 12:15
+
Jean-Marc Spaggiari 2012-12-18, 13:37
+
Anoop Sam John 2012-12-19, 05:11
Copy link to this message
-
Re: MR missing lines
Hi Anoop,

Thanks for the hint! Even if it's not fixing my issue, at least my
tests are going to be faster.

I will take a look at the documentation to understand what
deleteColumn was doing.

JM

2012/12/19, Anoop Sam John <[EMAIL PROTECTED]>:
> Jean:  just one thought after seeing the description and the code.. Not
> related to the missing as such
>
> You want to delete the row fully right?
>>My table is only one CF with one C with one version
> And your code is like
>>  Delete delete_entry_proposed = new Delete(key);
>>  delete_entry_proposed.deleteColumn(KVs.get(0).getFamily(),
>> KVs.get(0).getQualifier());
>
> deleteColumn() is useful when you want to delete specific column's specific
> version in a row.  In your case this may be really not needed. Just Delete
> delete_entry_proposed = new Delete(key);  may be enough so that the delete
> type is ROW delete.
>
> You can see the javadoc of the deleteColumn() API in which it clearly says
> it is an expensive op. At the server side there will be a need to do a Get
> call..
> In your case these are really unwanted over head .. I think...
>
> -Anoop-
> ________________________________________
> From: Jean-Marc Spaggiari [[EMAIL PROTECTED]]
> Sent: Tuesday, December 18, 2012 7:07 PM
> To: [EMAIL PROTECTED]
> Subject: Re: MR missing lines
>
> I faced the issue again today...
>
> RowCounter gave me 104313 lines
> Here is the output of the job counters:
> 12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_ADDED=81594
> 12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_SIMILAR=434
> 12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_NO_CHANGES=14250
> 12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_DUPLICATE=428
> 12/12/17 22:32:52 INFO mapred.JobClient:     NON_DELETED_ROWS=0
> 12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_EXISTING=7605
> 12/12/17 22:32:52 INFO mapred.JobClient:     ROWS_PARSED=104311
>
> There is a 2 lines difference between ROWS_PARSED and he counter.
> ENTRY_ADDED, ENTRY_SIMILAR, ENTRY_NO_CHANGES, ENTRY_DUPLICATE and
> ENTRY_EXISTING are the 5 states an entry can have. Total of all those
> counters is equal to the ROWS_PARSED value, so it's alligned. Code is
> handling all the possibilities.
>
> The ROWS_PARSED counter is incremented right at the beginning like
> that (I removed the comments and javadoc for lisibility):
>                 /**
>                  * The comments ...
>                  */
>                 @Override
>                 public void map(ImmutableBytesWritable row__, Result values,
> Context
> context) throws IOException
>                 {
>
>
> context.getCounter(Counters.ROWS_PARSED).increment(1);
>                         List<KeyValue> KVs = values.list();
>                         try
>                         {
>
>                                 // Get the current row.
>                                 byte[] key = values.getRow();
>
>                                 // First thing we do, we mark this line to
> be deleted.
>                                 Delete delete_entry_proposed = new
> Delete(key);
>
> delete_entry_proposed.deleteColumn(KVs.get(0).getFamily(),
> KVs.get(0).getQualifier());
>
> deletes_entry_proposed.add(delete_entry_proposed);
>
>
> The deletes_entry_proposed is a list of rows to delete. After each
> call to the delete method, the number of remaining lines into this
> list is added to NON_DELETED_ROWS which is 0 at the end, so all lines
> should be deleted correctly.
>
> I re-ran the rowcounter after the job, and I still have ROWS=5971
> lines into the table. I check all my "feeding process" and they are
> all closed.
>
> My table is only one CF with one C with one version.
>
> I can guess that the remaining 5971 lines into the table is an error
> on my side, but I'm not able to find where since all the counters are
> matching. I will add one counter which will add all the entries in the
> delete list before calling the delete method. This should match the
> number of rows.
>
> Again, I will re-feed the table today with fresh data and re-run the job...
+
Anoop Sam John 2012-12-20, 04:24
+
Harsh J 2012-12-16, 14:41