HBase >> mail # user >> MR missing lines


Jean-Marc Spaggiari 2012-12-16, 12:52
Kevin Odell 2012-12-16, 14:05
Asaf Mesika 2012-12-16, 16:28
Re: MR missing lines
Thanks for the suggestions.

I already have logs to display all the exceptions and there is
nothing. I can't display the work done, there is too much :(

I have counters "counting" the rows processed and they match what is
done, minus what is not processed. I have just added a few other
counters: one right at the beginning, and one to count the records
remaining on the delete list, as suggested.

I will run the job again tomorrow, see the result and keep you posted.

JM
2012/12/16, Asaf Mesika <[EMAIL PROTECTED]>:
> Did you check the returned array of the delete method to make sure all
> records sent for delete have been deleted?
>
> Sent from my iPhone
>
> On 16 Dec 2012, at 14:52, Jean-Marc Spaggiari <[EMAIL PROTECTED]>
> wrote:
>
>> Hi,
>>
>> I have a table on which I run an MR job each time it exceeds 100,000 rows.
>>
>> When the target is reached, all the feeding processes are stopped.
>>
>> Yesterday it reached 123608 rows. So I stopped the feeding process,
>> and ran the MR.
>>
>> For each line, the MR creates a delete. The delete is placed on a
>> list, and when the list reaches 10 elements, it's sent to the table.
>> In the cleanup method, the list is sent to the table if there is any
>> element in it.
>>
>> So at the end of the MR, I should have an empty table.
>>
>> The table is split over 128 regions, and I have 8 region servers.
>>
>> What is disturbing me is that after the MR, I had 38 lines remaining
>> in the table. The MR took 348 minutes to run. So I ran the MR again,
>> which this time took 2 minutes, and now I have 1 row remaining in the
>> table.
>>
>> I looked at the logs (for the 38 lines run) and there is nothing in
>> them. There were some scanner timeout exceptions in the run over the
>> 100K rows.
>>
>> I'm running HBase 0.94.3.
>>
>> I will have another 100K rows today, so I will re-run the job. I will
>> increase the timeout to make sure I get no exceptions, but even when I
>> ran the 38 lines with no exception, one was remaining...
>>
>> Any idea why, and where I can search? It's not really an issue for me
>> since I can just re-run the job, but this might be an issue for some
>> others.
>>
>> JM
>
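
The batching scheme described in the quoted message (queue one delete per row, send the list every 10 elements, flush the remainder in cleanup), combined with the suggestion to count what remains on the delete list, could be sketched roughly as below. This is a minimal, self-contained Java sketch, not the poster's actual job: `DeleteSink`, `BatchedDeleter`, and the string row keys are hypothetical stand-ins so that it runs without HBase on the classpath. It assumes the sink removes successfully applied entries from the list it is handed, so anything still in the list afterwards was not applied. Whether the real table client behaves that way in a given HBase version should be checked against its Javadoc.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BatchedDeleteSketch {

    /** Hypothetical stand-in for the table client; NOT an HBase interface. */
    interface DeleteSink {
        // Assumed contract: entries that were applied are removed from
        // `batch`; entries still present on return were not applied.
        void delete(List<String> batch);
    }

    /** Buffers row keys and sends them to the sink in fixed-size batches. */
    static class BatchedDeleter {
        private final DeleteSink sink;
        private final int batchSize;
        private final List<String> pending = new ArrayList<>();
        private final List<String> notApplied = new ArrayList<>();
        private long submitted = 0; // in a real job this would be a counter

        BatchedDeleter(DeleteSink sink, int batchSize) {
            this.sink = sink;
            this.batchSize = batchSize;
        }

        /** Called once per input row (the map step). */
        void add(String rowKey) {
            pending.add(rowKey);
            submitted++;
            if (pending.size() >= batchSize) {
                flush();
            }
        }

        /** Called once at the end of the task (the cleanup step). */
        void close() {
            if (!pending.isEmpty()) {
                flush();
            }
        }

        private void flush() {
            sink.delete(pending);
            // Whatever the sink left behind was not deleted: record it
            // instead of silently dropping it.
            notApplied.addAll(pending);
            pending.clear();
        }

        long submitted() { return submitted; }
        List<String> notApplied() { return notApplied; }
    }

    public static void main(String[] args) {
        List<String> applied = new ArrayList<>();
        // Toy sink that fails to delete one specific key.
        DeleteSink sink = batch -> {
            Iterator<String> it = batch.iterator();
            while (it.hasNext()) {
                String key = it.next();
                if (!key.equals("row-7")) {
                    applied.add(key);
                    it.remove();
                }
            }
        };

        BatchedDeleter deleter = new BatchedDeleter(sink, 10);
        for (int i = 0; i < 25; i++) {
            deleter.add("row-" + i);
        }
        deleter.close();

        System.out.println("submitted   = " + deleter.submitted());
        System.out.println("applied     = " + applied.size());
        System.out.println("not applied = " + deleter.notApplied());
    }
}
```

Comparing `submitted` with the number of input rows, and checking whether `notApplied` is empty at the end, helps separate rows that never reached the mapper from deletes that were sent but not applied.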
Jean-Marc Spaggiari 2012-12-17, 12:15
Jean-Marc Spaggiari 2012-12-18, 13:37
Anoop Sam John 2012-12-19, 05:11
Jean-Marc Spaggiari 2012-12-20, 00:39
Anoop Sam John 2012-12-20, 04:24
Harsh J 2012-12-16, 14:41