HBase >> mail # user >> MR missing lines


Hi,

I have a table on which I run an MR job each time it exceeds 100,000 rows.

When the target is reached, all the feeding processes are stopped.

Yesterday it reached 123608 rows. So I stopped the feeding process,
and ran the MR.

For each row, the MR creates a delete. The delete is placed on a
list, and when the list reaches 10 elements, it's sent to the table.
In the cleanup method, the list is sent to the table if there is any
element left in it.
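
The batching described above can be sketched roughly as follows. This is only a minimal model, not the actual job: it uses no HBase classes, an in-memory TreeSet stands in for the table, and the names BatchedDeleteSketch, onRow, cleanup and run are hypothetical. It only illustrates that, if every row is handed to the job and every batch is applied, the "flush at 10, flush the rest at the end" logic leaves nothing behind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for the job described in the post; no HBase API is used.
class BatchedDeleteSketch {
    static final int BATCH_SIZE = 10; // batch size mentioned in the post

    private final TreeSet<Integer> table;                     // in-memory stand-in for the table
    private final List<Integer> pending = new ArrayList<>();  // buffered deletes

    BatchedDeleteSketch(TreeSet<Integer> table) {
        this.table = table;
    }

    // Called once per row, like map(): buffer a delete and flush when the batch is full.
    void onRow(int rowKey) {
        pending.add(rowKey);
        if (pending.size() >= BATCH_SIZE) {
            flush();
        }
    }

    // Called once at the end, like cleanup(): send whatever is still buffered.
    void cleanup() {
        if (!pending.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        table.removeAll(pending);
        pending.clear();
    }

    // Fill a table with rowCount rows, delete them in batches, return the rows left.
    static int run(int rowCount) {
        TreeSet<Integer> table = new TreeSet<>();
        for (int i = 0; i < rowCount; i++) {
            table.add(i);
        }
        BatchedDeleteSketch job = new BatchedDeleteSketch(table);
        // Iterate over a snapshot so that deleting does not disturb the iteration.
        for (int rowKey : new ArrayList<>(table)) {
            job.onRow(rowKey);
        }
        job.cleanup();
        return table.size();
    }

    public static void main(String[] args) {
        System.out.println(run(123608)); // prints 0
    }
}
```

In this model nothing remains, so if the real job follows the same shape, leftover rows would have to come from rows never handed to the job or from batches that were not applied, rather than from the batching logic itself.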

So at the end of the MR, I should have an empty table.

The table is split over 128 regions, and I have 8 region servers.

What is disturbing me is that after the MR, I had 38 rows remaining
in the table. The MR took 348 minutes to run. So I ran the MR again,
which this time took 2 minutes, and now I have 1 row remaining in the
table.

I looked at the logs for the 38-row run and there is nothing in
them. There are some scanner timeout exceptions in the logs for the
100K-row run.

I'm running HBase 0.94.3.

I will have another 100K rows today, so I will re-run the job. I will
increase the timeout to make sure I get no exceptions, but even when I
ran the job on the 38 rows with no exception, one row remained...

Any idea why, and where I can search? It's not really an issue for me
since I can just re-run the job, but this might be an issue for
others.

JM