Re: DataBlockScanner scan period
Hi Thanh,

That is correct. Last time I read the code, Hadoop scheduled the block verifications randomly throughout the period in order to avoid periodic effects (e.g., high load every N minutes).
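
As a rough illustration of the idea (a sketch only, not the actual DataBlockScanner code), the random spreading and the average interval could look something like the following. The class name ScanScheduleSketch and the helpers randomOffsetsMs and averageGapMs are made up for this example; only DEFAULT_SCAN_PERIOD_HOURS comes from the code quoted below. With 100 blocks and a 3 week (504 hour) period, the average gap works out to 504 / 100 = 5.04 hours.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch only: this is NOT the DataBlockScanner implementation.
class ScanScheduleSketch {
    // Value quoted in this thread: three weeks, expressed in hours.
    static final long DEFAULT_SCAN_PERIOD_HOURS = 21 * 24L;

    // Hypothetical helper: give each block one random verification time,
    // as a millisecond offset from the start of the period, then sort them.
    static List<Long> randomOffsetsMs(int numBlocks, long periodMs, Random rng) {
        List<Long> offsets = new ArrayList<>();
        for (int i = 0; i < numBlocks; i++) {
            // floorMod keeps the result in [0, periodMs) even for negative longs.
            offsets.add(Math.floorMod(rng.nextLong(), periodMs));
        }
        Collections.sort(offsets);
        return offsets;
    }

    // Hypothetical helper: average gap between verifications on one node
    // if every block is verified exactly once per period.
    static long averageGapMs(int numBlocks, long periodMs) {
        return periodMs / numBlocks;
    }

    public static void main(String[] args) {
        long periodMs = DEFAULT_SCAN_PERIOD_HOURS * 60 * 60 * 1000;
        List<Long> offsets = randomOffsetsMs(100, periodMs, new Random());
        System.out.println("first verification after (ms): " + offsets.get(0));
        System.out.println("average gap (ms): " + averageGapMs(100, periodMs));
    }
}
```

Because the offsets are drawn independently, individual gaps will vary a lot; only the average is fixed by the period and the block count.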

Brian

On Oct 13, 2010, at 7:14 PM, Thanh Do wrote:

> Brian,
>
> When you say *attempt* to complete an *entire* node scan,
> do you mean that, for example, if a node has 100 block files, it will
> try to verify all 100 blocks every 3 weeks?
> That is, on average, one block is scanned every (3 weeks / 100)?
>
> Thanks
> Thanh
>
>
> On Wed, Oct 13, 2010 at 7:07 PM, Brian Bockelman <[EMAIL PROTECTED]> wrote:
>
>> Hi Thanh,
>>
>> The scan period is the period within which Hadoop *attempts* to complete
>> an entire node scan. That is, if it's set to 3 weeks, HDFS will try to
>> scan each block once every 3 weeks.
>>
>> Obviously, depending on the bandwidth you have made available to the
>> scanning thread, you can specify impossibly small periods.
>>
>> Brian
>>
>> On Oct 13, 2010, at 7:01 PM, Thanh Do wrote:
>>
>>> Hi again,
>>>
>>> Could anybody explain the scanning period
>>> policy of DataBlockScanner? That is, how often does it wake up
>>> and scan a block file?
>>> When looking at the code, I found
>>>
>>> static final long DEFAULT_SCAN_PERIOD_HOURS = 21*24L; // three weeks
>>>
>>>
>>> but surely it does not wake up and pick a random block
>>> to verify only once every three weeks, right?
>>>
>>> Thanks a lot,
>>> Thanh
>>
>>