HDFS >> mail # dev >> DataBlockScanner scan period


Re: DataBlockScanner scan period

On Oct 13, 2010, at 7:29 PM, Thanh Do wrote:

> Hi Brian,
>
> If this is the case, then is there any chance that
> somehow the DataBlockScanner cannot finish
> the verification of all the blocks in three weeks
> (e.g., when a node has a very large number of blocks)?
>

Yes.  At some point, I'd really like to figure out what percentage of our blocks actually get scanned at our site; I suspect some go a very long time without a scan.

Brian
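
To make the arithmetic in this thread concrete, here is a minimal sketch. It is not taken from the Hadoop source: the class name, the method names, and the bandwidth figure are hypothetical, and only DEFAULT_SCAN_PERIOD_HOURS comes from the quoted code. It estimates the average gap between block verifications and whether a full node scan can fit inside the period at a given scan bandwidth.

```java
public class ScanPeriodEstimate {
    // Quoted in the thread: three weeks, expressed in hours.
    static final long DEFAULT_SCAN_PERIOD_HOURS = 21 * 24L;

    /** Average gap, in seconds, between verifications if blocks are spread evenly over the period. */
    static double averageIntervalSeconds(long periodHours, long blockCount) {
        return periodHours * 3600.0 / blockCount;
    }

    /** Hours needed to read totalBytes once at bytesPerSecond. */
    static double hoursToScan(long totalBytes, long bytesPerSecond) {
        return totalBytes / (double) bytesPerSecond / 3600.0;
    }

    /** True if one full pass over the node's data fits inside the scan period. */
    static boolean fitsInPeriod(long totalBytes, long bytesPerSecond, long periodHours) {
        return hoursToScan(totalBytes, bytesPerSecond) <= periodHours;
    }

    public static void main(String[] args) {
        // The 100-block example from the thread.
        System.out.println("avg interval (s): "
                + averageIntervalSeconds(DEFAULT_SCAN_PERIOD_HOURS, 100));

        // Assumed figures for illustration only: 4 TB on the node, 1 MB/s for the scanner.
        long totalBytes = 4_000_000_000_000L;
        long bytesPerSecond = 1_000_000L;
        System.out.println("hours for full scan: " + hoursToScan(totalBytes, bytesPerSecond));
        System.out.println("fits in period: "
                + fitsInPeriod(totalBytes, bytesPerSecond, DEFAULT_SCAN_PERIOD_HOURS));
    }
}
```

With figures like these, the scan cannot complete within the period, which is the situation Thanh asks about: either the bandwidth given to the scanner or the period itself would have to grow.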

> Thanh
>
> On Wed, Oct 13, 2010 at 7:18 PM, Brian Bockelman <[EMAIL PROTECTED]> wrote:
>
>> Hi Thanh,
>>
>> That is correct.  Last time I read the code, Hadoop scheduled the block
>> verifications randomly throughout the period in order to avoid periodic
>> effects (i.e., high load every N minutes).
>>
>> Brian
>>
>> On Oct 13, 2010, at 7:14 PM, Thanh Do wrote:
>>
>>> Brian,
>>>
>>> When you say *attempt* to complete an *entire* node scan,
>>> you mean, for example, if a node has 100 block files, it will
>>> try to verify all 100 blocks every 3 weeks?
>>> That is, on average, a block is scanned every (3 weeks / 100) time interval?
>>>
>>> Thanks
>>> Thanh
>>>
>>>
>>> On Wed, Oct 13, 2010 at 7:07 PM, Brian Bockelman <[EMAIL PROTECTED]
>>> wrote:
>>>
>>>> Hi Thanh,
>>>>
>>>> The scan period is the period in which Hadoop *attempts* to complete an
>>>> entire node scan.  That is, if it's set to 3 weeks, HDFS will try to scan
>>>> each block once every 3 weeks.
>>>>
>>>> Obviously, depending on the bandwidth you have made available to the
>>>> scanning thread, you can specify impossibly small periods.
>>>>
>>>> Brian
>>>>
>>>> On Oct 13, 2010, at 7:01 PM, Thanh Do wrote:
>>>>
>>>>> Hi again,
>>>>>
>>>>> Could anybody explain to me the scanning period
>>>>> policy of DataBlockScanner? That is, how often does it wake up
>>>>> and scan a block file?
>>>>> When looking at the code, I found
>>>>>
>>>>> static final long DEFAULT_SCAN_PERIOD_HOURS = 21*24L; // three weeks
>>>>>
>>>>>
>>>>> but definitely it does not wake up and pick a random block
>>>>> to verify every three weeks, right?
>>>>>
>>>>> Thanks a lot,
>>>>> Thanh
>>>>
>>>>
>>
>>
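
Brian's description of verifications being scheduled randomly throughout the period can be illustrated with a short sketch. This is not the DataBlockScanner implementation; the class, the Scheduled type, and the schedule method are hypothetical names chosen for illustration. Each block gets a uniformly random offset inside the period, and blocks are then verified in offset order, so the work is spread out rather than clustered at fixed intervals.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class RandomScanSchedule {
    /** A block id paired with the offset (ms from period start) at which to verify it. */
    static final class Scheduled {
        final long blockId;
        final long offsetMs;

        Scheduled(long blockId, long offsetMs) {
            this.blockId = blockId;
            this.offsetMs = offsetMs;
        }
    }

    /** Assigns every block a random offset in [0, periodMs) and returns them in verification order. */
    static List<Scheduled> schedule(List<Long> blockIds, long periodMs, Random rng) {
        List<Scheduled> out = new ArrayList<>();
        for (long id : blockIds) {
            long offset = (long) (rng.nextDouble() * periodMs);
            out.add(new Scheduled(id, offset));
        }
        // Earliest offset first: this is the order a scanner thread would work through.
        out.sort(Comparator.comparingLong(s -> s.offsetMs));
        return out;
    }

    public static void main(String[] args) {
        long periodMs = 21L * 24 * 3600 * 1000; // three weeks, as in the thread
        List<Long> blocks = new ArrayList<>();
        for (long i = 0; i < 5; i++) {
            blocks.add(i);
        }
        for (Scheduled s : schedule(blocks, periodMs, new Random())) {
            System.out.println("block " + s.blockId + " at +" + s.offsetMs + " ms");
        }
    }
}
```

A sketch like this only decides when each block is due; whether every block is actually reached before the period ends still depends on the bandwidth available to the scanning thread.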