HDFS >> mail # user >> High IO Usage in Datanodes due to Replication


Re: High IO Usage in Datanodes due to Replication
If each of your partitions is only storing about 1 MB, I'm not sure
it's a good key design or a good fit for Hadoop. But if you mean that
there are many files under a single partition, each of them around
1 MB, then you can safely merge them without issues.
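
To illustrate the merge idea: the many small part files under a single
partition directory can be concatenated into one larger file. This is a
minimal local sketch only; the paths and the `merge_parts` helper are
hypothetical stand-ins for what `hadoop fs -getmerge` (or the FileSystem
API) would do against the actual HDFS partition directories:

```python
# Sketch: concatenate many small "part-*" files into one file, cutting the
# per-file (and hence per-block) overhead on the NameNode and DataNodes.
# Local paths here are stand-ins for HDFS partition directories; on a real
# cluster you'd use `hadoop fs -getmerge` or the FileSystem API instead.
import glob
import os
import shutil

def merge_parts(src_dir, dst_path):
    """Concatenate every part-* file under src_dir into dst_path."""
    with open(dst_path, "wb") as out:
        for part in sorted(glob.glob(os.path.join(src_dir, "part-*"))):
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return dst_path
```

After merging, the original small files would be removed and the partition
re-pointed at the single merged file, so total data stays the same while
the file (and block) count drops.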

HDFS-2379 should be available in Apache Hadoop 1.0.1+ and in more
recent CDH3 releases, and isn't a problem in Apache Hadoop 2.x. Its
symptom would be frequent dead node listing of DataNodes, but no
actual DN crashes.

If, on the other hand, your DN is crashing or periodically slowing
down, the issue may simply be one of heap. You can run tools to
monitor whether your heap is filling up, invoking GC too frequently,
etc.
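
One rough way to quantify GC pressure, assuming the usual `jstat -gcutil`
column layout on JDK 6-era JVMs (S0 S1 E O P YGC YGCT FGC FGCT GCT), is to
compare cumulative GC time against process uptime. The sample line and the
`gc_overhead` helper below are illustrative, not part of any tool:

```python
# Sketch: estimate GC overhead from one `jstat -gcutil <pid>` sample line.
# Assumed column layout (JDK 6-era): S0 S1 E O P YGC YGCT FGC FGCT GCT,
# where GCT (last column) is cumulative garbage-collection time in seconds.
def gc_overhead(jstat_line, uptime_seconds):
    """Fraction of wall-clock time this JVM has spent in GC."""
    fields = jstat_line.split()
    total_gc_seconds = float(fields[9])  # GCT column
    return total_gc_seconds / uptime_seconds

# Hypothetical sample: a DN that has spent ~314s in GC over 1h of uptime.
sample = "0.00 95.12 58.33 92.50 99.70 1042 12.30 87 301.50 313.80"
```

A sustained overhead of more than a few percent of wall time usually means
the heap is too small for the block count the DN is carrying.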

On Wed, May 1, 2013 at 3:39 PM, selva <[EMAIL PROTECTED]> wrote:
> Hi Harsh,
>
> You are right, our Hadoop version is "0.20.2-cdh3u1", which lacks
> HDFS-2379.
>
> As you suggested, I have doubled the DN heap size. Now I will monitor
> the block scanning speed.
>
> The second idea is good, but I cannot merge the small files (~1 MB)
> since they are all in Hive table partitions.
>
> -Selva
>
>
> On Wed, May 1, 2013 at 2:25 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> Neither block reports nor block scanning should affect general DN I/O,
>> although the former may affect DN liveness in older versions if they
>> lack HDFS-2379. Hence, Brahma was partially right to mention the
>> block reports.
>>
>> Your solution, if the # of blocks per DN is too high (counts available
>> on Live Nodes page in NN UI), say > 1m or so blocks, is to simply
>> raise the DN heap by another GB to fix issues immediately, and then
>> start working on merging small files together for more efficient
>> processing and reducing overall block count to lower memory pressure
>> at the DNs.
>>
>>
>>
>> On Wed, May 1, 2013 at 12:02 PM, selva <[EMAIL PROTECTED]> wrote:
>> > Thanks a lot Harsh. Your input is really valuable to me.
>> >
>> > As you mentioned above, we have an overload of many small files in
>> > our cluster.
>> >
>> > Also, when I load huge data into Hive tables, it throws an exception
>> > like "replicated to 0 nodes instead of 1". When I googled it, I found
>> > that one of the reasons matches my case, "Data Node is Busy with block
>> > report and block scanning" @ http://bit.ly/ZToyNi
>> >
>> > Will speeding up the block scanning of all these inefficient small
>> > files fix my problem?
>> >
>> > Thanks
>> > Selva
>> >
>> >
>> > On Wed, May 1, 2013 at 11:37 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>> >>
>> >> The block scanner is a simple, independent operation of the DN that
>> >> runs periodically and does its work in small phases, to ensure that
>> >> no blocks exist that don't match their checksums (it's an automatic
>> >> data validator), such that it may report corrupt/rotting blocks and
>> >> keep the cluster healthy.
>> >>
>> >> Its runtime shouldn't cause any issues, unless your DN has a lot of
>> >> blocks (more than normal, due to an overload of small, inefficient
>> >> files) but too little heap size to perform retention plus block
>> >> scanning.
>> >>
>> >> > 1. Will the data node not allow writing data during the
>> >> > DataBlockScanning process?
>> >>
>> >> No such thing. As I said, it's independent and mostly lock-free.
>> >> Writes and reads are not hampered.
>> >>
>> >> > 2. Will the data node come back to normal only when "Not yet
>> >> > verified" comes to zero in the data node blockScannerReport?
>> >>
>> >> Yes, but note that this runs over and over again (once every 3 weeks
>> >> IIRC).
>> >>
>> >> On Wed, May 1, 2013 at 11:33 AM, selva <[EMAIL PROTECTED]> wrote:
>> >> > Thanks Harsh & Manoj for the inputs.
>> >> >
>> >> > Now I have found that the data node is busy with block scanning. I
>> >> > have TBs of data attached to each data node, so it's taking days to
>> >> > complete the data block scanning. I have two questions.
>> >> >
>> >> > 1. Will the data node not allow writing data during the
>> >> > DataBlockScanning process?

Harsh J