Re: Is there any way to partially process HDFS edits?
Can you share how many blocks your cluster has? How many directories?
How many files?

There is a JIRA, https://issues.apache.org/jira/browse/HADOOP-1687, which
explains how much RAM your namenode will use.
It's pretty old in terms of Hadoop version, but it's a good starting point.

According to Cloudera's blog "A good rule of thumb is to assume 1GB of
NameNode memory for every 1 million blocks stored in the distributed file
system"
http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
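
To make that rule of thumb concrete, here is a rough Python sketch of the
arithmetic. The function names and the block counts in the usage lines are
made up for illustration; the only number taken from the blog is 1GB per
1 million blocks. It ignores files, directories and JVM overhead, so treat
the result as a first guess rather than a sizing guarantee.

```python
# Rule of thumb from the Cloudera post quoted above:
# roughly 1 GB of NameNode heap per 1 million blocks.
BLOCKS_PER_GB = 1_000_000


def estimate_namenode_heap_gb(block_count: int) -> float:
    """Estimate NameNode heap (in GB) for a given block count.

    Hypothetical helper: counts blocks only, not files or directories.
    """
    if block_count < 0:
        raise ValueError("block_count must be non-negative")
    return block_count / BLOCKS_PER_GB


def fits_in_heap(block_count: int, heap_gb: float) -> bool:
    """Return True if the estimate fits within the configured heap."""
    return estimate_namenode_heap_gb(block_count) <= heap_gb


if __name__ == "__main__":
    # Example block counts are placeholders, not numbers from this thread.
    for blocks in (400_000, 3_000_000):
        need = estimate_namenode_heap_gb(blocks)
        print(f"{blocks} blocks -> ~{need:.1f} GB heap, "
              f"fits in 2 GB: {fits_in_heap(blocks, 2.0)}")
```

Plug in the real block count (the namenode web UI or fsck output reports it)
and compare the result with the heap you can actually give the VM.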

On Thu, Sep 26, 2013 at 9:26 PM, Tom Brown <[EMAIL PROTECTED]> wrote:

> It ran again for about 15 hours before dying again. I'm seeing what extra
> RAM resources we can throw at this VM (maybe up to 32GB), but until then
> I'm trying to figure out if I'm hitting some strange bug.
>
> When the edits were originally made (over the course of 6 weeks), the
> namenode only had 512MB and was able to contain the filesystem completely
> in memory. I don't understand why it's running out of memory. If 512MB was
> enough while the edits were first made, shouldn't it be enough to process
> them again?
>
> --Tom
>
>
> On Thu, Sep 26, 2013 at 6:05 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> Hi Tom,
>>
>> The edits are processed sequentially, and aren't all held in memory.
>> Right now there is no mid-way checkpoint while the edits are loaded, so
>> an interrupted load cannot resume with only the remaining work. Normally
>> this is not a problem in deployments, since the SNN or SBN runs
>> periodically, checkpointing the image and keeping the edits collection
>> small.
>>
>> If your NameNode is running out of memory _applying_ the edits, then
>> the cause is not the edits but a growing namespace. You most likely
>> have more files now than before, and that's going to take up permanent
>> memory from the NameNode heap.
>>
>> On Thu, Sep 26, 2013 at 3:00 AM, Tom Brown <[EMAIL PROTECTED]> wrote:
>> > Unfortunately, I cannot give it that much RAM. The machine has 4GB total
>> > (though it could be expanded somewhat; it's a VM).
>> >
>> > Though if each edit is processed sequentially (in a streaming form), the
>> > entire edits file will never be in RAM at once.
>> >
>> > Is the edits file format well defined (could I break off 100MB chunks and
>> > process them individually to achieve the same result as processing the
>> > whole thing at once)?
>> >
>> > --Tom
>> >
>> >
>> > On Wed, Sep 25, 2013 at 1:53 PM, Ravi Prakash <[EMAIL PROTECTED]> wrote:
>> >>
>> >> Tom! I would guess that just giving the NN JVM lots of memory (64GB /
>> >> 96GB) should be the easiest way.
>> >>
>> >>
>> >> ________________________________
>> >> From: Tom Brown <[EMAIL PROTECTED]>
>> >> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> >> Sent: Wednesday, September 25, 2013 11:29 AM
>> >> Subject: Is there any way to partially process HDFS edits?
>> >>
>> >> I have an edits file on my namenode that is 35GB. This is quite a bit
>> >> larger than it should be (the secondary namenode wasn't running for some
>> >> time, and HBASE-9648 caused a huge number of additional edits).
>> >>
>> >> The first time I tried to start the namenode, it chewed on the edits for
>> >> about 4 hours and then ran out of memory. I have increased the memory
>> >> available to the namenode (was 512MB, now 2GB), and started the process
>> >> again.
>> >>
>> >> Is there any way that the edits file can be partially processed to avoid
>> >> having to re-process the same edits over and over until I can allocate
>> >> enough memory for it to be done in one shot?
>> >>
>> >> How long should it take (hours? days?) to process an edits file of that
>> >> size?
>> >>
>> >> Any help is appreciated!
>> >>
>> >> --Tom
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>
--
Nitin Pawar