Re: Is there any way to partially process HDFS edits?
Can you share how many blocks your cluster has? How many directories?
How many files?
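
If you don't have those numbers handy, a minimal sketch along these lines
(untested, using the standard org.apache.hadoop.fs API; the class name and
the choice of the root path are just illustrative) will print the file and
directory counts:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceCounts {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Summarize the whole namespace starting from the root path.
        ContentSummary summary = fs.getContentSummary(new Path("/"));
        System.out.println("Files:       " + summary.getFileCount());
        System.out.println("Directories: " + summary.getDirectoryCount());
        System.out.println("Bytes:       " + summary.getLength());

        fs.close();
    }
}

ContentSummary doesn't report blocks, so for the total block count the
easiest places to look are the summary line of "hadoop fsck /" or the
NameNode web UI.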

There is a JIRA, https://issues.apache.org/jira/browse/HADOOP-1687, which
explains how much RAM your namenode will use.
It's pretty old in terms of Hadoop versions, but it's a good starting point.

According to Cloudera's blog, "A good rule of thumb is to assume 1GB of
NameNode memory for every 1 million blocks stored in the distributed file
system":
http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
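
Applying that rule of thumb is simple arithmetic. As a rough, purely
illustrative sketch (the block count below is a made-up example; substitute
the total reported by fsck or the NameNode UI):

public class NameNodeHeapEstimate {
    public static void main(String[] args) {
        // Example value only; replace with your cluster's actual block count.
        long totalBlocks = 12000000L;

        // Cloudera's rule of thumb: roughly 1 GB of NameNode heap
        // per 1 million blocks stored in HDFS.
        double estimatedHeapGb = totalBlocks / 1000000.0;

        System.out.printf("Estimated NameNode heap: ~%.1f GB%n", estimatedHeapGb);
    }
}

So a namespace of around 12 million blocks would suggest roughly 12GB of
NameNode heap, which is why the counts above matter here.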

On Thu, Sep 26, 2013 at 9:26 PM, Tom Brown <[EMAIL PROTECTED]> wrote:

> It ran again for about 15 hours before dying again. I'm seeing what extra
> RAM resources we can throw at this VM (maybe up to 32GB), but until then
> I'm trying to figure out if I'm hitting some strange bug.
>
> When the edits were originally made (over the course of 6 weeks), the
> namenode only had 512MB and was able to contain the filesystem completely
> in memory. I don't understand why it's running out of memory. If 512MB was
> enough while the edits were first made, shouldn't it be enough to process
> them again?
>
> --Tom
>
>
> On Thu, Sep 26, 2013 at 6:05 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> Hi Tom,
>>
>> The edits are processed sequentially, and aren't all held in memory.
>> Right now there's no mid-way checkpoint during loading that would let
>> it resume with only the remaining work if interrupted. Normally this
>> isn't a problem in deployments, since the SNN or SBN periodically
>> checkpoints the image and keeps the set of edits small.
>>
>> If your NameNode is running out of memory while _applying_ the edits,
>> then the cause is not the edits but a growing namespace. You most
>> likely have more files now than before, and that's going to take up
>> permanent memory in the NameNode heap.
>>
>> On Thu, Sep 26, 2013 at 3:00 AM, Tom Brown <[EMAIL PROTECTED]> wrote:
>> > Unfortunately, I cannot give it that much RAM. The machine has 4GB
>> > total (though it could be expanded somewhat, since it's a VM).
>> >
>> > Though if each edit is processed sequentially (in streaming form), the
>> > entire edits file will never be in RAM at once.
>> >
>> > Is the edits file format well defined (could I break off 100MB chunks
>> > and process them individually to achieve the same result as processing
>> > the whole thing at once)?
>> >
>> > --Tom
>> >
>> >
>> > On Wed, Sep 25, 2013 at 1:53 PM, Ravi Prakash <[EMAIL PROTECTED]>
>> wrote:
>> >>
>> >> Tom! I would guess that just giving the NN JVM lots of memory (64GB /
>> >> 96GB) should be the easiest way.
>> >>
>> >>
>> >> ________________________________
>> >> From: Tom Brown <[EMAIL PROTECTED]>
>> >> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> >> Sent: Wednesday, September 25, 2013 11:29 AM
>> >> Subject: Is there any way to partially process HDFS edits?
>> >>
>> >> I have an edits file on my namenode that is 35GB. This is quite a bit
>> >> larger than it should be (the secondary namenode wasn't running for
>> >> some time, and HBASE-9648 caused a huge number of additional edits).
>> >>
>> >> The first time I tried to start the namenode, it chewed on the edits
>> >> for about 4 hours and then ran out of memory. I have increased the
>> >> memory available to the namenode (was 512MB, now 2GB), and started the
>> >> process again.
>> >>
>> >> Is there any way that the edits file can be partially processed to
>> >> avoid having to re-process the same edits over and over until I can
>> >> allocate enough memory for it to be done in one shot?
>> >>
>> >> How long should it take (hours? days?) to process an edits file of
>> >> that size?
>> >>
>> >> Any help is appreciated!
>> >>
>> >> --Tom
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>
--
Nitin Pawar