MapReduce user mailing list: Is there any way to partially process HDFS edits?


Thread:
Tom Brown 2013-09-25, 16:29
Harsh J 2013-09-26, 12:05
Tom Brown 2013-09-26, 15:50
Nitin Pawar 2013-09-26, 16:25
Tom Brown 2013-09-26, 16:37
Harsh J 2013-09-26, 17:07
Jens Scheidtmann 2013-09-29, 09:09

Re: Is there any way to partially process HDFS edits?
They were created and deleted in quick succession. I thought that meant the
edits for both the create and delete would be logically next to each other in
the file, allowing it to release the memory almost as soon as it had been
allocated.

In any case, after finding a VM host that could give me more RAM, I was
able to get the namenode started. The process used 25GB at its peak.

Thanks for your help!
On Thu, Sep 26, 2013 at 11:07 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> Tom,
>
> That is valuable info. When we "replay" edits, we would be creating
> and then deleting those files - so memory would grow in between until
> the delete events begin appearing in the edit log segment.
>
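
A minimal sketch of the replay behaviour described above, with made-up op names, file names, and counts; it is not the actual NameNode code. Each create adds an entry to the in-memory namespace, and that memory is only freed once the matching delete op is reached later in the segment, so heap during replay peaks at the largest number of files that are live at the same time:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative replay loop: ops are applied strictly in order, so the
// in-memory namespace grows on each create and shrinks again on each delete.
public class ReplaySketch {
    public static void main(String[] args) {
        Set<String> namespace = new HashSet<>(); // stands in for the NameNode's inode map
        long peak = 0;
        // Hypothetical edit stream: create/delete pairs, with deletes lagging
        // the creates by 10 ops (i.e. ~10 files "in flight" at any time).
        for (int i = 0; i < 1_000_000; i++) {
            namespace.add("/hbase/tmpfile-" + i);               // create op
            peak = Math.max(peak, namespace.size());
            if (i >= 10) {
                namespace.remove("/hbase/tmpfile-" + (i - 10)); // matching delete op
            }
        }
        System.out.println("live files at end: " + namespace.size() + ", peak: " + peak);
        // If the deletes lag far behind the creates in the log, the peak (and the
        // real NameNode heap) grows accordingly, even though the end state is small.
    }
}
```
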
> On Thu, Sep 26, 2013 at 10:07 PM, Tom Brown <[EMAIL PROTECTED]> wrote:
> > A simple estimate puts the total number of blocks somewhere around 500,000.
> > Due to an HBase bug (HBASE-9648), there were approximately 50,000,000 files
> > that were created and quickly deleted (about 10/sec for 6 weeks) in the
> > cluster, and that activity is what is contained in the edits.
> >
> > Since those files don't exist (quickly created and deleted), shouldn't they
> > be inconsequential to the memory requirements of the namenode as it starts
> > up?
> >
> > --Tom
> >
> >
> > On Thu, Sep 26, 2013 at 10:25 AM, Nitin Pawar <[EMAIL PROTECTED]>
> > wrote:
> >>
> >> Can you share how many blocks your cluster has? How many directories?
> >> How many files?
> >>
> >> There is a JIRA, https://issues.apache.org/jira/browse/HADOOP-1687, which
> >> explains how much RAM your namenode will use.
> >> It's pretty old in Hadoop-version terms, but it's a good starting point.
> >>
> >> According to Cloudera's blog, "A good rule of thumb is to assume 1GB of
> >> NameNode memory for every 1 million blocks stored in the distributed file
> >> system":
> >>
> >> http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
> >>
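
A rough back-of-the-envelope check of these numbers against the rest of the thread. The sketch applies the quoted 1GB-per-million rule to the ~500,000 real blocks, and, purely as an extrapolation of my own, to the ~50 million transient files from HBASE-9648 as an upper bound on what replay could need:

```java
// Back-of-the-envelope heap estimate using the rule of thumb quoted above
// (~1 GB of NameNode heap per 1 million objects). Applying it to the transient
// HBASE-9648 files is an extrapolation, not something the blog post claims.
public class HeapEstimateSketch {
    public static void main(String[] args) {
        final double GB_PER_MILLION_OBJECTS = 1.0; // rule of thumb from the Cloudera post

        long blocks = 500_000;             // Tom's estimate of blocks in the cluster
        long transientFiles = 50_000_000;  // files created and later deleted (HBASE-9648)

        double steadyStateGb = blocks / 1_000_000.0 * GB_PER_MILLION_OBJECTS;              // ~0.5 GB
        double allLiveWorstCaseGb = transientFiles / 1_000_000.0 * GB_PER_MILLION_OBJECTS; // ~50 GB

        System.out.printf("steady state: ~%.1f GB; worst case if every transient file were live: ~%.0f GB%n",
                steadyStateGb, allLiveWorstCaseGb);
    }
}
```

The 25GB peak Tom reports above falls between these two bounds, which is consistent with only part of the 50 million transient files being live in the namespace at any one moment during replay.
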
> >>> On Thu, Sep 26, 2013 at 9:26 PM, Tom Brown <[EMAIL PROTECTED]> wrote:
> >>>
> >>> It ran again for about 15 hours before dying again. I'm seeing what extra
> >>> RAM resources we can throw at this VM (maybe up to 32GB), but until then
> >>> I'm trying to figure out if I'm hitting some strange bug.
> >>>
> >>> When the edits were originally made (over the course of 6 weeks), the
> >>> namenode only had 512MB and was able to contain the filesystem completely
> >>> in memory. I don't understand why it's running out of memory. If 512MB was
> >>> enough while the edits were first made, shouldn't it be enough to process
> >>> them again?
> >>>
> >>> --Tom
> >>>
> >>>
> >>> On Thu, Sep 26, 2013 at 6:05 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>> Hi Tom,
> >>>>
> >>>> The edits are processed sequentially, and aren't all held in memory.
> >>>> Right now there's no mid-way checkpoint during loading, so the NameNode
> >>>> cannot resume with just the remaining work if it is interrupted. Normally
> >>>> this is not a problem in deployments, since the SNN or SBN periodically
> >>>> checkpoints the image and keeps the edits collection small.
> >>>>
> >>>> If your NameNode is running out of memory _applying_ the edits, then
> >>>> the cause is not the edits but a growing namespace. You most likely
> >>>> have more files now than before, and that's going to take up permanent
> >>>> memory in the NameNode heap.
> >>>>
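
A sketch of the checkpoint-frequency settings behind the SNN/SBN behaviour Harsh mentions, which bound how large the edits backlog (and therefore startup replay) can grow. The property names below are the Hadoop 2.x ones (dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns); on 1.x the rough equivalents were fs.checkpoint.period and fs.checkpoint.size, so whether these exact names apply to this cluster is an assumption:

```java
import org.apache.hadoop.conf.Configuration;

// Sketch only: these values would normally be set in hdfs-site.xml on the
// checkpointing node; a Configuration object is used here just for illustration.
public class CheckpointTuningSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Checkpoint at least once an hour...
        conf.setLong("dfs.namenode.checkpoint.period", 3600);
        // ...or after 1 million un-checkpointed transactions, whichever comes
        // first, so the edits the NameNode must replay at startup stay bounded.
        conf.setLong("dfs.namenode.checkpoint.txns", 1_000_000);

        System.out.println("checkpoint period (s): " + conf.get("dfs.namenode.checkpoint.period"));
    }
}
```
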
> >>>> On Thu, Sep 26, 2013 at 3:00 AM, Tom Brown <[EMAIL PROTECTED]> wrote:
> >>>> > Unfortunately, I cannot give it that much RAM. The machine has 4GB total
> >>>> > (though could be expanded somewhat -- it's a VM).
> >>>> >
> >>>> > Though if each edit is processed sequentially (in a streaming form), the
> >>>> > entire edits file will never be in RAM at once.
> >>>> >
> >>>> > Is the edits file format well defined (could I break off 100MB chunks and
> >>>> > process them individually to achieve the same result as processing