MapReduce user mailing list: Is there any way to partially process HDFS edits?


Thread started by Tom Brown 2013-09-25, 16:29
Re: Is there any way to partially process HDFS edits?
Hi Tom,

The edits are processed sequentially, and aren't all held in memory.
Right now there is no mid-way checkpoint while they are loaded, so the
NameNode cannot resume from partial progress if it is interrupted; it
has to replay the whole file again. Normally this is not a problem in
deployments, because the SNN or SBN periodically checkpoints the image
and keeps the accumulated edits small.
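
To illustrate the point, here is a simplified sketch in plain Java. It
is not the actual NameNode code, and the one-record-per-line text
format is hypothetical; it only shows why replay streams the edits one
at a time while the heap has to hold the resulting namespace, never
the whole edits file.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    public class EditReplaySketch {
        public static void main(String[] args) throws IOException {
            // Hypothetical text form of an edit log: one "OP path" record per line.
            Map<String, Long> namespace = new HashMap<>(); // path -> fake metadata
            long txid = 0;
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                String record;
                while ((record = in.readLine()) != null) { // streamed, never fully in RAM
                    String[] parts = record.split(" ", 2);
                    if (parts.length < 2) {
                        continue; // skip malformed records
                    }
                    switch (parts[0]) {
                        case "ADD":    namespace.put(parts[1], ++txid); break;
                        case "DELETE": namespace.remove(parts[1]);      break;
                        default:       /* ignore other op types */      break;
                    }
                }
            }
            // Heap usage after replay is proportional to namespace.size(),
            // i.e. to how many files exist, not to how many edits were applied.
            System.out.println("files in namespace after replay: " + namespace.size());
        }
    }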

If your NameNode is running out of memory while _applying_ the edits,
then the cause is not the edits themselves but a growing namespace.
You most likely have more files now than before, and that takes up
permanent memory in the NameNode heap.
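
As a rough sanity check on sizing, assuming the commonly cited rule of
thumb of roughly 150 bytes of NameNode heap per namespace object (file,
directory, or block); the real figure varies with version, path length,
and replication metadata, and the counts below are purely hypothetical:

    public class NameNodeHeapEstimate {
        public static void main(String[] args) {
            long files = 10_000_000L;      // hypothetical file count
            long blocksPerFile = 2;        // assumption
            long bytesPerObject = 150;     // rule-of-thumb heap cost per namespace object
            long objects = files * (1 + blocksPerFile);
            double heapGB = objects * bytesPerObject / (1024.0 * 1024 * 1024);
            System.out.printf("~%.1f GB of heap for %,d files%n", heapGB, files);
        }
    }

On those assumptions, ten million files already need around 4 GB of
heap, comfortably more than the 2GB mentioned below, which is
consistent with the out-of-memory behaviour described in this thread.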

On Thu, Sep 26, 2013 at 3:00 AM, Tom Brown <[EMAIL PROTECTED]> wrote:
> Unfortunately, I cannot give it that much RAM. The machine has 4GB total
> (though could be expanded somewhat-- it's a VM).
>
> Though if each edit is processed sequentially (in a streaming form), the
> entire edits file will never be in RAM at once.
>
> Is the edits file format well defined (could I break off 100MB chunks and
> process them individually to achieve the same result as processing the whole
> thing at once)?
>
> --Tom
>
>
> On Wed, Sep 25, 2013 at 1:53 PM, Ravi Prakash <[EMAIL PROTECTED]> wrote:
>>
>> Tom! I would guess that just giving the NN JVM lots of memory (64Gb /
>> 96Gb) should be the easiest way.
>>
>>
>> ________________________________
>> From: Tom Brown <[EMAIL PROTECTED]>
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Sent: Wednesday, September 25, 2013 11:29 AM
>> Subject: Is there any way to partially process HDFS edits?
>>
>> I have an edits file on my namenode that is 35GB. This is quite a bit
>> larger than it should be (the secondary namenode wasn't running for some
>> time, and HBASE-9648 caused a huge number of additional edits).
>>
>> The first time I tried to start the namenode, it chewed on the edits for
>> about 4 hours and then ran out of memory. I have increased the memory
>> available to the namenode (was 512MB, now 2GB), and started the process
>> again.
>>
>> Is there any way that the edits file can be partially processed to avoid
>> having to re-process the same edits over and over until I can allocate
>> enough memory for it to be done in one shot?
>>
>> How long should it take (hours? days?) to process an edits file of that
>> size?
>>
>> Any help is appreciated!
>>
>> --Tom
>>
>>
>

--
Harsh J
Other messages in this thread:
  Tom Brown 2013-09-26, 15:50
  Nitin Pawar 2013-09-26, 16:25
  Tom Brown 2013-09-26, 16:37
  Harsh J 2013-09-26, 17:07
  Tom Brown 2013-09-26, 21:42
  Jens Scheidtmann 2013-09-29, 09:09