Can we use String.intern inside WritableUtils#readString()?
Bhallamudi Venkata Siva K... 2012-07-12, 13:02
Hi All, I noticed that WritableUtils.readString() creates a new String object every time it deserializes a string. Some applications, however, serialize a small number of distinct strings a huge number of times, so deserializing them this way can sometimes lead to OOMs.
I think using intern() would reduce the number of String objects created. Please correct me if my understanding is wrong.
-- Thanks&Regards, Bh.V.S.Kamesh, +91-9652725948
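A minimal sketch of the proposal, for illustration only. It is not Hadoop's code: `InternDemo`, `writeString`, `readString`, `readStringInterned` and `sameObjects` are hypothetical names, and the encoding is simplified to a plain 4-byte length prefix instead of Hadoop's variable-length int. It shows that two plain reads of the same serialized value yield two distinct objects, while `String.intern()` makes both reads return one shared instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class InternDemo {

    // Hypothetical simplified encoding: 4-byte length, then UTF-8 bytes.
    // (Hadoop's WritableUtils uses a variable-length int for the length.)
    static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Behaviour under discussion: a brand-new String object on every call.
    static String readString(DataInputStream in) throws IOException {
        byte[] buffer = new byte[in.readInt()];
        in.readFully(buffer);
        return new String(buffer, StandardCharsets.UTF_8);
    }

    // Proposed variant: return the canonical instance from the JVM string pool.
    static String readStringInterned(DataInputStream in) throws IOException {
        return readString(in).intern();
    }

    // Writes value four times, reads two copies each way, and reports
    // {plain copies are the same object, interned copies are the same object}.
    static boolean[] sameObjects(String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int i = 0; i < 4; i++) {
                writeString(out, value);
            }
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            String a = readString(in);
            String b = readString(in);
            String c = readStringInterned(in);
            String d = readStringInterned(in);
            return new boolean[] {a == b, c == d};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = sameObjects("user-42");
        System.out.println("plain reads share one object: " + result[0]);
        System.out.println("interned reads share one object: " + result[1]);
    }
}
```

Running `main` prints `false` for the plain reads and `true` for the interned reads: the equal strings are collapsed into one object, which is exactly the saving being proposed.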
Re: Can we use String.intern inside WritableUtils#readString()?
Ramkumar Vadali 2012-07-12, 18:27
String.intern() should be used with caution. Interned strings go into the "perm gen" space of the Java process, which is limited. You could easily run out of that space and get OOM errors even when total heap usage is well below the -Xmx value. A better way would be to use a Map<String, String> that de-duplicates string objects.
Ramkumar
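One way the Map<String, String> idea could look, as a sketch only: `StringDeduplicator` is a hypothetical helper, not an existing Hadoop class. The first instance seen for each distinct value is stored as the canonical copy, and later equal strings are swapped for it. The map lives on the ordinary heap rather than in the string pool.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: de-duplicates strings through a Map<String, String>.
class StringDeduplicator {
    private final Map<String, String> canonical = new HashMap<>();

    // Returns the previously stored instance equal to s, or stores and returns s.
    String dedup(String s) {
        if (s == null) {
            return null;  // pass null through unchanged
        }
        String existing = canonical.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Number of distinct values currently held.
    int size() {
        return canonical.size();
    }

    public static void main(String[] args) {
        StringDeduplicator dedup = new StringDeduplicator();
        String first = dedup.dedup(new String("user-42"));
        String second = dedup.dedup(new String("user-42"));
        System.out.println(first == second);  // true
        System.out.println(dedup.size());     // 1
    }
}
```

The map holds strong references, so it keeps every distinct value alive for as long as the map itself is reachable. Scoping it to a bounded unit of work and dropping it afterwards avoids trading one kind of OOM for another when the number of distinct values is large.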
Re: Can we use String.intern inside WritableUtils#readString()?
Robert Evans 2012-07-13, 14:57
Yes, I filed a JIRA for something like this a while ago: MAPREDUCE-4303. I have not done anything with it for this very reason. There are some potential fixes. We could keep a somewhat small weak-reference cache of these strings, so that a string read multiple times is deduplicated. Because the references are weak, a string that is no longer used elsewhere can still be collected instead of being forced to stay around, and nothing is placed in the perm gen space. But that is not a small change. If you want to take over that JIRA, feel free; otherwise I will get around to it eventually.
--Bobby Evans
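The weak-reference cache described above might look roughly like this. It is a hypothetical sketch, not the MAPREDUCE-4303 patch. `WeakStringCache` and its direct-mapped layout are assumptions chosen to keep the cache small and bounded: there is one slot per hash bucket, and a collision simply overwrites the slot. Each slot holds a `java.lang.ref.WeakReference`, so the cache never keeps a string alive on its own.

```java
import java.lang.ref.WeakReference;

// Hypothetical fixed-size cache of weak references for de-duplicating strings.
// Not synchronized; assumes single-threaded use.
class WeakStringCache {
    private final WeakReference<String>[] slots;
    private final int mask;

    @SuppressWarnings("unchecked")
    WeakStringCache(int sizePowerOfTwo) {
        if (sizePowerOfTwo <= 0 || Integer.bitCount(sizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        slots = (WeakReference<String>[]) new WeakReference<?>[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    // Returns a cached instance equal to s if one is still alive; otherwise
    // caches s (evicting whatever occupied the slot) and returns s.
    String dedup(String s) {
        if (s == null) {
            return null;
        }
        int h = s.hashCode();
        int index = (h ^ (h >>> 16)) & mask;  // mix high bits into the low ones
        WeakReference<String> ref = slots[index];
        String cached = (ref == null) ? null : ref.get();
        if (cached != null && cached.equals(s)) {
            return cached;
        }
        slots[index] = new WeakReference<>(s);
        return s;
    }

    public static void main(String[] args) {
        WeakStringCache cache = new WeakStringCache(1024);
        String first = cache.dedup(new String("user-42"));
        String second = cache.dedup(new String("user-42"));
        // true: 'first' is still strongly reachable here, so its slot is not cleared
        System.out.println(first == second);
    }
}
```

Once nothing else references a cached string, the garbage collector may clear its slot, and the next equal string read simply becomes the new cached instance. Memory use is capped by the slot count, at the cost of missing some duplicates when values collide.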