-
making file system block size bigger to improve hdfs performance ?
Jinsong Hu 2011-10-03, 05:05
Hi there: I just had an idea. When we format a disk, the file system block size is usually 1K to 4K. For HDFS, the block size is usually 64M. I wonder whether changing the raw file system's block size to something significantly bigger, say 1M or 8M, would improve disk I/O performance for Hadoop's HDFS. I noticed that the MapR distribution uses mfs, its own file system, and that this resulted in a 4x performance gain in terms of disk I/O. I wonder whether, by tuning the host OS parameters, we can achieve better disk I/O performance with just the regular Apache Hadoop distribution. I understand that making the block size bigger can waste some disk space for small files. However, for disks dedicated to HDFS, where most of the files are very big, I wonder if it is a good idea. Does anybody have any comments?
Jimmy
-
Re: making file system block size bigger to improve hdfs performance ?
Niels Basjes 2011-10-03, 06:13
Have you tried it to see what difference it makes?
-- Kind regards, Niels Basjes (sent from mobile)
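Before experimenting, it helps to know what block size the data partition currently uses. The following is a minimal sketch, assuming a Unix-like host where Python's os.statvfs is available; the function name fs_block_info and the default path "/" are illustrative, so substitute the mount point that holds your HDFS data directories.

```python
import os


def fs_block_info(path="/"):
    """Return (f_bsize, f_frsize) in bytes for the file system holding `path`.

    f_bsize is the file system block size reported by statvfs and
    f_frsize is the fragment size; on many Linux file systems the two
    values are identical.
    """
    st = os.statvfs(path)
    return st.f_bsize, st.f_frsize


if __name__ == "__main__":
    # "/" is only a placeholder; point this at the HDFS data partition.
    bsize, frsize = fs_block_info("/")
    print(f"block size: {bsize} bytes, fragment size: {frsize} bytes")
```

Running this before and after reformatting a test partition confirms which block size a benchmark was actually run against.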
-
Re: making file system block size bigger to improve hdfs performance ?
Ted Dunning 2011-10-03, 14:43
The MapR system allocates files with 8K blocks internally, so I doubt that any improvement that you see with a larger block size on HDFS is going to matter much and it could seriously confuse your underlying file system.
The performance advantage for MapR has more to do with a better file system design and much more direct data paths than it has to do with block size on disk. Changing the block size on the HDFS partition isn't going to help that.
-
Re: making file system block size bigger to improve hdfs performance ?
M. C. Srivas 2011-10-09, 06:01
By default, Linux file systems use a 4K block size. A block size of 4K means all I/O happens 4K at a time. Any *update* to data smaller than 4K results in a read-modify-write cycle on disk, i.e., if a file is extended from 1K to 2K, the fs will read in the 4K block, memcpy the region from 1K-2K into the VM page, then write out 4K again.
If you make the block size 1M, the read-modify-write cycle will read in 1M and write out 1M. I think you don't want that to happen. (Imagine the HBase WAL writing a few hundred bytes at a time.)
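To put rough numbers on that, here is a minimal sketch of the read-modify-write cost. It assumes the simple model described above: each small update starts at a block boundary and is flushed to disk on its own, with no caching or batching in between. The function names and the 200-byte update size are illustrative, not measurements.

```python
def rmw_bytes(update_bytes, block_size):
    """Return (bytes_read, bytes_written) for one update under a plain
    read-modify-write model, assuming the update starts at a block
    boundary so it touches ceil(update_bytes / block_size) blocks."""
    blocks_touched = -(-update_bytes // block_size)  # ceiling division
    moved = blocks_touched * block_size
    return moved, moved


def write_amplification(update_bytes, block_size):
    """Bytes physically written per byte the application asked to write."""
    _, written = rmw_bytes(update_bytes, block_size)
    return written / update_bytes


if __name__ == "__main__":
    update = 200  # hypothetical WAL-sized append, in bytes
    for block_size in (4 * 1024, 1024 * 1024):
        amp = write_amplification(update, block_size)
        print(f"block {block_size:>8} B: {amp:.2f}x bytes written per byte requested")
```

Under this model the ratio grows linearly with the block size, so going from 4K to 1M makes every small update 256 times more expensive. Real systems buffer writes, so treat this as an upper bound rather than a prediction.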
It also means that on the average, you will waste 512K of disk per file (vs. 2K with a 4K block size).
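The half-a-block figure follows from assuming that the size of the last, partially filled block of a file is spread evenly between zero and one full block. A minimal sketch of that arithmetic, with illustrative function names:

```python
def slack_bytes(file_size, block_size):
    """Unused bytes in the last block allocated to a file of `file_size`."""
    return (-file_size) % block_size


def expected_slack(block_size):
    """Average slack per file, assuming the tail of each file is
    uniformly distributed over one block (an assumption, not a
    measurement)."""
    return block_size / 2


if __name__ == "__main__":
    for block_size in (4 * 1024, 1024 * 1024):
        kib = expected_slack(block_size) / 1024
        print(f"block {block_size:>8} B: about {kib:.0f}K wasted per file on average")

    # A full 64M HDFS block fits both block sizes exactly, so only
    # files that end mid-block carry any slack at all.
    hdfs_block = 64 * 1024 * 1024
    print(slack_bytes(hdfs_block, 1024 * 1024))
```

For a disk that holds mostly full 64M HDFS blocks, an average of 512K per file is under 1% of the file, so the space cost is small in relative terms; the read-modify-write penalty on small writes is the bigger concern.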
btw, MapR uses 8K as the native block size on disk.
If you insist on HDFS, try using XFS underneath. It does a much better job than ext3 or ext4 for Hadoop in terms of how data is laid out on disk. But its memory footprint is at least twice that of ext3, so it will gobble up a lot more memory on your box.
-
Re: making file system block size bigger to improve hdfs performance ?
Steve Loughran 2011-10-10, 10:48
On 09/10/11 07:01, M. C. Srivas wrote:
> If you insist on HDFS, try using XFS underneath, it does a much better job than ext3 or ext4 for Hadoop in terms of how data is laid out on disk. But its memory footprint is at least twice that of ext3, so it will gobble up a lot more memory on your box.
How stable have you found XFS? I know people have worked a lot on ext4 and I am using it locally, even if something (VirtualBox) tells me off for doing so. I know the Lustre people are using it underneath their DFS, and with wide use it does tend to get debugged by others before you use your data.
-
Re: making file system block size bigger to improve hdfs performance ?
M. C. Srivas 2011-10-10, 13:51
XFS was created in 1991 by Silicon Graphics. It was designed for streaming. The Linux port was in 2002 or so.
I've used it extensively for the past 8 years. It is very stable, and many NAS companies have embedded it in their products. In particular, it works well even when the disk starts getting full. ext4 tends to have problems with multiple streams (it seeks too much), and ext3 has a fragmentation problem.
(MapR's disk layout is even better compared to XFS ... couldn't resist)
-
Re: making file system block size bigger to improve hdfs performance ?
Brian Bockelman 2011-10-10, 14:10
I can provide another data point here: xfs works very well in modern Linuxes (in the 2.6.9 era, it had many memory management headaches, especially around the switch to 4k stacks), and its advantage is significant when you run file systems over 95% occupied.
Brian
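To see whether a data disk is in the range where that difference is said to matter, you can check how full it is. This is a minimal sketch using Python's shutil.disk_usage; the helper name percent_used and the path "/" are placeholders, and the 95% threshold is simply the figure quoted above.

```python
import shutil


def percent_used(path="/"):
    """Return the percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # "/" is only a placeholder; check each HDFS data partition in turn.
    pct = percent_used("/")
    flag = " (above 95%)" if pct > 95.0 else ""
    print(f"{pct:.1f}% used{flag}")
```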