-
our experiences with various filesystems and tuning options
Ferdy Galema 2011-05-05, 20:45
Hi,

We've performed tests for ext3 and xfs filesystems using different settings. The results might be useful for anyone else.

The datanode cluster consists of 15 slave nodes, each equipped with 1Gbit ethernet, [EMAIL PROTECTED]z quadcores and 4x1TB disks. The disk read speeds vary from about 90 to 130MB/s (tested using hdparm -t).

Hadoop: Cloudera CDH3u0 (4 concurrent mappers / node)
OS: Linux version 2.6.18-238.5.1.el5 ([EMAIL PROTECTED]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-50))

    #our command
    for i in `seq 1 10`; do
      ./hadoop jar ../hadoop-examples-0.20.2-cdh3u0.jar randomwriter -Ddfs.replication=1 /rand$i &&
      ./hadoop fs -rmr /rand$i/_logs /rand$i/_SUCCESS &&
      ./hadoop distcp -Ddfs.replication=1 /rand$i /rand-copy$i
    done

Our benchmark consists of a standard random-writer job followed by a distcp of the same data, both using a replication of 1. This is to make sure only the disks get hit. Each benchmark was run several times for every configuration. Because of the occasional hiccup, I will list both the average and the fastest times for each configuration. I read the execution times off the jobtracker.

The configurations (with execution times in seconds):

    Configuration        Avg-writer  Min-writer  Avg-distcp  Min-distcp
    ext3-default                158         136         411         343
    ext3-tuned                  159         132         330         297
    ra1024 ext3-tuned           159         132         292         264
    ra1024 xfs-tuned            128         122         220         202

To explain, ext3-tuned uses the tuned mount options [noatime,nodiratime,data=writeback,rw] and ra1024 means a read-ahead buffer of 1024 blocks. The xfs disks are created using mkfs options [size=128m,lazy-count=1] and mount options [noatime,nodiratime,logbufs=8].

In conclusion, it seems that using tuned xfs filesystems combined with increased read-ahead buffers improved our basic HDFS performance by about 10% (random-writer) to 40% (distcp).

Hopefully this is useful to anyone. Although I won't be performing more tests soon, I'd be happy to provide more details.

Ferdy.
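The percentages in the conclusion can be rechecked from the posted times. What follows is a minimal Python sketch, assuming the baseline is ext3-default and the comparison is ra1024 xfs-tuned. The names TIMES, reduction_percent and compare are illustrative only and are not part of the original benchmark.

```python
# Times in seconds, copied from the results in this post:
# (avg_writer, min_writer, avg_distcp, min_distcp)
TIMES = {
    "ext3-default": (158, 136, 411, 343),
    "ext3-tuned": (159, 132, 330, 297),
    "ra1024 ext3-tuned": (159, 132, 292, 264),
    "ra1024 xfs-tuned": (128, 122, 220, 202),
}
COLUMNS = ("avg_writer", "min_writer", "avg_distcp", "min_distcp")


def reduction_percent(baseline, candidate):
    """Percent reduction in run time of candidate relative to baseline."""
    return 100.0 * (baseline - candidate) / baseline


def compare(baseline_name, candidate_name):
    """Per-column run-time reduction (in percent, one decimal) between two configurations."""
    base = TIMES[baseline_name]
    cand = TIMES[candidate_name]
    return {
        col: round(reduction_percent(b, c), 1)
        for col, b, c in zip(COLUMNS, base, cand)
    }


if __name__ == "__main__":
    # Assumption: compare the untuned ext3 run against the tuned xfs run.
    for col, pct in compare("ext3-default", "ra1024 xfs-tuned").items():
        print(f"{col}: {pct}% less time")
```

Under these assumptions the fastest runs come out at roughly 10% (random-writer) and 41% (distcp), in line with the figures quoted above; the averages give about 19% and 46%.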
-
Re: our experiences with various filesystems and tuning options
ShengChang Gu 2011-05-06, 02:00
Many thanks.

We use xfs all the time. Have you tried the ext4 filesystem?

-- 
阿昌
-
Re: our experiences with various filesystems and tuning options
Ferdy Galema 2011-05-06, 07:44
No, unfortunately not. We couldn't because of our kernel versions.
-
Re: our experiences with various filesystems and tuning options
Rita 2011-05-06, 22:04
Sheng,

How big is each of your XFS volumes? We noticed that if a volume is over 4TB, HDFS won't pick it up.

-- 
--- Get your facts first, then you can distort them as you please.--
-
Re: our experiences with various filesystems and tuning options
Eric 2011-05-09, 09:33
Just a small warning: I've seen kernel panics with the XFS kernel module once you have many disks (in my case: > 20 disks). This is an exotic number of disks to put in one server, so it shouldn't hold anyone back from using XFS :-)
-
Re: our experiences with various filesystems and tuning options
Jonathan Disher 2011-05-10, 00:07
Speak for yourself, I just built a bunch of 36 disk datanodes :)

-j
-
Re: our experiences with various filesystems and tuning options
Will Maier 2011-05-10, 01:22
On Mon, May 09, 2011 at 05:07:29PM -0700, Jonathan Disher wrote:
> Speak for yourself, I just built a bunch of 36 disk datanodes :)

And I just unboxed 10 more 36 disk systems to join the two already in our cluster. We also have 20 systems with 24 disks, though most of our datanodes have a more typical four disks...

-- 
Will Maier - UW High Energy Physics
cel: 608.438.6162
tel: 608.263.9692
web: http://www.hep.wisc.edu/~wcmaier/
-
Re: our experiences with various filesystems and tuning options
Rita 2011-05-10, 04:03
What filesystem are they using, and what is the size of each filesystem?
-
Re: our experiences with various filesystems and tuning options
Jonathan Disher 2011-05-10, 06:46
I can't speak for Will, but I'm actually going against recommendations: my systems have three 20TB RAID 6 arrays, with two 10TB ext4 filesystems per array.

The problems you will encounter keeping machines performing well after they get internally unbalanced following disk failures and replacements (and keeping machines online with non-standard configs, missing disks, etc.) will drive you nuts. It drives me nuts.

-j
-
Re: our experiences with various filesystems and tuning options
Will Maier 2011-05-10, 10:30
On Tue, May 10, 2011 at 12:03:09AM -0400, Rita wrote:
> what filesystem are they using and what is the size of each filesystem?

It sounds nuts, but each disk has its own ext3 filesystem. Beyond switching to the deadline IO scheduler, we haven't done much tuning/tweaking. A script runs every ten minutes to test all of the data mounts and reconfigure hdfs-site.xml and restart the datanode if necessary. So far, this approach has allowed us to avoid loss of space to RAID without correlating the risk of disk failure by building larger RAID0s.

In the future, we expect to deprecate the script and rely on the datanode process itself to handle missing/failing disks.
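The periodic mount check described above could look roughly like the following Python sketch. It is not Will's actual script: the probe file prefix and the function names are hypothetical, and the steps that rewrite hdfs-site.xml and restart the datanode are deliberately left out.

```python
import os
import tempfile


def is_usable(data_dir):
    """Return True if data_dir exists and a small probe file can be created in it."""
    if not os.path.isdir(data_dir):
        return False
    try:
        # The probe file is removed automatically when the context exits.
        with tempfile.NamedTemporaryFile(dir=data_dir, prefix=".probe-"):
            pass
        return True
    except OSError:
        # Covers read-only remounts, I/O errors and permission problems.
        return False


def usable_data_dirs(candidates):
    """Keep only the candidate data directories that pass the probe."""
    return [d for d in candidates if is_usable(d)]


def data_dir_value(dirs):
    """Join directories into a comma-separated list, the form dfs.data.dir takes."""
    return ",".join(dirs)
```

A cron job could call usable_data_dirs on the configured mounts every ten minutes and only touch the configuration when the resulting list differs from the one currently in use.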
-
Re: our experiences with various filesystems and tuning options
Rita 2011-05-10, 10:59
I keep asking because I wasn't able to use an XFS filesystem larger than 3-4TB. If the XFS filesystem is larger than 4TB, HDFS won't recognize the space. I am on a 64-bit RHEL 5.3 host.
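One way to narrow a report like this down is to check what the operating system itself reports for the volume and compare that with the capacity HDFS shows for the datanode. A minimal Python sketch, assuming a Unix host; the helper names are hypothetical, the mount point must be replaced with the real data mount, and the sketch does not explain why the space is not picked up.

```python
import os

TIB = 2 ** 40  # bytes in one tebibyte


def fs_capacity_bytes(path):
    """Total size of the filesystem containing path, as reported by statvfs."""
    st = os.statvfs(path)
    return st.f_frsize * st.f_blocks


def fs_available_bytes(path):
    """Space available to unprivileged users on the filesystem containing path."""
    st = os.statvfs(path)
    return st.f_frsize * st.f_bavail


def to_tib(num_bytes):
    """Convert a byte count to tebibytes."""
    return num_bytes / TIB


if __name__ == "__main__":
    mount_point = "/"  # assumption: replace with the XFS data mount
    print(f"capacity:  {to_tib(fs_capacity_bytes(mount_point)):.2f} TiB")
    print(f"available: {to_tib(fs_available_bytes(mount_point)):.2f} TiB")
```

If the operating system reports the full size while HDFS does not, the discrepancy lies somewhere between the filesystem and the datanode rather than in the volume itself.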
-
Re: our experiences with various filesystems and tuning options
Jonathan Disher 2011-05-10, 11:26
In a previous life, I've had extreme problems with XFS, including kernel panics and data loss under high load.

Those were database servers, not Hadoop nodes, and it was a few years ago. But ext3/ext4 seems to be stable enough, and it's more widely supported, so it's my preference.

-j
-
Re: our experiences with various filesystems and tuning options
Marcos Ortiz 2011-05-10, 13:06
On 05/10/2011 06:29 AM, Rita wrote:
> I keep asking because I wasn't able to use a XFS filesystem larger
> than 3-4TB. If the XFS file system is larger than 4TB hdfs won't
> recognize the space. I am on a 64bit RHEL 5.3 host.

I saw this problem before with the 64-bit version of Red Hat EL 5.3. Which kernel version are you using? Can you upgrade the system to 5.5 or to 6.0? There are a lot of bug fixes and performance gains in these releases.

Another point is that since version 5.4, Red Hat has added preliminary XFS support, specifically to address the need for larger filesystems, and their RHEL 6 release treats it as a fully supported filesystem on par with ext3 and ext4.

One last point: XFS can handle files greater than 16 TB. The primary problem is the tools to read and write those files. (ext4 can in principle handle such huge files too, but the problem there is the mkfs utility, which is not optimized for this.)

Regards

-- 
Marcos Luís Ortíz Valmaseda
Software Engineer (Large-Scaled Distributed Systems)
University of Information Sciences, La Habana, Cuba
Linux User # 418229
http://about.me/marcosortiz
-
Re: our experiences with various filesystems and tuning options
Marcos Ortiz 2011-05-10, 13:14
On 05/10/2011 06:56 AM, Jonathan Disher wrote:
> In a previous life, I've had extreme problems with XFS, including
> kernel panics and data loss under high load.
>
> Those were database servers, not Hadoop nodes, and it was a few years
> ago. But, ext3/ext4 seems to be stable enough, and it's more widely
> supported, so it's my preference.

Jonathan, I had the same issues on my PostgreSQL servers, and the main cause was the kernel version that I was using. I upgraded the kernel to the latest version supported by Red Hat, and everything worked OK.

My preferred filesystem is ZFS; it's a shame that Linux support is still very immature. For that reason, I moved my PostgreSQL hosts to FreeBSD-8.0 to use ZFS as the filesystem, and it really rocks.

Has anyone tested a Hadoop cluster with this filesystem? On Solaris or FreeBSD?

Regards
-
Re: our experiences with various filesystems and tuning options
Allen Wittenauer 2011-05-10, 20:57
On May 9, 2011, at 11:46 PM, Jonathan Disher wrote:
> I can't speak for Will, but I'm actually going against recommendations, my systems have three 20TB RAID 6 arrays, with two 10TB ext4 filesystems per array.
>
> The problems you will encounter keeping machines performing well after they get internally unbalanced following disk failures and replacements (and keeping machines online with non-standard configs, missing disks, etc) will drive you nuts. It drives me nuts.

This sounds more like you just don't have enough nodes if you are that concerned about single machine performance. :)
-
Re: our experiences with various filesystems and tuning options
Allen Wittenauer 2011-05-10, 20:59
On May 10, 2011, at 6:14 AM, Marcos Ortiz wrote:
> My preferred filesystem is ZFS; it's a shame that Linux support is still very immature. For that reason, I moved my PostgreSQL hosts to FreeBSD-8.0 to use ZFS as the filesystem, and it really rocks.
>
> Has anyone tested a Hadoop cluster with this filesystem?
> On Solaris or FreeBSD?

HDFS capacity numbers go really wonky on pooled storage systems like ZFS. Other than that, performance is more than acceptable vs. ext4. [Sorry, I don't have my benchmark numbers handy.]
-
Re: our experiences with various filesystems and tuning options
Jonathan Disher 2011-05-10, 21:24
This cluster is specifically a near-line archive cluster, so storage density is more important than computational performance. Our primary production cluster (which actually does very little in the way of computation) comprises Dell R510s with 10 disks in JBOD and a two-disk mirrored OS drive. 48 of those make a nice speedy cluster.

-j