|
Hari Shankar
2010-10-29, 06:55
Cosmin Lehene
2010-10-29, 08:05
Michael Segel
2010-10-29, 15:03
Jonathan Gray
2010-10-29, 16:21
Hari Shankar
2010-10-31, 18:29
Hari Shankar
2010-11-02, 12:31
Hari Shankar
2010-11-23, 05:54
Hari Shankar
2010-11-23, 17:23
Stack
2010-11-23, 23:17
|
-
HBase not scaling well - Hari Shankar, 2010-10-29, 06:55
Hi,

We are currently doing a POC for HBase in our system. We have written a bulk upload job to load our data from a text file into HBase. We are using a 3-node cluster: one master that also works as a slave (running namenode, jobtracker, HMaster, datanode, tasktracker, HQuorumpeer and HRegionServer) and 2 slaves (running datanode, tasktracker, HQuorumpeer and HRegionServer).

The problem is that we are getting lower performance from the distributed cluster than we were getting from a single-node pseudo-distributed setup. The upload takes about 30 minutes on an individual machine, whereas it takes 2 hrs on the cluster. We have replication set to 3, so all parts should ideally be available on all nodes, so we doubt that the problem is network latency. scp of files between nodes gives a speed of about 12 MB/s, which I believe should be good enough for this to function. Please correct me if I am wrong here.

The nodes are all 4-core machines with 8 GB RAM. We are spawning 4 simultaneous map tasks on each node, and the job does not have any reduce phase.

Any help is greatly appreciated.

Thanks,
Hari Shankar
-
Re: HBase not scaling well - Cosmin Lehene, 2010-10-29, 08:05
Hi Hari,

Could you do some real-time monitoring (htop, iptraf, iostat) and report the results? You could also add some timers to the map-reduce operations: measure average operation times to figure out what's taking so long.

Cosmin
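The timer suggestion above can be sketched roughly as follows. This is only an illustration of the idea in Python (the actual job in this thread would be a Java map task), and `OpTimers`, `parse_line` and `run` are hypothetical names that do not come from Hadoop or HBase. The "put" step is a stand-in; in a real job it would wrap the HBase write.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class OpTimers:
    """Accumulates call counts and total elapsed time per named operation."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    @contextmanager
    def timed(self, name):
        # Measure the wall-clock time spent inside the "with" block.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.total[name] += time.perf_counter() - start
            self.count[name] += 1

    def average_ms(self, name):
        # Average time per call in milliseconds; 0.0 if never called.
        if self.count[name] == 0:
            return 0.0
        return 1000.0 * self.total[name] / self.count[name]


def parse_line(line):
    # Hypothetical parser: one tab-separated "key<TAB>value" record per line.
    key, _, value = line.partition("\t")
    return key, value


def run(lines, timers):
    rows = []
    for line in lines:
        with timers.timed("parse"):
            row = parse_line(line)
        with timers.timed("put"):
            rows.append(row)  # stand-in for the real write to HBase
    return rows


if __name__ == "__main__":
    timers = OpTimers()
    run(["row1\tvalue1", "row2\tvalue2"], timers)
    for op in ("parse", "put"):
        print(op, timers.count[op], "calls, avg ms:", timers.average_ms(op))
```

Comparing the per-operation averages between the single-node and the 3-node runs would show whether the extra time goes into parsing the input or into the writes.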
-
RE: HBase not scaling well - Michael Segel, 2010-10-29, 15:03
I'd actually take a step back and ask what Hari is trying to do.

It's difficult to figure out what the problem is when the OP says "I've got code that works in individual pseudo mode, but not in an actual cluster."

It would be nice to know version(s), configuration... 3 nodes... Are they running ZK on the same machines that they are running Region Servers on? Are they swapping? 8 GB of memory can disappear quickly...

Lots of questions...
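To make the "8 GB can disappear quickly" point concrete, here is a rough worst-case heap budget for the daemon layout described in the original post. The heap sizes are assumptions for illustration only (1000 MB per daemon JVM, and the 200 MB task heap that is mentioned later in the thread as the default); the real values have to be read from the cluster's own configuration. Maximum heap is a ceiling, not actual resident memory, and the OS and page cache need room as well.

```python
# All heap sizes are ASSUMED for illustration; check the cluster's own settings.
ASSUMED_DAEMON_HEAP_MB = 1000  # assumed max heap per daemon JVM
ASSUMED_TASK_HEAP_MB = 200     # assumed max heap per map-task JVM
NODE_RAM_MB = 8 * 1024         # 8 GB per node, as stated in the thread
MAP_SLOTS = 4                  # 4 simultaneous map tasks per node, as stated

# Daemon lists as given in the original post.
MASTER_DAEMONS = ["namenode", "jobtracker", "HMaster", "datanode",
                  "tasktracker", "HQuorumpeer", "HRegionServer"]
SLAVE_DAEMONS = ["datanode", "tasktracker", "HQuorumpeer", "HRegionServer"]


def worst_case_heap_mb(daemons, map_slots=MAP_SLOTS,
                       daemon_heap_mb=ASSUMED_DAEMON_HEAP_MB,
                       task_heap_mb=ASSUMED_TASK_HEAP_MB):
    """Sum of maximum JVM heaps if every daemon and task filled its heap."""
    return len(daemons) * daemon_heap_mb + map_slots * task_heap_mb


if __name__ == "__main__":
    for name, daemons in (("master", MASTER_DAEMONS), ("slave", SLAVE_DAEMONS)):
        need = worst_case_heap_mb(daemons)
        print(name, need, "MB of", NODE_RAM_MB, "MB")
```

Under these assumed numbers the master is already close to its 8 GB, and raising the task heap pushes the worst case past physical RAM, which is why checking for swapping during the job is worthwhile.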
-
RE: HBase not scaling well - Jonathan Gray, 2010-10-29, 16:21
Going from pseudo-distributed mode to a 3 node setup is definitely not "scaling" in a real way and I would expect performance degradation. Most especially when you're also running at replication factor 3 and in a setup where the master node is also acting as a slave node and MR task node.

You're adding an entirely new layer (HDFS) which will always cause increased latency/decreased throughput, and then you're running on 3 nodes with a replication factor of 3. So now every write is going to all three nodes, via HDFS, rather than a single node straight to the FS.

You said that "all parts should ideally be available on all nodes", but this is a write test? So that's a bad thing, not a good thing.

I would expect about a 50% slowdown but you're seeing more like a 75% slowdown. Still not so out of the ordinary. Stuffing a NN, DN, JT, TT, HMaster, and RS onto a single node is not a great idea. And then you're running 4 simultaneous tasks on a 4 core machine (along with these 6 other processes in the case of the master node).

How many disks does each of your nodes have?

If you really want to "scale" HBase, you're going to need more nodes. I've seen some success at a 5 node level, but generally 10 nodes and up is when HBase does well (and replication 3 makes sense).

JG
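The two figures in this reply can be worked through with a few lines of arithmetic. The sketch below is not from the thread; the function names are made up, and the replication model is a simplification that assumes blocks are spread evenly across nodes and ignores HBase's own write-ahead log and compactions.

```python
def throughput_drop_pct(baseline_minutes, observed_minutes):
    """Percent reduction in throughput when the same job takes longer."""
    return 100.0 * (1.0 - baseline_minutes / observed_minutes)


def per_node_write_share(replication, nodes):
    """Fraction of the logical data each node writes to its own disk,
    assuming replicas are spread evenly over the nodes."""
    return min(replication, nodes) / nodes


if __name__ == "__main__":
    # 30 minutes on one node versus 2 hours on the cluster.
    print("drop at 120 min:", throughput_drop_pct(30, 120))
    # The slowdown JG says he would have expected corresponds to 60 minutes.
    print("drop at 60 min:", throughput_drop_pct(30, 60))
    # Share of the data each node has to write for a few layouts.
    for replication, nodes in ((1, 1), (3, 3), (3, 10)):
        print(replication, nodes, per_node_write_share(replication, nodes))
```

With replication 3 on 3 nodes every node still writes the whole data set, just as the single node did, so the extra machines add network hops without reducing per-node disk work. At 10 nodes each node writes only a fraction of the data, which fits the remark that replication 3 starts to make sense at that size.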
-
Re: HBase not scaling well - Hari Shankar, 2010-10-31, 18:29
Thanks guys for the replies, and very sorry for the late reply. We are quite new to the Linux environment... our production servers currently run on Windows and our Linux sysadmin is yet to arrive. So please forgive my ignorance regarding Linux tools; I have very little prior experience with Linux.

All 3 of our nodes are running different Linux distros: one on Ubuntu Server 10.10, one on CentOS and one on Ubuntu Desktop 10.04. All have the same directory structure and the same versions of hadoop, hbase and java though. Let me know if you think this could be an issue. Basically we wanted to evaluate all three distros at the same time as well. I hope that plan didn't backfire.

Back to the problem at hand, here are the iptraf, htop and iostat reports:

iptraf snapshot --master

  Total rates:
    165424.2 kbits/s
    3800 packets/s

  Incoming:
    109415.3 kbits/s
    3007.4 packets/s

iptraf snapshot --slave01

  Total rates:
    102024 kbits/s
    3128 packets/s

  Incoming:
    48755.9 kbits/s
    1784 packets/s

iostat --master

  Linux 2.6.32-21-generic (hadoop1)  Sunday 31 October 2010  _x86_64_  (4 CPU)

  avg-cpu:  %user   %nice %system %iowait  %steal   %idle
             0.54    0.01    0.18    0.30    0.00   98.97

  Device:     tps  Blk_read/s  Blk_wrtn/s   Blk_read    Blk_wrtn
  sda        2.43      123.11      412.93   33988462   114000368

iostat --slave01

  Linux 2.6.35-22-server (hadoop2)  Sunday 31 October 2010  _x86_64_  (4 CPU)

  avg-cpu:  %user   %nice %system %iowait  %steal   %idle
             0.77    0.00    0.29    0.18    0.00   98.77

  Device:     tps  Blk_read/s  Blk_wrtn/s   Blk_read    Blk_wrtn
  sda        3.90      277.19     1228.97  245515598  1088525808

iostat --slave02

  Linux 2.6.18-194.11.1.el5 (slave)  10/31/2010

  avg-cpu:  %user   %nice %system %iowait  %steal   %idle
             0.54    0.00    0.29    0.80    0.00   98.37

  Device:     tps  Blk_read/s  Blk_wrtn/s   Blk_read    Blk_wrtn
  sda        6.57      302.09     1652.06  321914364  1760497088
  sda1       0.00        0.00        0.00       2458          88
  sda2       6.57      302.08     1652.06  321911602  1760497000
  dm-0     209.33      302.08     1652.06  321910322  1760496272
  dm-1       0.00        0.00        0.00        896         728

htop --master
http://imgur.com/3zTu7

htop --slave01
http://imgur.com/5HeyF

htop --slave02
http://imgur.com/lHin7

I hope these are the reports that you were referring to. Please let me know otherwise. Also, is there an easier command-line way of fetching the iptraf and htop reports? master is running Ubuntu Desktop, slave01 runs Ubuntu Server and slave02 runs CentOS.

Some more facts that I have noticed:

- I ran the job just now on the cluster after reformatting the namenode and it took only 1 hr 15 mins instead of the usual 2 hrs, though that is still slower than the single-node config (30-40 mins). Can it be that it is faster right after a namenode format?
- The time set on one of the slaves was incorrect and it lagged by 4 hrs compared to the other two machines. I corrected the time before formatting the namenode this time. I wonder if that could have an impact.
- I have ZK running on all 3 machines. Shouldn't it work fine if I just set up ZK on one of the nodes? In that case, I get a weird error: could not connect to port 0::0::0::0::....:2181 or something of that sort. I'll post the full error next time I see it.
- The CentOS machine (slave02) seems to use a lot more CPU than the other 2 on average. CPU usage on CentOS mostly hovers around 50-60%, whereas it is more like 30-40% on the other 2 machines (ref. htop screenshots above).
- On a single-node configuration, moving from a 4 GB RAM dual-core laptop to an 8 GB quad-core machine gives a 1.8x performance improvement.
- Increasing the child task heap size from the default 200 MB to 768 MB improved performance on both single and multi-node clusters by 100% (2x improvement). But going beyond 768 MB doesn't seem to have much impact.

Michael and Jonathan, I think I have covered most of the info you had asked for above as well. It doesn't seem to be swapping, and yes, currently we are running all those processes on the master, and all processes minus namenode, secondary namenode and JT on the slaves. But we run all those processes on a single machine in the single-node case as well, right? So if RAM/swap were the culprit, shouldn't it affect the single-node config more?

Do let me know if anything is missing or you think more info would help. Many thanks for your time and patience. :)

Thanks,
Hari
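To compare the iptraf and iostat figures above with the 12 MB/s scp rate from the first post, it helps to put them in the same unit. The sketch below is an illustration, not part of the thread: it assumes iptraf's "kbits" are 1000 bits, and it relies on iostat's Blk columns counting 512-byte sectors. The function names are made up.

```python
SECTOR_BYTES = 512          # iostat "Blk" columns count 512-byte sectors
BITS_PER_KBIT = 1000        # ASSUMPTION: iptraf kbits are decimal kilobits
BYTES_PER_MB = 1_000_000    # decimal megabytes, to match the scp figure loosely


def kbits_to_mbytes_per_sec(kbits_per_sec):
    """Convert a network rate in kbits/s to MB/s."""
    return kbits_per_sec * BITS_PER_KBIT / 8 / BYTES_PER_MB


def blocks_to_mbytes_per_sec(blocks_per_sec):
    """Convert an iostat Blk_read/s or Blk_wrtn/s rate to MB/s."""
    return blocks_per_sec * SECTOR_BYTES / BYTES_PER_MB


if __name__ == "__main__":
    print("master total net MB/s:", kbits_to_mbytes_per_sec(165424.2))
    print("slave01 total net MB/s:", kbits_to_mbytes_per_sec(102024))
    print("slave02 disk write MB/s:", blocks_to_mbytes_per_sec(1652.06))
```

One caveat when reading the iostat output: when iostat is run without an interval, its report covers the time since boot, which would explain the ~98% idle figures. Running it with an interval during the job (for example `iostat 5`) and looking at the later reports should give numbers that reflect the upload itself.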
-
Re: HBase not scaling well - Hari Shankar, 2010-11-02, 12:31
Hi all,

I ran the TestDFSIO job on my cluster and thought I'd append it here in case it is of any help:

10/11/02 17:53:56 INFO mapred.FileInputFormat: ----- TestDFSIO ----- : write
10/11/02 17:53:56 INFO mapred.FileInputFormat: Date & time: Tue Nov 02 17:53:56 IST 2010
10/11/02 17:53:56 INFO mapred.FileInputFormat: Number of files: 10
10/11/02 17:53:56 INFO mapred.FileInputFormat: Total MBytes processed: 10000
10/11/02 17:53:56 INFO mapred.FileInputFormat: Throughput mb/sec: 1.2372449326777915
10/11/02 17:53:56 INFO mapred.FileInputFormat: Average IO rate mb/sec: 1.2381720542907715
10/11/02 17:53:56 INFO mapred.FileInputFormat: IO rate std deviation: 0.03402313342081011
10/11/02 17:53:56 INFO mapred.FileInputFormat: Test exec time sec: 866.931
10/11/02 17:53:56 INFO mapred.FileInputFormat:

10/11/02 17:59:35 INFO mapred.FileInputFormat: ----- TestDFSIO ----- : read
10/11/02 17:59:35 INFO mapred.FileInputFormat: Date & time: Tue Nov 02 17:59:35 IST 2010
10/11/02 17:59:35 INFO mapred.FileInputFormat: Number of files: 10
10/11/02 17:59:35 INFO mapred.FileInputFormat: Total MBytes processed: 10000
10/11/02 17:59:35 INFO mapred.FileInputFormat: Throughput mb/sec: 22.776708537849196
10/11/02 17:59:35 INFO mapred.FileInputFormat: Average IO rate mb/sec: 28.383480072021484
10/11/02 17:59:35 INFO mapred.FileInputFormat: IO rate std deviation: 12.521607590777203
10/11/02 17:59:35 INFO mapred.FileInputFormat: Test exec time sec: 108.735
10/11/02 17:59:35 INFO mapred.FileInputFormat:

For a 3 node cluster, is this good/bad/ugly..? Where can I find data to compare my cluster regarding such parameters?

Thanks,
Hari
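One way to read the TestDFSIO output above is to compute the cluster-wide rate from the totals rather than from the "Throughput mb/sec" line, which, as far as I understand, is closer to a per-task figure. The sketch below only divides the reported total MBytes by the reported execution time; `aggregate_mb_per_sec` is a made-up helper, and the execution time includes job startup overhead, so the result is a lower bound.

```python
def aggregate_mb_per_sec(total_mb, exec_time_sec):
    """Cluster-wide data rate: total data moved divided by wall-clock time."""
    return total_mb / exec_time_sec


# Figures copied from the TestDFSIO log above.
WRITE = {"total_mb": 10000, "exec_time_sec": 866.931}
READ = {"total_mb": 10000, "exec_time_sec": 108.735}

if __name__ == "__main__":
    write_rate = aggregate_mb_per_sec(**WRITE)
    read_rate = aggregate_mb_per_sec(**READ)
    print("aggregate write MB/s:", round(write_rate, 2))
    print("aggregate read MB/s:", round(read_rate, 2))
    print("read/write ratio:", round(read_rate / write_rate, 1))
```

The aggregate write rate comes out in the same range as the 12 MB/s scp speed reported in the first post, while reads are several times faster. That may be a coincidence, but it suggests looking at the write path (network and replication) before anything HBase-specific.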
-
Re: HBase not scaling well - Hari Shankar, 2010-11-23, 05:54
Hi Jonathan,

You were right, adding one more node really helped. With 4 nodes, the time taken by my job has now finally dropped below the time taken by a single node. My job was bulk uploading data into HBase from a text file using the standard API (not the bulk upload tool). I had 2.13 GB of data. My cluster details can be found in the posts above. All my nodes have the same config (8 GB RAM; 4 core 3.2 GHz; 500 GB single disk; 1 Gbps network). The only change is that I am now using CentOS 5.5 on all nodes, with java 1.6.0_22. I'll post the approximate times taken for everyone's reference:

Single node           --> 35-45 mins
3-node, Replication 1 --> 60-80 mins
3-node, Replication 3 --> 50-60 mins
4-node, Replication 1 --> 23-28 mins!
4-node, Replication 4 --> ~20 mins
6-node, Replication 2 --> ~15 mins

I used 4 mappers/tasktracker and 1 reducer. I think that can be tweaked for further improvement.

I hope those stats will be useful for other newbies as well. Like Jonathan said above, it seems HBase needs more nodes to really start scaling. Fewer than 4 nodes actually deteriorates the performance. Thanks for all the help, guys!

Thanks and Regards,
Hari
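The timings above can be turned into an approximate ingest rate using the 2.13 GB data size given in this post. This is a rough sketch, not a measurement: it takes the midpoint of each reported range, treats the "~" values as exact, and assumes 1 GB = 1024 MB. The names are made up.

```python
DATA_GB = 2.13  # input size stated in the post

# (label, low minutes, high minutes) as reported in the post.
RUNS = [
    ("Single node", 35, 45),
    ("3-node, Replication 1", 60, 80),
    ("3-node, Replication 3", 50, 60),
    ("4-node, Replication 1", 23, 28),
    ("4-node, Replication 4", 20, 20),
    ("6-node, Replication 2", 15, 15),
]


def ingest_mb_per_sec(data_gb, minutes):
    """Average input rate in MB/s, assuming 1 GB = 1024 MB."""
    return data_gb * 1024 / (minutes * 60)


if __name__ == "__main__":
    for label, low, high in RUNS:
        midpoint = (low + high) / 2
        rate = ingest_mb_per_sec(DATA_GB, midpoint)
        print(f"{label:<22} {midpoint:5.1f} min  {rate:.2f} MB/s")
```

Even the fastest run corresponds to only a few MB/s of input text, well below what the disks and the 1 Gbps network can carry, so raw I/O bandwidth alone is unlikely to be the limit for this job.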
-
Re: HBase not scaling well - Hari Shankar, 2010-11-23, 17:23
Hi All,

I think there was some problem with our earlier setup. When I tested the 3-node setup, the three machines had different OSes (Ubuntu Server, Ubuntu Desktop and CentOS 5.4). Now, while checking the 4-node and 6-node setups, I had all 3 machines on exactly the same OS, java version etc. I am not sure if that is what made the difference, but I retried the 3-node setup with this configuration and, surprisingly, got similar results for the 3-node setup as well. Now I get the job done in 25-30 mins with 3 nodes (single node was 35 mins on average). So after all, the result is quite reasonable now.

In any case, the drastic degradation in performance going from 1 to 3 nodes which I got earlier was quite mysterious. I guess having the same OS helps. Has anyone else faced similar issues, or found one OS performing better than another w.r.t. HBase? I'll update my earlier data lest anyone gets misled:

Single node           --> 35-45 mins (Tested on all 3 OSes individually)
3-node, Replication 1 --> 60-80 mins (Multi-OS cluster - Ubuntu server 10.10+Ubuntu Desktop 10.04+CentOS 5.4)
3-node, Replication 2 --> 25-30 mins (Same OS, CentOS 5.5 on all nodes)
3-node, Replication 3 --> 50-60 mins (Multi OS cluster)
4-node, Replication 1 --> 23-28 mins (Same OS)
4-node, Replication 4 --> ~20 mins (Same OS)
6-node, Replication 2 --> ~15 mins (Same OS)

Regards,
Hari
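A quick way to see how well the same-OS runs scale is to compute speedup over the single node and divide by the node count. This is a back-of-the-envelope sketch using range midpoints from the corrected list above; the replication factor differs between runs, so the rows are not strictly comparable, and the helper names are made up.

```python
SINGLE_NODE_MIN = (35 + 45) / 2  # midpoint of the single-node range

# (nodes, replication, midpoint minutes) for the same-OS runs listed above.
SAME_OS_RUNS = [
    (3, 2, (25 + 30) / 2),
    (4, 1, (23 + 28) / 2),
    (4, 4, 20.0),
    (6, 2, 15.0),
]


def speedup(single_minutes, cluster_minutes):
    """How many times faster the cluster run is than the single node."""
    return single_minutes / cluster_minutes


def efficiency(single_minutes, cluster_minutes, nodes):
    """Speedup per node; 1.0 would be perfect linear scaling."""
    return speedup(single_minutes, cluster_minutes) / nodes


if __name__ == "__main__":
    for nodes, replication, minutes in SAME_OS_RUNS:
        s = speedup(SINGLE_NODE_MIN, minutes)
        e = efficiency(SINGLE_NODE_MIN, minutes, nodes)
        print(f"{nodes} nodes, replication {replication}: "
              f"speedup {s:.2f}x, efficiency {e:.2f}")
```

The per-node efficiency stays well below 1.0 for every run, which matches the advice earlier in the thread that small clusters pay a fixed overhead and that HBase starts to look good at larger node counts.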
-
Re: HBase not scaling well - Stack, 2010-11-23, 23:17
Thanks for updating the list with your findings Hari. A few of us were baffled there for a while.

St.Ack
All my nodes >> are same config (8 GB RAM; 4 core 3.2 GHz; 500 GB single disk; 1 Gbps >> network). The only change is that I am using CentOS 5.5 now in all >> nodes and java 1.6.0_22. I'll post the approximate time taken for >> everyone's reference: >> >> Single node --> 35-45 mins >> 3-node, Replication 1 --> 60-80 mins >> 3-node, Replication 3 --> 50-60 mins >> 4-node, Replication 1 --> 23-28 mins! >> 4-node, Replication 4 --> ~20 mins >> 6-node, Replication 2 --> ~15 mins >> >> I used 4 mappers/tasktracker and 1 reducer. I think that can be >> tweaked for further improvement. >> >> I hope those stats will be useful for other newbies as well. Like >> Jonathan said above, it seems HBase needs more nodes to really start >> scaling. Less than 4 nodes actually deteriorates the performance. >> Thanks for all the help, guys! >> >> Thanks and Regards, >> Hari >> >> On Tue, Nov 2, 2010 at 6:01 PM, Hari Shankar <[EMAIL PROTECTED]> wrote: >>> Hi all, >>> >>> I ran the TestDFSIO job on my cluster and thought I'd append it >>> here in case it is of any help: >>> >>> 10/11/02 17:53:56 INFO mapred.FileInputFormat: ----- TestDFSIO ----- : write >>> 10/11/02 17:53:56 INFO mapred.FileInputFormat: Date & time: >>> Tue Nov 02 17:53:56 IST 2010 >>> 10/11/02 17:53:56 INFO mapred.FileInputFormat: Number of files: 10 >>> 10/11/02 17:53:56 INFO mapred.FileInputFormat: Total MBytes processed: 10000 >>> 10/11/02 17:53:56 INFO mapred.FileInputFormat: Throughput mb/sec: >>> 1.2372449326777915 >>> 10/11/02 17:53:56 INFO mapred.FileInputFormat: Average IO rate mb/sec: >>> 1.2381720542907715 >>> 10/11/02 17:53:56 INFO mapred.FileInputFormat: IO rate std deviation: >>> 0.03402313342081011 >>> 10/11/02 17:53:56 INFO mapred.FileInputFormat: Test exec time sec: 866.931 >>> 10/11/02 17:53:56 INFO mapred.FileInputFormat: >>> >>> 10/11/02 17:59:35 INFO mapred.FileInputFormat: ----- TestDFSIO ----- : read >>> 10/11/02 17:59:35 INFO mapred.FileInputFormat: Date & time: |