HBase, mail # user - Performance test results


Re: Performance test results
Eran Kutner 2011-04-27, 15:31
Since the attachment didn't make it, here it is again:
http://shortText.com/jp73moaesx
-eran
On Wed, Apr 27, 2011 at 16:51, Eran Kutner <[EMAIL PROTECTED]> wrote:
> Hi Josh,
>
> The connection pooling code is attached AS IS (with all the usual legal
> disclaimers), note that you will have to modify it a bit to get it to
> compile because it depends on some internal libraries we use. In particular,
> DynamicAppSettings and Log are two internal classes that do what their names
> imply :)
> Make sure you initialize "servers" in the NewConnection() method to an array
> with your Thrift servers and you should be good to go. You use
> GetConnection() to get a connection and ReturnConnection() to return it back
> to the pool after you finish using it - make sure you don't close it in the
> application code.
>
> -eran
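
[Editor's note: the attached C# pool itself is not reproduced in the archive. The checkout/return pattern Eran describes — initialize a fixed set of connections to the Thrift servers, hand one out with GetConnection(), give it back with ReturnConnection(), and never close it in application code — can be sketched roughly as follows. This is a minimal generic-Java illustration using a BlockingQueue; the Connection class and server addresses are placeholders, not the actual attached code.]

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal sketch of the checkout/return pattern described above.
// "Connection" is a stand-in for a Thrift client bound to one of the
// configured servers; real code would open a TSocket/transport here.
public class ConnectionPool {
    private final BlockingQueue<Connection> idle;

    public ConnectionPool(String[] servers, int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            // spread the pooled connections round-robin across servers
            idle.add(new Connection(servers[i % servers.length]));
        }
    }

    // Blocks until a connection is free.
    public Connection getConnection() throws InterruptedException {
        return idle.take();
    }

    // The caller must NOT close the connection; it just hands it back.
    public void returnConnection(Connection c) {
        idle.add(c);
    }

    public static class Connection {
        public final String server;
        Connection(String server) { this.server = server; }
    }
}
```

Because the connections are created once and only ever recycled, there is no per-request open/close overhead — which is exactly what Eran verified with Wireshark.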
>
>
>
> On Wed, Apr 27, 2011 at 00:30, Josh <[EMAIL PROTECTED]> wrote:
>>
>> On Tue, Apr 26, 2011 at 3:34 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
>> > Hi J-D,
>> > I don't think it's a Thrift issue. First, I use the TBufferedTransport
>> > transport, second, I implemented my own connection pool so the same
>> > connections are reused over and over again,
>>
>> Hey!  I'm using C#->Hbase and high on my list of things todo is
>> 'Implement Thrift Connection Pooling in C#'.  You have any desire to
>> release that code?
>>
>>
>> > so there is no overhead for opening and closing connections (I've
>> > verified that using Wireshark). Third, if it were a client capacity
>> > issue I would expect to see an increase in throughput as I add more
>> > threads or run the test on two servers in parallel, but this doesn't
>> > seem to happen: the total capacity remains unchanged.
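
[Editor's note: the reasoning above — flat aggregate throughput as client threads are added points to a server-side bottleneck, not client capacity — can be illustrated with a toy simulation. This is not the actual benchmark; "serverSlots" crudely models a server that can process only N requests at a time, and the latency is simulated.]

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Toy model: if the server caps concurrency, adding client threads
// beyond that cap does not raise total throughput.
public class SaturationCheck {
    public static long run(int clientThreads, int opsPerThread,
                           Semaphore serverSlots) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(clientThreads);
        AtomicLong done = new AtomicLong();
        long start = System.nanoTime();
        for (int t = 0; t < clientThreads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    try {
                        serverSlots.acquire();   // wait for server capacity
                        Thread.sleep(1);         // simulated request latency
                        serverSlots.release();
                        done.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return done.get() * 1000 / Math.max(elapsedMs, 1); // ops/sec
    }
}
```

Running this with 2 server slots and progressively more client threads shows throughput plateauing near the server cap — the pattern Eran observed.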
>> >
>> > As for metrics, I already have it configured and monitored using
>> > Zabbix, but it only monitors specific counters, so let me know what
>> > information you would like to see. The numbers I quoted before are
>> > based on client counters and correlated with server counters ("multi"
>> > for writes and "get" for reads).
>> >
>> > -eran
>> >
>> >
>> >
>> > On Thu, Apr 21, 2011 at 20:43, Jean-Daniel Cryans <[EMAIL PROTECTED]>
>> > wrote:
>> >>
>> >> Hey Eran,
>> >>
>> >> Glad you could go back to debugging performance :)
>> >>
>> >> The scalability issues you are seeing are unknown to me, it sounds
>> >> like the client isn't pushing it enough. It reminded me of when we
>> >> switched to using the native Thrift PHP extension instead of the
>> >> "normal" one and we saw huge speedups. My limited knowledge of Thrift
>> >> may be blinding me, but I looked around for C# Thrift performance
>> >> issues and found threads like this one
>> >> http://www.mail-archive.com/[EMAIL PROTECTED]/msg00320.html
>> >>
>> >> As you didn't really debug the speed of Thrift itself in your setup,
>> >> this is one more variable in the problem.
>> >>
>> >> Also, you don't really provide metrics about your system apart from
>> >> requests/second. Would it be possible for you to set them up using
>> >> this guide? http://hbase.apache.org/metrics.html
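
[Editor's note: HBase of that era exposed metrics through Hadoop's metrics framework, configured via hadoop-metrics.properties. A fragment along these lines is a rough sketch only — the context/class names follow the Hadoop metrics v1 conventions, and the period and file paths here are illustrative, not taken from the guide.]

```properties
# Emit HBase metrics to a local file every 10 seconds
# (illustrative values; see the linked metrics guide for specifics)
hbase.class=org.apache.hadoop.metrics.file.FileContext
hbase.period=10
hbase.fileName=/tmp/hbase_metrics.log

# JVM metrics to the same kind of sink
jvm.class=org.apache.hadoop.metrics.file.FileContext
jvm.period=10
jvm.fileName=/tmp/hbase_jvm_metrics.log
```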
>> >>
>> >> J-D
>> >>
>> >> On Thu, Apr 21, 2011 at 5:13 AM, Eran Kutner <eran@> wrote:
>> >> > Hi J-D,
>> >> > After stabilizing the configuration, with your great help, I was able
>> >> > to go back to the load tests. I tried using IRC, as you
>> >> > suggested,
>> >> > to continue this discussion but because of the time difference (I'm
>> >> > GMT+3) it is quite difficult to find a time when people are present
>> >> > and I am available to run long tests, so I'll give the mailing list
>> >> > one more try.
>> >> >
>> >> > I tested again on a clean table, using 100 insert threads, each
>> >> > using a separate keyspace within the test table. Every row had
>> >> > just one column with 128 bytes of data.
>> >> > With one server and one region I got about 2300 inserts per second.
>> >> > After manually splitting the region I got about 3600 inserts per
>> >> > second (still on one machine). After a while the regions were