Re: YCSB tests for HBase on Whirr (was: Report to Apache board: first cut)
This comes at a very apt time for me. I will be evaluating HBase on EC2
using YCSB, and will run MapReduce jobs there. For instance, I
will evaluate some simple aggregation ones (1512) using MapReduce jobs,
coprocessors, and the pure HBase APIs (such as Scan plus client-side processing).
I have things running locally, and will move to EC2 pretty soon (by today).
Right now I have zero experience with setting up HBase on EC2, so I may be
bugging you guys in case I get stuck. :)
On Fri, Jan 21, 2011 at 1:40 PM, Mingjie Lai <[EMAIL PROTECTED]> wrote:
> There is a discussion regarding testing HBase with YCSB on Whirr or EC2.
> Sending it to @dev so more people can be involved.
> I have an automated YCSB test for HBase running on EC2. It was derived from
> Andy and Eugene's HBase EC2 script. What I added includes:
> - YCSB test support
> - building and uploading a new HBase jar, triggered by SCM (git) changes
> - emailing YCSB test results to configured recipients
> - running automatically as a daily cron job
> You can take a look at: https://github.com/mlai/hbase-ec2/tree/ycsb for
> more detail.
> We do want to move the script to support Whirr, but right now we lack the
> resources to do the job. Also, it seems there is a Whirr HBase bug reported,
> although I haven't checked the details. So there is no further
> progress toward Whirr support right now.
> >> Reporting back the results will be a bit more challenging as usually
> >> you spin down the cluster at end.
> I also struggled a lot with what the best way to present the results of an
> automated test would be. I picked the simplest way: sending the results by
> email, so that I can avoid the problem of saving the data somewhere.
> But it could be extended to support Hudson. Right now it downloads the
> result files locally after the YCSB tests finish and parses them locally,
> pulling out the result details to use as the email contents. I think Hudson
> can use the same files to present results.
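
For illustration only, the parse-then-email step described above could be sketched in Python as follows. This is not taken from the hbase-ec2 scripts. It assumes the YCSB summary lines have the shape "[SECTION], Metric, Value", and the names parse_ycsb_summary and format_email_body are hypothetical.

```python
import re

# Assumed YCSB summary line shape: "[SECTION], Metric name, numeric value".
SUMMARY_LINE = re.compile(
    r"^\[(?P<section>[^\]]+)\],\s*(?P<metric>[^,]+),\s*"
    r"(?P<value>-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)


def parse_ycsb_summary(text):
    """Collect summary metrics per section from the text of a result file."""
    results = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match is None:
            continue  # not a summary line (banner, log output, ...)
        metric = match.group("metric").strip()
        # Skip latency histogram buckets, whose "metric" is a bare
        # number such as "0" or ">1000".
        if metric.isdigit() or metric.startswith(">"):
            continue
        section = results.setdefault(match.group("section"), {})
        section[metric] = float(match.group("value"))
    return results


def format_email_body(results):
    """Render the parsed metrics as plain text for an email body."""
    lines = []
    for section in sorted(results):
        lines.append(section)
        for metric, value in sorted(results[section].items()):
            lines.append(f"  {metric}: {value:g}")
    return "\n".join(lines)
```

Because the parsing is separate from the formatting, the same parsed dictionary could feed an email body or whatever report format a CI server expects.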
> >> And we do
> >> not want to keep the cluster running unnecessarily for a built-in web
> >> interface to browse the results etc.
> Totally agree, we want to terminate the cluster as soon as the test
> Here is an example of a test result:
> What do you think, Lars?
> -------- Original Message --------
> Subject: Re: Report to Apache board: first cut
> Date: Fri, 21 Jan 2011 09:46:46 -0800
> From: Stack <[EMAIL PROTECTED]>
> +1 to Todd's suggestion (and change subject -- smile)
> On Fri, Jan 21, 2011 at 8:19 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>> Should we move this discussion to the dev list at large?
>> Our QA team is also starting to look at at least smoke testing HBase on a
>> cluster. We should coordinate efforts!
>> On Fri, Jan 21, 2011 at 12:56 AM, Lars George <[EMAIL PROTECTED]> wrote:
>>> Hi Andy,
>>> I assumed as much from our previous conversations. I sent Eugene the
>>> details on Whirr and using HBase with it. Unfortunately, JClouds
>>> cannot yet ship the scripts from the local directory, but
>>> that is coming soon. In the meantime we need to use a "public" S3-based
>>> repo that has a copy. He had that set up last time we got HBase
>>> running together using Whirr. I think he is pretty much set; we simply
>>> need to add a specific "test" role that allows us to start the cluster
>>> and, when "test" is part of the template, not only start the
>>> cluster but also invoke whatever test we need. In effect we could have
>>> "test-ycsb-basic", "test-ycsb-workload-5050", "test-mvn-test" (for the
>>> built-in tests) and so on to start this. That has the advantage of
>>> being able to use various templates to test different cluster setups
>>> against equally different test scenarios.
>>> Reporting back the results will be a bit more challenging as usually
>>> you spin down the cluster at end. We could grab whatever the test