HBase dev mailing list: YCSB tests for HBase on Whirr (was: Report to Apache board: first cut)

Mingjie Lai 2011-01-21, 20:40
Ted Dunning 2011-01-21, 22:43
Andrew Purtell 2011-01-24, 00:48
Lars George 2011-01-24, 14:55
Lars George 2011-01-22, 00:00
Re: YCSB tests for HBase on Whirr (was: Report to Apache board: first cut)
Hello Mingjie,
This comes at a very apt time for me. I will be evaluating HBase on EC2
using YCSB, and will run MapReduce jobs there. For instance, I will
evaluate some simple aggregation ones (1512) using MapReduce jobs, coprocessors,
and the pure HBase APIs (such as Scan plus client-side processing).

I have things running locally and will move to EC2 pretty soon (by today).
Right now I have zero experience with setting up HBase on EC2, so I may be
bugging you guys in case I get stuck. :)


On Fri, Jan 21, 2011 at 1:40 PM, Mingjie Lai <[EMAIL PROTECTED]> wrote:

> Guys,
> There is a discussion regarding testing HBase with YCSB on Whirr or EC2.
> Sending it to @dev so more people can be involved.
> Lars,
> I have an automatic YCSB test for HBase running on EC2. It was derived from
> Andy and Eugene's HBase EC2 script. What I added includes:
> - YCSB test support
> - building and uploading a new HBase jar, triggered by SCM (git) changes
> - emailing YCSB test results to configured recipients
> - running automatically as a daily cron job
> You can take a look at: https://github.com/mlai/hbase-ec2/tree/ycsb for
> more detail.
> We do want to move the script to support Whirr, but right now we lack the
> resources to do the job. Also, it seems a Whirr HBase bug has been reported,
> although I haven't checked the details. So there is no further
> progress toward Whirr support right now.
> >> Reporting back the results will be a bit more challenging as usually
> >> you spin down the cluster at end.
> I also struggled with what the best way would be to present the results
> from an automatic test. I picked the simplest way, sending the results by
> email, so that I can avoid the problem of saving the data somewhere.
> But it could be extended to support Hudson. Right now it downloads the
> result files locally after the YCSB tests finish and parses them locally,
> where I grab the details of the results as the email contents. I think Hudson
> can use the same files to present results.
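
A minimal sketch of the local parsing step described above, not the actual code from the hbase-ec2 script. It assumes the downloaded result file contains lines of the form "[SECTION], Metric, Value" (the summary format YCSB prints at the end of a run) and skips anything else. The function names parse_ycsb_results and format_email_body are hypothetical.

```python
from collections import defaultdict


def parse_ycsb_results(text):
    """Collect lines of the assumed form '[SECTION], Metric, Value' into nested dicts."""
    results = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("["):
            continue  # skip log output that is not a result line
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # not a three-field result line
        section, metric, value = parts
        try:
            results[section.strip("[]")][metric] = float(value)
        except ValueError:
            continue  # value was not numeric; ignore the line
    return dict(results)


def format_email_body(results):
    """Render the parsed metrics as plain text suitable for an email body."""
    lines = []
    for section in sorted(results):
        lines.append(section)
        for metric in sorted(results[section]):
            lines.append(f"  {metric}: {results[section][metric]:g}")
    return "\n".join(lines)
```

Because the parsed dictionary is independent of the email formatting, the same parsed data could be handed to another reporter (for example a Hudson job) without re-reading the cluster.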
> >> And we do
> >> not want to keep the cluster running unnecessarily for a built-in web
> >> interface to browse the results etc.
> Totally agree, we want to terminate the cluster as soon as the test
> finishes.
> Here is an example of a test result:
> http://pastebin.com/f08bRCkY
> What do you think, Lars?
> Thanks,
> Mingjie
> -------- Original Message --------
> Subject:        Re: Report to Apache board: first cut
> Date:   Fri, 21 Jan 2011 09:46:46 -0800
> From:   Stack <[EMAIL PROTECTED]>
> +1 to Todd's suggestion (and change subject -- smile)
> St.Ack
> On Fri, Jan 21, 2011 at 8:19 AM, Todd Lipcon<[EMAIL PROTECTED]>  wrote:
>>  Should we move this discussion to the dev list at large?
>>  Our QA team is also starting to look at at least smoke testing HBase on a
>>  cluster. We should coordinate efforts!
>>  On Fri, Jan 21, 2011 at 12:56 AM, Lars George<[EMAIL PROTECTED]>
>>  wrote:
>>   Hi Andy,
>>>  I assumed as much from our previous conversations. I sent Eugene the
>>>  details on Whirr and using HBase with it. Unfortunately, JClouds
>>>  currently cannot ship the scripts from the local directory, but
>>>  that is coming soon. In the meantime we need to use a "public" S3
>>>  based repo that has a copy. He had that set up last time we got HBase
>>>  running together using Whirr. I think he is pretty much set, we simply
>>>  need to add a specific "test" role that allows us to start the cluster
>>>  and when "test" is part of the template we can not only start the
>>>  cluster but invoke whatever test we need. In effect we could have
>>>  "test-ycsb-basic", "test-ycsb-workload-5050", "test-mvn-test" (for the
>>>  built-in tests) and so on to start this. That has the advantage of
>>>  being able to use various templates to test different cluster setups
>>>  against equally different test scenarios.
>>>  Reporting back the results will be a bit more challenging as usually
>>>  you spin down the cluster at end. We could grab whatever the test