HBase dev mailing list: Benchmarking and improvement of HBase's performance for a common bulk data workload


Re: Benchmarking and improvement of HBase's performance for a common bulk data workload
Thanks for thinking about ways to optimize such a workload.

You can start with the following when setting up your cluster:
http://hbase.apache.org/book.html#configuration
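For a mostly-read workload after a bulk load, the knobs that usually matter most are the block cache size, the MemStore share and the handler count; on a real cluster they belong in hbase-site.xml on the region servers. A minimal sketch of trying them out first against an embedded mini cluster (the values here are only illustrative assumptions, not recommendations):

  import org.apache.hadoop.hbase.HBaseTestingUtility;

  // Property names are the standard ones from the configuration chapter;
  // on a real deployment set them in hbase-site.xml instead.
  HBaseTestingUtility util = new HBaseTestingUtility();
  util.getConfiguration().setFloat("hfile.block.cache.size", 0.4f);          // larger read cache
  util.getConfiguration().setFloat(
      "hbase.regionserver.global.memstore.upperLimit", 0.25f);               // smaller write-buffer share
  util.getConfiguration().setInt("hbase.regionserver.handler.count", 30);    // more concurrent readers
  util.startMiniCluster();   // throws Exception; wrap in a test or main method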

Note that HBase's transaction guarantees are quite different from PostgreSQL's. See:
http://hbase.apache.org/book.html#acid
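In short, HBase guarantees atomicity per row and has no multi-row transactions. A tiny sketch of what is and is not atomic (table, family and column names are placeholders):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  // A single Put touching several columns of ONE row is applied atomically.
  HTable table = new HTable(HBaseConfiguration.create(), "benchmark_table");
  Put put = new Put(Bytes.toBytes("row-1"));
  put.add(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes("1"));
  put.add(Bytes.toBytes("cf"), Bytes.toBytes("b"), Bytes.toBytes("2"));
  table.put(put);
  // Two Puts to two different rows get no cross-row atomicity guarantee.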

Cheers

On Sat, Apr 27, 2013 at 1:20 PM, Atri Sharma <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I have been discussing with Priyank sir the following style of
> workload and whether we can improve HBase's performance in this area.
> The use case is as follows (a rough sketch follows the list):
>
> 1) Bulk load data.
> 2) Query the data multiple times (mostly read access, no real-time
> writes).
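>
> Something like this, with the plain Java client API (the table, family
> and row names are just placeholders; a real run would probably use the
> HFile bulk-load tooling or a MapReduce job for step 1 rather than
> client-side Puts):
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.*;
> import org.apache.hadoop.hbase.util.Bytes;
>
> Configuration conf = HBaseConfiguration.create();
> HTable table = new HTable(conf, "benchmark_table");
>
> // Step 1: load the data in bulk (batched client writes for the sketch).
> table.setAutoFlush(false);
> for (long i = 0; i < 1000000L; i++) {
>     Put put = new Put(Bytes.toBytes("row-" + i));
>     put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
>     table.put(put);
> }
> table.flushCommits();
>
> // Step 2: run the read-only queries over and over (scans here).
> Scan scan = new Scan();
> scan.setCaching(1000);      // fetch many rows per RPC
> scan.setCacheBlocks(true);  // let repeated reads hit the block cache
> ResultScanner scanner = table.getScanner(scan);
> for (Result r : scanner) {
>     // process r
> }
> scanner.close();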
>
> This is a common workload, and I am pretty interested in benchmarking
> HBase's performance in this area, as well as improving it further.
>
> Please advise me on how to proceed with benchmarking. Specifically,
> how should I set up an HBase cluster, and will there be any specific
> requirements of the cluster for this type of testing?
>
>
> I worked on a patch to improve performance for a similar use case in
> PostgreSQL. The case is pretty similar: bulk load of data, a large
> number of mostly read-only queries, and then deletion of the data.
>
> The optimization I targeted was the cost of writes to disk.
> Specifically, setting flags (hint bits) to track the commit status of
> the inserting/deleting transaction was causing a write overhead. I
> tried to mitigate this by adding a cache which holds the transaction
> id for the above-mentioned workload, hence mitigating the cost of
> writes.
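>
> To illustrate the idea (a Java-flavoured sketch only; the actual patch
> is C inside PostgreSQL, and lookupCommitLog below is a hypothetical
> stand-in for the clog lookup): remember the commit status per
> transaction id in memory, so visibility checks can be answered without
> dirtying the page just to set a hint bit.
>
> import java.util.LinkedHashMap;
> import java.util.Map;
>
> // Small LRU map: transaction id -> committed?
> final int capacity = 4096;
> Map<Long, Boolean> xidStatus =
>     new LinkedHashMap<Long, Boolean>(capacity, 0.75f, true) {
>         @Override
>         protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
>             return size() > capacity;
>         }
>     };
>
> long xid = 1234L;                       // example transaction id
> Boolean committed = xidStatus.get(xid);
> if (committed == null) {
>     committed = lookupCommitLog(xid);   // hypothetical clog lookup
>     xidStatus.put(xid, committed);      // cached: no page write needed
> }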
>
> I will start benchmarking once I have the system set up, and then
> start thinking of tests. Once I have an outline in mind, I shall post
> it on the list.
>
> I will require a lot of the community's guidance on this.
>
> Thoughts/Comments/Advice please?
>
> Regards,
>
> Atri
>
> --
> Regards,
>
> Atri
> l'apprenant
>