Re: Read thruput
Can you possibly batch some Get calls into a Scan with a Filter that contains
the list of row keys you need?
For example, if you have 100 Gets, you can create a start key and an end key
by taking the min and max of those 100 row keys. Next, you need to write a
filter which stores these 100 row keys in a private member and uses the hint
mechanism in the Filter interface to jump to the closest row key in the
region it scans.

If you need help with that, I can add a more detailed description of that
Filter.

This should remove most of the heavyweight per-request overhead of each
Get.
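
A minimal sketch of such a filter, assuming the 0.94-era Filter API (the
class name, the constructor, and the TreeSet-based lookup are illustrative
choices, not code from this thread):

  import java.util.Collection;
  import java.util.TreeSet;

  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.filter.FilterBase;
  import org.apache.hadoop.hbase.util.Bytes;

  // Holds the wanted row keys; for any other row it answers
  // SEEK_NEXT_USING_HINT so the scanner jumps straight to the closest
  // wanted key instead of reading every row between min and max.
  public class RowKeySetFilter extends FilterBase {

    private final TreeSet<byte[]> rowKeys =
        new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
    private byte[] nextHint;

    public RowKeySetFilter(Collection<byte[]> keys) {
      rowKeys.addAll(keys);
    }

    @Override
    public ReturnCode filterKeyValue(KeyValue kv) {
      byte[] row = kv.getRow();
      if (rowKeys.contains(row)) {
        return ReturnCode.INCLUDE;           // one of the wanted rows
      }
      nextHint = rowKeys.ceiling(row);       // closest wanted key >= this row
      return nextHint == null
          ? ReturnCode.NEXT_ROW              // past the last wanted key
          : ReturnCode.SEEK_NEXT_USING_HINT; // ask the scanner to seek ahead
    }

    @Override
    public KeyValue getNextKeyHint(KeyValue currentKV) {
      return KeyValue.createFirstOnRow(nextHint); // first KV of that row
    }
  }

The Scan would then take the smallest of the keys as its start row and the
largest (plus a trailing zero byte, since the stop row is exclusive) as its
stop row, with this filter set via Scan.setFilter(). Note that a 0.94
custom filter also needs the Writable serialization methods, a no-arg
constructor, and its jar on every region server's classpath; those details
are omitted here.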

On Tuesday, April 2, 2013, Vibhav Mundra wrote:

> What do your client calls look like? Get? Scan? Filters?
> --My client keeps issuing Get requests; each call fetches a single row.
> Essentially we are using HBase for key-value retrieval.
>
> Is 3000/sec client-side calls, or is it in number of rows per sec?
> --3000/sec is the client-side call rate.
>
> If you measure in MB/sec how much read throughput do you get?
> --Each client request's response is at most 1 KB, so the throughput is
> about 3 MB/sec (3000 * 1 KB).
>
> Where is your client located? Same router as the cluster?
> --It is on the same subnet as the cluster.
>
> Have you activated dfs read short-circuit? If not, try it.
> --I have not tried this. Let me try this also.
>
> Compression - try switching to Snappy - should be faster.
> What else is running on the cluster parallel to your reading client?
> --There is a small upload job running. I have never seen CPU usage above
> 5%, so I didn't bother to look at this angle.
>
> -Vibhav
>
>
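
On the short-circuit read suggestion above, a minimal configuration
sketch, assuming HBase 0.94 on Hadoop 1.x (the "hbase" value is an
assumption; it should be whichever user runs the region servers):

  <!-- hdfs-site.xml on each datanode: let the HBase user read local
       block files directly, bypassing the DataNode protocol -->
  <property>
    <name>dfs.block.local-path-access.user</name>
    <value>hbase</value>
  </property>

  <!-- hbase-site.xml on each region server: turn on short-circuit
       reads in the embedded DFS client -->
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>

Datanodes and region servers need a restart afterwards. The Snappy
suggestion is a per-column-family setting, e.g. via
HColumnDescriptor.setCompressionType(Compression.Algorithm.SNAPPY), and
requires the native Snappy libraries on every region server.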
> On Tue, Apr 2, 2013 at 1:42 AM, Asaf Mesika <[EMAIL PROTECTED]> wrote:
>
> > What do your client calls look like? Get? Scan? Filters?
> > Is 3000/sec client-side calls, or is it in number of rows per sec?
> > If you measure in MB/sec how much read throughput do you get?
> > Where is your client located? Same router as the cluster?
> > Have you activated dfs read short-circuit? If not, try it.
> > Compression - try switching to Snappy - should be faster.
> > What else is running on the cluster parallel to your reading client?
> >
> > On Monday, April 1, 2013, Vibhav Mundra wrote:
> >
> > > What is the general read throughput one gets when using HBase?
> > >
> > > I am not able to achieve more than 3000/sec with a timeout of 50
> > > millisecs.
> > > Even then, 10% of the requests are timing out.
> > >
> > > -Vibhav
> > >
> > >
> > > On Mon, Apr 1, 2013 at 11:20 PM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
> > >
> > > > Yes, I have changed the block cache % to 0.35.
> > > >
> > > > -Vibhav
> > > >
> > > >
> > > > On Mon, Apr 1, 2013 at 10:20 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> I was aware of that discussion, which was about MAX_FILESIZE and
> > > >> BLOCKSIZE.
> > > >>
> > > >> My suggestion was about block cache percentage.
> > > >>
> > > >> Cheers
> > > >>
> > > >>
> > > >> On Mon, Apr 1, 2013 at 4:57 AM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >> > I have used the following site:
> > > >> > http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
> > > >> >
> > > >> > to lower the block cache value.
> > > >> >
> > > >> > -Vibhav
> > > >> >
> > > >> >
> > > >> > On Mon, Apr 1, 2013 at 4:23 PM, Ted <[EMAIL PROTECTED]> wrote:
> > > >> >
> > > >> > > Can you increase the block cache size?
> > > >> > >
> > > >> > > What version of HBase are you using?
> > > >> > >
> > > >> > > Thanks
> > > >> > >
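
The block cache sizing discussed here is a single region server setting;
a minimal hbase-site.xml sketch (0.25 is the 0.94 default, 0.35 is the
value Vibhav mentions above):

  <!-- fraction of the region server heap reserved for the block cache -->
  <property>
    <name>hfile.block.cache.size</name>
    <value>0.35</value>
  </property>

With an 8 GB region server heap, 0.35 reserves roughly 2.8 GB for the
block cache; for a random-read workload it only helps to the extent it
can hold the working set.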
> > > >> > > On Apr 1, 2013, at 3:47 AM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
> > > >> > >
> > > >> > > > The typical size of each of my rows is less than 1 KB.
> > > >> > > >
> > > >> > > > Regarding memory, I have given 8 GB to the HBase region servers
> > > >> > > > and 4 GB to the datanodes, and I don't see them completely used.
> > > >> > > > So I ruled out the GC aspect.
> > > >> > > >
> > > >> > > > In case you still believe that GC is an issue, I will upload the