HBase, mail # user - Read thruput


Re: Read thruput
Asaf Mesika 2013-04-04, 04:21
Can you possibly batch those Get calls into a single Scan with a Filter that
carries the list of row keys you need?
For example, if you have 100 Gets, you can derive a start key and a stop key
by taking the min and max of those 100 row keys. Next, you write a filter
which keeps these 100 row keys in a private member and uses the hint method
in the Filter interface to jump to the closest row key in the region it
scans.
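
A rough sketch of such a filter, assuming a reasonably recent HBase client API
(the class name RowKeyListFilter is made up here; a real filter also needs
toByteArray/parseFrom serialization and has to be deployed on the region
servers' classpath):

import java.util.List;
import java.util.TreeSet;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.KeyValueUtil;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyListFilter extends FilterBase {

  // The row keys we would otherwise issue individual Gets for,
  // kept sorted so the "next wanted key" is cheap to find.
  private final TreeSet<byte[]> wantedKeys =
      new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);

  public RowKeyListFilter(List<byte[]> rowKeys) {
    wantedKeys.addAll(rowKeys);
  }

  @Override
  public ReturnCode filterKeyValue(Cell cell) {
    byte[] row = CellUtil.cloneRow(cell);
    if (wantedKeys.contains(row)) {
      return ReturnCode.INCLUDE;                // a row we were asked for
    }
    if (wantedKeys.higher(row) == null) {
      return ReturnCode.NEXT_ROW;               // past the last wanted key
    }
    return ReturnCode.SEEK_NEXT_USING_HINT;     // let the scanner jump ahead
  }

  @Override
  public Cell getNextCellHint(Cell currentCell) {
    // Seek straight to the smallest wanted key greater than the current row.
    byte[] next = wantedKeys.higher(CellUtil.cloneRow(currentCell));
    return next == null ? null : KeyValueUtil.createFirstOnRow(next);
  }
}

The SEEK_NEXT_USING_HINT return code is what lets the region server skip over
the rows between the keys you care about instead of reading them.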

If you need help with that I can add a more detailed description of that
Filter.

This should eliminate most of the heavyweight overhead of processing each Get
individually.
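
On the client side the bounded scan could then look roughly like this (a
fragment only; "table" is assumed to be an already open HTable/Table and
rowKeys the List<byte[]> of keys you would otherwise Get one by one):

// Bound the scan by the smallest and largest wanted keys.
TreeSet<byte[]> sorted = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
sorted.addAll(rowKeys);

Scan scan = new Scan();
scan.setStartRow(sorted.first());
// The stop row is exclusive, so append a zero byte to keep the largest key.
scan.setStopRow(Bytes.add(sorted.last(), new byte[] { 0 }));
scan.setFilter(new RowKeyListFilter(rowKeys));

ResultScanner scanner = table.getScanner(scan);
try {
  for (Result result : scanner) {
    // one Result per wanted row key that actually exists
  }
} finally {
  scanner.close();
}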

On Tuesday, April 2, 2013, Vibhav Mundra wrote:

> What does your client call look like? Get? Scan? Filters?
> --My client keeps issuing Get requests; each time a single row is fetched.
> Essentially we are using HBase as a key-value store.
>
> Is 3000/sec client-side calls, or is it the number of rows per sec?
> --3000/sec is the client side calls.
>
> If you measure in MB/sec how much read throughput do you get?
> --Each client request's response is at most 1 KB, so the throughput is
> about 3 MB/sec (3000 * 1 KB).
>
> Where is your client located? Same router as the cluster?
> --It is on the same cluster, on the same subnet.
>
> Have you activated dfs read short-circuit? If not, try it.
> --I have not tried this. Let me try this also.
>
> Compression - try switching to Snappy - should be faster.
> What else is running on the cluster parallel to your reading client?
> --There is a small upload process running. I have never seen the CPU usage
> go above 5%, so I didn't bother to look at this angle.
>
> -Vibhav
>
>
> On Tue, Apr 2, 2013 at 1:42 AM, Asaf Mesika <[EMAIL PROTECTED]> wrote:
>
> > What does your client call look like? Get? Scan? Filters?
> > Is 3000/sec client-side calls, or is it the number of rows per sec?
> > If you measure in MB/sec how much read throughput do you get?
> > Where is your client located? Same router as the cluster?
> > Have you activated dfs read short-circuit? If not, try it.
> > Compression - try switching to Snappy - should be faster.
> > What else is running on the cluster parallel to your reading client?
> >
> > On Monday, April 1, 2013, Vibhav Mundra wrote:
> >
> > > What is the general read throughput that one gets when using HBase?
> > >
> > >  I am not able to achieve more than 3000/sec with a timeout of 50
> > > millisecs.
> > > Even at that rate, 10% of the requests are timing out.
> > >
> > > -Vibhav
> > >
> > >
> > > > On Mon, Apr 1, 2013 at 11:20 PM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
> > >
> > > > Yes, I have changed the BLOCK CACHE % to 0.35.
> > > >
> > > > -Vibhav
> > > >
> > > >
> > > > On Mon, Apr 1, 2013 at 10:20 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> I was aware of that discussion, which was about MAX_FILESIZE and BLOCKSIZE.
> > > >>
> > > >> My suggestion was about block cache percentage.
> > > >>
> > > >> Cheers
> > > >>
> > > >>
> > > >> On Mon, Apr 1, 2013 at 4:57 AM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >> > I have used the following site:
> > > >> > http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
> > > >> >
> > > >> > to lower the block cache value.
> > > >> >
> > > >> > -Vibhav
> > > >> >
> > > >> >
> > > >> > On Mon, Apr 1, 2013 at 4:23 PM, Ted <[EMAIL PROTECTED]> wrote:
> > > >> >
> > > >> > > Can you increase block cache size ?
> > > >> > >
> > > >> > > What version of hbase are you using ?
> > > >> > >
> > > >> > > Thanks
> > > >> > >
> > > >> > > On Apr 1, 2013, at 3:47 AM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
> > > >> > >
> > > >> > > > The typical size of each of my rows is less than 1 KB.
> > > >> > > >
> > > >> > > > Regarding the memory, I have used 8 GB for the HBase region servers and
> > > >> > > > 4 GB for the datanodes, and I don't see them completely used. So I ruled
> > > >> > > > out the GC aspect.
> > > >> > > >
> > > >> > > > In case you still believe that GC is an issue, I will upload the