NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

HBase >> mail # user >> Scan vs Put vs Get


RE: Scan vs Put vs Get
In 0.94

The RegionServer (RS) web UI has a metrics table. There you can see blockCacheHitCount, blockCacheMissCount, etc. Maybe you will see a difference there between scan() and get().
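Assuming these counters are cumulative since the RegionServer started, one way to compare scan() and get() is to note both values before and after each run and compute the hit ratio of the difference. A minimal Java sketch of that arithmetic; the class name and the example values in main are hypothetical, only the two counter names come from the message above:

```java
// Hypothetical helper: turns the RegionServer counters blockCacheHitCount and
// blockCacheMissCount into a hit ratio for one test run.
final class BlockCacheStats {
    /** Fraction of block reads served from cache, or 0.0 when there were no reads. */
    static double hitRatio(long hitCount, long missCount) {
        long total = hitCount + missCount;
        return total == 0 ? 0.0 : (double) hitCount / total;
    }

    public static void main(String[] args) {
        // Illustrative values only; read the real counters from the RS UI
        // before and after a run and use the differences.
        long hitsBefore = 1_000, missesBefore = 200;
        long hitsAfter = 1_900, missesAfter = 1_200;
        double ratio = hitRatio(hitsAfter - hitsBefore, missesAfter - missesBefore);
        System.out.printf("hit ratio during run: %.2f%n", ratio);
    }
}
```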

Regards
Ram

> -----Original Message-----
> From: Jean-Marc Spaggiari [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, June 28, 2012 4:44 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Scan vs Put vs Get
>
> Wow. First, thanks a lot all for jumping into this.
>
> Let me try to reply to everyone in a single post.
>
> > How many Gets you batch together in one call
> I tried with multiple different values from 10 to 3000 with similar
> results.
> Time to read 10 lines : 181.0 mseconds (55 lines/seconds)
> Time to read 100 lines : 484.0 mseconds (207 lines/seconds)
> Time to read 1000 lines : 4739.0 mseconds (211 lines/seconds)
> Time to read 3000 lines : 13582.0 mseconds (221 lines/seconds)
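Batching here means grouping row keys into lists and sending each list in a single multi-get call (in the 0.94 client, HTable#get(List<Get>)). The grouping step can be sketched with the standard library alone; the class name and row keys are hypothetical, and the HBase call is only mentioned in a comment so the sketch runs without the HBase jars:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split row keys into fixed-size batches; each batch would be sent
// as one multi-get call.
final class GetBatcher {
    static <T> List<List<T>> partition(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            // Copy so each batch does not depend on the source list.
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rowKeys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rowKeys.add("row-" + i); // hypothetical row keys
        }
        List<List<String>> batches = partition(rowKeys, 100);
        // With the HBase client on the classpath, each batch would be turned
        // into a List<Get> and passed to table.get(List<Get>) in one call.
        System.out.println(batches.size() + " batches of " + batches.get(0).size() + " keys");
    }
}
```

In the timings above, throughput stays between about 200 and 220 lines/second from 100 keys per call upward, which suggests that once batches are that large the cost is dominated by per-row work rather than per-call overhead.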
>
> > Is this equal to the Scan#setCaching() that you are using?
> The scan call is done after the get test, so I can't set the cache for
> the scan before I do the gets. Also, I tried to run them separately (one
> time only the put, one time only the get, etc.), so I did not find a
> way to set up the cache for the get.
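Scan#setCaching(n) sets how many rows each scanner RPC brings back to the client. Ignoring region boundaries (each region gets its own scanner) and the final round trip that ends the scan, reading `rows` rows takes about ceil(rows / n) RPCs. A back-of-the-envelope Java sketch under those simplifying assumptions (the class name is hypothetical):

```java
// Back-of-the-envelope model: round trips needed to read `rows` rows
// when each scanner RPC returns up to `caching` rows.
final class ScanRpcEstimate {
    static long roundTrips(long rows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        // Ceiling division without floating point.
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 10_000;
        for (int caching : new int[] {1, 100, 1000}) {
            System.out.println("caching=" + caching + " -> ~" + roundTrips(rows, caching) + " RPCs");
        }
    }
}
```

Comparing that estimate with the number of multi-get calls on the Get side is one way to check whether the two tests really make a similar number of network calls.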
>
> > If both are the same, you can be sure that the number of NW calls is
> > almost the same.
> Here are the results for 10 000 gets and 10 000 scan.next() calls. Each
> time, I access the result to be sure it is actually sent to the client.
> (gets) Time to read 10000 lines : 36620.0 mseconds (273 lines/seconds)
> (scan) Time to read 10000 lines : 119.0 mseconds (84034 lines/seconds)
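Worked out per row, the gets cost about 3.66 ms each while the scan costs about 0.012 ms per row, roughly a 300x difference. The Java sketch below reproduces that arithmetic from the two result lines; rounding with Math.round is an assumption, chosen because it matches every lines/second figure quoted in this thread:

```java
// Reproduces the per-row arithmetic behind the (gets) and (scan) result lines.
final class ReadCost {
    /** Rows per second, rounded the way the quoted output appears to be. */
    static long rowsPerSecond(long rows, double millis) {
        return Math.round(rows * 1000.0 / millis);
    }

    /** Average milliseconds spent per row. */
    static double millisPerRow(long rows, double millis) {
        return millis / rows;
    }

    public static void main(String[] args) {
        long rows = 10_000;
        double getMillis = 36620.0; // "(gets) Time to read 10000 lines"
        double scanMillis = 119.0;  // "(scan) Time to read 10000 lines"
        System.out.printf("get : %.4f ms/row, %d rows/s%n",
                millisPerRow(rows, getMillis), rowsPerSecond(rows, getMillis));
        System.out.printf("scan: %.4f ms/row, %d rows/s%n",
                millisPerRow(rows, scanMillis), rowsPerSecond(rows, scanMillis));
        System.out.printf("scan is ~%.0fx faster per row%n", getMillis / scanMillis);
    }
}
```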
>
> >[Block caching is enabled?]
> Good question. I don't know :( Is it enabled by default? How can I
> verify or activate it?
>
> > Also have you tried using Bloom filters?
> Not yet. They are on page 381 of Lars' book and I'm only on page 168 ;)
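A Bloom filter lets a read skip a store file when the filter says the row is definitely not in it; in HBase it is configured per column family (ROW or ROWCOL). The sketch below is a generic Bloom filter in Java, not HBase's implementation; the bit-array size, number of hash functions, and the double-hashing scheme are illustrative choices:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Generic Bloom filter sketch (not HBase's implementation). mightContain()
// returns false only if the key was definitely never added.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a, used as the second base hash.
    private static int fnv1a(byte[] data) {
        int h = 0x811C9DC5;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: bit position i is derived from h1 + i * h2.
    private int position(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(k);
        int h2 = fnv1a(k);
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(h1, h2, i));
        }
    }

    boolean mightContain(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(k);
        int h2 = fnv1a(k);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(h1, h2, i))) {
                return false; // some bit unset: key was never added
            }
        }
        return true; // all bits set: present, or a false positive
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 12, 3);
        for (int i = 0; i < 100; i++) {
            filter.add("row-" + i);
        }
        System.out.println(filter.mightContain("row-42"));   // always true
        System.out.println(filter.mightContain("row-9999")); // usually false
    }
}
```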
>
>
> > What's the hbase version you're using?
> I manually installed 0.94.0. I can try with another version.
>
> > Is it repeatable?
> Yes. I tried many, many times, adding some options, closing some
> processes on the server side, removing one datanode, adding one, etc. I
> can see some small variations, but still in the same range. I was able
> to move from 200 rows/second to 300 rows/second, but that's not
> really a significant improvement. Also, here are the results for 7
> iterations of the same code.
>
> Time to read 1000 lines : 4171.0 mseconds (240 lines/seconds)
> Time to read 1000 lines : 3439.0 mseconds (291 lines/seconds)
> Time to read 1000 lines : 3953.0 mseconds (253 lines/seconds)
> Time to read 1000 lines : 3801.0 mseconds (263 lines/seconds)
> Time to read 1000 lines : 3680.0 mseconds (272 lines/seconds)
> Time to read 1000 lines : 3493.0 mseconds (286 lines/seconds)
> Time to read 1000 lines : 4549.0 mseconds (220 lines/seconds)
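To put a number on the variation: the seven runs above average about 3870 ms with a sample standard deviation of roughly 390 ms, about 10% of the mean. A small Java sketch of that calculation (class and method names are hypothetical):

```java
// Summarizes repeated timing runs: mean and sample standard deviation.
final class RunStats {
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Sample standard deviation (divides by n - 1); needs at least two values. */
    static double sampleStdDev(double[] xs) {
        if (xs.length < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double m = mean(xs);
        double sq = 0;
        for (double x : xs) {
            sq += (x - m) * (x - m);
        }
        return Math.sqrt(sq / (xs.length - 1));
    }

    public static void main(String[] args) {
        // Times (ms) of the seven 1000-row runs quoted above.
        double[] millis = {4171, 3439, 3953, 3801, 3680, 3493, 4549};
        double m = mean(millis);
        double sd = sampleStdDev(millis);
        System.out.printf("mean %.0f ms, stddev %.0f ms (%.0f%% of mean)%n", m, sd, 100 * sd / m);
    }
}
```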
>
> >If the locations are wrong (region moved) you will have a retry loop
> I have one dead region server. It's a server I brought down a few days ago
> because it was too slow, but it's still listed on the HBase web interface.
> However, if I look at the table, there is no table region hosted on
> this server. Hadoop was also removed from it, so it reports one dead
> node.
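A stale cached region location typically makes the client fail, look the region up again, and retry after a pause, so every retry adds latency to the read. The general pattern, retry with exponential backoff, is sketched below in Java; it is a generic illustration, and the attempt limit and base pause are hypothetical parameters rather than HBase's actual client settings or retry code:

```java
import java.util.function.Supplier;

// Generic retry-with-exponential-backoff sketch (not HBase's client code).
final class Retry {
    static <T> T withBackoff(Supplier<T> op, int maxAttempts, long basePauseMillis) {
        if (maxAttempts <= 0) {
            throw new IllegalArgumentException("maxAttempts must be positive");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    // Pause grows as base, 2*base, 4*base, ... between attempts.
                    sleep(basePauseMillis << attempt);
                }
            }
        }
        throw last; // every attempt failed
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            throw new IllegalStateException("interrupted during backoff", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical operation that fails twice (like a stale location) and then succeeds.
        String result = withBackoff(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("stale region location");
            }
            return "row found";
        }, 5, 100);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```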
>
> >Do you have anything in the logs?
> Nothing special. Only some "Block cache LRU eviction" entries.
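These log entries most likely come from the block cache evicting blocks once it is full; with random gets over a table larger than the cache, many reads miss and push other blocks out. A simplified Java model of an LRU cache with hit, miss, and eviction counters (built on LinkedHashMap in access order; HBase's own LruBlockCache is more elaborate, and this is not its code):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

// Simplified LRU block cache model (not HBase's LruBlockCache): holds at most
// `capacity` blocks, evicts the least recently used, and counts hits/misses.
final class LruBlockCacheModel<K, V> {
    private final Map<K, V> blocks;
    private long hits;
    private long misses;
    private long evictions;

    LruBlockCacheModel(final int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        // accessOrder = true: iteration order runs from least to most recently used.
        this.blocks = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                boolean evict = size() > capacity;
                if (evict) {
                    evictions++;
                }
                return evict;
            }
        };
    }

    /** Returns the cached block, loading it with `loader` on a miss. */
    V get(K key, Function<K, V> loader) {
        V block = blocks.get(key); // a hit also marks the block as recently used
        if (block != null) {
            hits++;
            return block;
        }
        misses++;
        block = loader.apply(key); // stands in for reading the block from disk
        blocks.put(key, block);
        return block;
    }

    long hits() { return hits; }
    long misses() { return misses; }
    long evictions() { return evictions; }

    public static void main(String[] args) {
        LruBlockCacheModel<Integer, String> cache = new LruBlockCacheModel<>(100);
        Random random = new Random(42);
        // Random reads over 1000 distinct blocks with room for only 100.
        for (int i = 0; i < 10_000; i++) {
            cache.get(random.nextInt(1000), id -> "block-" + id);
        }
        System.out.println("hits=" + cache.hits() + " misses=" + cache.misses()
                + " evictions=" + cache.evictions());
    }
}
```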
>
> > Could you share as well the code
> Everything is at the end of this post.
>
> > You can also check the cache hit and cache miss statistics that
> > appear on the UI?
> Can you please tell me how I can find that? I was not able to find
> that on the web UI. Where should I look?
>
> > In your random scan how many Regions are scanned
> I only have 5 region servers and 12 table regions, so I guess all the
> servers are called.
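Each region serves a contiguous row-key range beginning at its start key, so finding a row's region is a floor lookup over the sorted start keys. With random row keys spread over 12 regions on 5 servers, random gets will very likely reach every server. A minimal Java sketch using TreeMap.floorEntry; the start keys and server names are hypothetical, and String ordering stands in for HBase's byte[] ordering (the two agree for ASCII keys):

```java
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch of row key -> region -> server lookup over sorted start keys.
final class RegionMapSketch {
    // Region start key -> server hosting that region. "" is the first region.
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        serverByStartKey.put(startKey, server);
    }

    /** Server of the region containing rowKey: the greatest start key <= rowKey. */
    String serverFor(String rowKey) {
        Map.Entry<String, String> region = serverByStartKey.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionMapSketch map = new RegionMapSketch();
        // Hypothetical layout: 12 regions assigned round-robin to 5 servers.
        String[] startKeys = {"", "1", "2", "3", "4", "5", "6", "7", "8", "9", "a", "b"};
        for (int i = 0; i < startKeys.length; i++) {
            map.addRegion(startKeys[i], "rs" + (i % 5 + 1));
        }
        Random random = new Random(7);
        Set<String> touched = new TreeSet<>();
        for (int i = 0; i < 100; i++) {
            String rowKey = Integer.toHexString(random.nextInt(0xc0000)); // hypothetical random keys
            touched.add(map.serverFor(rowKey));
        }
        System.out.println("servers touched by 100 random gets: " + touched);
    }
}
```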
>
>
> So here is the code for the gets. I removed the KeyOnlyFilter because
> it wasn't improving the results.
>
> JM
>
>
>
>
> http://pastebin.com/K75nFiQk (for syntax highlighting)
>
> HTable table = new HTable(config, "test3");
>
> for (int iteration = 0; iteration < 10; iteration++)
> {
>