RE: Scan vs Put vs Get
In 0.94, the web UI of the region server (RS) has a metrics table. In it you can see blockCacheHitCount, blockCacheMissCount, etc. There may be a difference in these values between your scan() and get() runs.
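
To make the comparison concrete, here is a rough sketch of how the two counters could be turned into a hit ratio for each test run. It assumes the counters are cumulative, so you note them before and after a run and use the difference. The class and method names are hypothetical (this is not an HBase API), and the numbers in main are made up for illustration.

```java
// Hypothetical helper, not part of HBase: turns the two counters from the
// region server metrics table into a block cache hit ratio.
final class BlockCacheStats {

    /** Returns hits / (hits + misses), or 0.0 when no block was requested. */
    static double hitRatio(long hitCount, long missCount) {
        long total = hitCount + missCount;
        if (total == 0) {
            return 0.0; // nothing was read, so there is no meaningful ratio
        }
        return (double) hitCount / total;
    }

    /** Ratio for one test run, given counter values noted before and after it. */
    static double hitRatioForRun(long hitsBefore, long missesBefore,
                                 long hitsAfter, long missesAfter) {
        return hitRatio(hitsAfter - hitsBefore, missesAfter - missesBefore);
    }

    public static void main(String[] args) {
        // Made-up counter values, for illustration only.
        double getRun = hitRatioForRun(1000, 500, 1200, 9300);
        double scanRun = hitRatioForRun(1200, 9300, 11000, 9500);
        System.out.println("get run hit ratio:  " + getRun);
        System.out.println("scan run hit ratio: " + scanRun);
    }
}
```

If the ratio for the get() run comes out much lower than for the scan() run, that would suggest the gets are mostly missing the block cache.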

Regards
Ram

> -----Original Message-----
> From: Jean-Marc Spaggiari [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, June 28, 2012 4:44 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Scan vs Put vs Get
>
> Wow. First, thanks a lot all for jumping into this.
>
> Let me try to reply to everyone in a single post.
>
> > How many Gets you batch together in one call
> I tried with multiple different values from 10 to 3000 with similar
> results.
> Time to read 10 lines : 181.0 mseconds (55 lines/seconds)
> Time to read 100 lines : 484.0 mseconds (207 lines/seconds)
> Time to read 1000 lines : 4739.0 mseconds (211 lines/seconds)
> Time to read 3000 lines : 13582.0 mseconds (221 lines/seconds)
>
> > Is this equal to the Scan#setCaching() that you are using?
> The scan call is done after the get test. So I can't set the cache for
> the scan before I do the gets. Also, I tried to run them separately (one
> time only the put, one time only the get, etc.) so I did not find a
> way to set up the cache for the get.
>
> > If both are the same, you can be sure that the number of network
> > calls is almost the same.
> Here are the results for 10 000 gets and 10 000 scan.next(). Each time
> I access the result to be sure they are sent to the client.
> (gets) Time to read 10000 lines : 36620.0 mseconds (273 lines/seconds)
> (scan) Time to read 10000 lines : 119.0 mseconds (84034 lines/seconds)
>
> >[Block caching is enabled?]
> Good question. I don't know :( Is it enabled by default? How can I
> verify or activate it?
>
> > Also have you tried using Bloom filters?
> Not yet. They are on page 381 on Lars' book and I'm only on page 168 ;)
>
>
> > What's the hbase version you're using?
> I manually installed 0.94.0. I can try with another version.
>
> > Is it repeatable?
> Yes. I tried many, many times by adding some options, closing some
> processes on the server side, removing one datanode, adding one, etc. I
> can see some small variations, but still in the same range. I was able
> to move from 200 rows/second to 300 rows/second. But that's not
> really a significant improvement. Also, here are the results for 7
> iterations of the same code.
>
> Time to read 1000 lines : 4171.0 mseconds (240 lines/seconds)
> Time to read 1000 lines : 3439.0 mseconds (291 lines/seconds)
> Time to read 1000 lines : 3953.0 mseconds (253 lines/seconds)
> Time to read 1000 lines : 3801.0 mseconds (263 lines/seconds)
> Time to read 1000 lines : 3680.0 mseconds (272 lines/seconds)
> Time to read 1000 lines : 3493.0 mseconds (286 lines/seconds)
> Time to read 1000 lines : 4549.0 mseconds (220 lines/seconds)
>
> > If the locations are wrong (region moved) you will have a retry loop
> I have one dead region server. It's a server I brought down a few days
> ago because it was too slow. But it's still on the HBase web interface.
> However, if I look at the table, there is no table region hosted on
> this server. Hadoop was also removed from it, so it's reporting one
> dead node.
>
> >Do you have anything in the logs?
> Nothing special. Only some "Block cache LRU eviction" entries.
>
> > Could you share as well the code
> Everything is at the end of this post.
>
> > You can also check the cache hit and cache miss statistics that
> > appear on the UI?
> Can you please tell me how I can find that? I was not able to find
> that on the web UI. Where should I look?
>
> > In your random scan how many Regions are scanned
> I only have 5 region servers and 12 table regions. So I guess all the
> servers are called.
>
>
> So here is the code for the gets. I removed the KeyOnlyFilter because
> it's not improving the results.
>
> JM
>
>
>
>
> http://pastebin.com/K75nFiQk (for syntax highlighting)
>
> HTable table = new HTable(config, "test3");
>
> for (int iteration = 0; iteration < 10; iteration++)
> {
>