HBase, mail # user - Scan vs Put vs Get


RE: Scan vs Put vs Get
Ramkrishna.S.Vasudevan 2012-06-28, 08:44
Hi

You can also check the cache hit and cache miss statistics that appear on
the UI.

In your random scan, how many regions are scanned? With the gets it may be
many more, due to the randomness of the keys (see the sketch below).

Regards
Ram
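
To answer the question above, here is a minimal sketch of counting how many
distinct regions a batch of random keys actually touches. It assumes the
0.92/0.94-era client API and a made-up table name ("testtable"):

    import java.util.HashSet;
    import java.util.Set;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class RegionSpread {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "testtable"); // hypothetical table name
            Set<String> regions = new HashSet<String>();
            for (int l = 0; l < 3000; l++) {
                // Same kind of random 24-byte keys as in the Get test below.
                byte[] key = new byte[24];
                for (int i = 0; i < key.length; i++) {
                    key[i] = (byte) Math.floor(Math.random() * 256);
                }
                // Region lookup for this key; results are cached client-side.
                regions.add(table.getRegionLocation(key).getRegionInfo()
                        .getRegionNameAsString());
            }
            System.out.println("3000 random keys touch " + regions.size() + " regions");
            table.close();
        }
    }

If the count is close to the total number of regions, the gets are spread
across the whole table, whereas a single scan typically stays within a
handful of regions.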

> -----Original Message-----
> From: N Keywal [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, June 28, 2012 2:00 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Scan vs Put vs Get
>
> Hi Jean-Marc,
>
> Interesting.... :-)
>
> Added to Anoop questions:
>
> What's the HBase version you're using?
>
> Is it repeatable? I mean, if you run the same "gets" twice with the
> same client, do you get the same results? I'm asking because the
> client caches the region locations.
>
> If the locations are wrong (region moved) you will have a retry loop,
> and it includes a sleep. Do you have anything in the logs?
>
> Could you share as well the code you're using to get the ~100 ms time?
>
> Cheers,
>
> N.
>
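
For reference, the retry loop and sleep mentioned above are controlled by
client-side settings. Here is a small sketch of how they could be tightened
while debugging, assuming the standard configuration keys; the values shown
are only illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ClientRetryConf {
        // In the 0.94 era the defaults are roughly 10 retries with a 1000 ms
        // base pause that grows with each attempt, so stale region locations
        // can silently cost a lot of time.
        public static Configuration debugConf() {
            Configuration conf = HBaseConfiguration.create();
            conf.setInt("hbase.client.retries.number", 3); // fail fast while debugging
            conf.setLong("hbase.client.pause", 100L);      // base sleep between retries (ms)
            return conf;
        }
    }
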
> On Thu, Jun 28, 2012 at 6:56 AM, Anoop Sam John <[EMAIL PROTECTED]>
> wrote:
> > Hi
> >     How many Gets do you batch together in one call? Is this equal to
> > the Scan#setCaching() value that you are using?
> > If both are the same, you can be sure that the number of network calls
> > is almost the same.
> >
> > Also, you are giving random keys in the Gets. The scan is always
> > sequential. It seems that in your get scenario the reads are very
> > random, resulting in too many reads of HFile blocks from HDFS. [Is
> > block caching enabled?]
> >
> > Also, have you tried using Bloom filters? ROW blooms might improve
> > your get performance.
> >
> > -Anoop-
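
A minimal sketch of enabling ROW bloom filters and the block cache on a
column family at table creation time, assuming the 0.92/0.94-era admin API;
the table and family names are made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.regionserver.StoreFile;

    public class CreateTableWithBlooms {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            HTableDescriptor htd = new HTableDescriptor("testtable"); // made-up name
            HColumnDescriptor hcd = new HColumnDescriptor("f");       // made-up family
            hcd.setBloomFilterType(StoreFile.BloomType.ROW); // row blooms help random Gets
            hcd.setBlockCacheEnabled(true);                  // cache HFile blocks on read
            htd.addFamily(hcd);

            admin.createTable(htd);
        }
    }

A ROW bloom lets a region server skip HFiles that cannot contain the
requested row, which mostly benefits point Gets; it does not change
sequential scan behaviour.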
> > ________________________________________
> > From: Jean-Marc Spaggiari [[EMAIL PROTECTED]]
> > Sent: Thursday, June 28, 2012 5:04 AM
> > To: user
> > Subject: Scan vs Put vs Get
> >
> > Hi,
> >
> > I have a small piece of code, for testing, which puts 1M lines
> > into an existing table, gets 3000 lines and scans 10000.
> >
> > The table has one family and one column.
> >
> > Everything is done randomly: puts with a random key (24 bytes), fixed
> > family and column names, and random content (24 bytes).
> >
> > Gets (batched) are done with random keys, and the scan uses a RandomRowFilter.
> >
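
For comparison with the Get extract further down, a scan along those lines
might look like the following. This is only a sketch: the table handle, the
caching value and the filter chance are assumptions.

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.RandomRowFilter;

    // Sketch only: "table" is an already-opened HTable, as in the Get extract below.
    static void randomScan(HTable table) throws IOException {
        Scan scan = new Scan();
        scan.setFilter(new RandomRowFilter(0.5f)); // keep each row with 50% probability
        scan.setCaching(1000);                     // rows fetched per RPC

        long timeBefore = System.currentTimeMillis();
        ResultScanner scanner = table.getScanner(scan);
        int read = 0;
        for (Result r = scanner.next(); r != null && read < 10000; r = scanner.next()) {
            read++;
        }
        scanner.close();
        long duration = System.currentTimeMillis() - timeBefore;
        System.out.println("Time to read " + read + " lines: " + duration + " ms");
    }

The scanner reads sequentially within each region, so consecutive rows
usually come from the same HFile blocks, while random Gets tend to hit a
different block (and often a different region) every time.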
> > And here are the results.
> > Time to insert 1000000 lines: 43 seconds (23255 lines/seconds)
> > That's acceptable for my needs given the poor performance of the
> > servers in the cluster. I'm fine with the results.
> >
> > Time to read 3000 lines: 11444.0 mseconds (262 lines/seconds)
> > This is way too low and I don't understand why. So I tried the random
> > scan because I'm not able to figure out the issue.
> >
> > Time to read 10000 lines: 108.0 mseconds (92593 lines/seconds)
> > This is impressive! I added that after I failed with the gets. I
> > moved from 262 lines per second to almost 100K lines/second!!! It's
> > awesome!
> >
> > However, I'm still wondering what's wrong with my gets.
> >
> > The code is very simple. I'm using Get objects that I'm executing in
> > a batch. I tried to add a filter but it's not helping. Here is an
> > extract of the code.
> >
> >     for (long l = 0; l < linesToRead; l++) {
> >         // Build a random 24-byte key for each Get.
> >         byte[] array1 = new byte[24];
> >         for (int i = 0; i < array1.length; i++)
> >             array1[i] = (byte) Math.floor(Math.random() * 256);
> >         Get g = new Get(array1);
> >         gets.addElement(g);
> >     }
> >
> >     Object[] results = new Object[gets.size()];
> >     System.out.println(new java.util.Date() + " \"gets\" created.");
> >
> >     long timeBefore = System.currentTimeMillis();
> >     table.batch(gets, results);
> >     long timeAfter = System.currentTimeMillis();
> >
> >     float duration = timeAfter - timeBefore;
> >     System.out.println("Time to read " +