Re: HBase : get(...) vs scan and in-memory table
I see.

From the documentation: "In-memory blocks have the highest priority in the Block Cache, but it is not a guarantee that the entire table will be in memory." You might want to take a look here:
http://hbase.apache.org/book/regionserver.arch.html#block.cache

So for your 2 questions:

 1. How to verify that the columns are in-memory and accessed from there
and not from disk? => I don't think you can get Block Cache metrics per
table. You can get metrics, but they are aggregated per region server (see
here: http://hbase.apache.org/book/hbase_metrics.html#rs_metrics_other)
 2. Is the from-memory or from-disk read transparent to the client? In
simple words, do I need to change the HTable access code in my reducer
class? If yes, what are the changes? => The in-memory setting is on the
table's column family. You should see it in the Master web UI by looking
at the table definition. Nothing is required on the client side to take
advantage of it.
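For reference, you can also flip the flag on an existing column family from
the HBase shell instead of re-creating the table (a sketch, using the table
and family names from your message; on 0.94-era clusters without online
schema change enabled, the table must be disabled first):

    hbase> disable 'T1'
    hbase> alter 'T1', {NAME => 'CF1', IN_MEMORY => 'true'}
    hbase> enable 'T1'
    hbase> describe 'T1'

After that, describe should show IN_MEMORY => 'true' in the CF definition,
and you can verify the same thing in the Master UI.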

JM
2013/9/11 Omkar Joshi <[EMAIL PROTECTED]>

> Hi JM,
>
> Yes, I have DistributedCache in mind too, but I'm not sure those tables
> will remain read-only in the future. Besides, I want to check whether, at
> their current size, they can be kept in-memory in HBase.
>
> Regards,
> Omkar Joshi
>
>
>
> -----Original Message-----
> From: Jean-Marc Spaggiari [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, September 11, 2013 5:06 PM
> To: user
> Subject: Re: HBase : get(...) vs scan and in-memory table
>
> Hi Omkar,
>
> Your tables T1 and T2 are not so big. Are you 100% sure they can fit in
> memory? If yes, then why not distribute them to all the nodes in your MR
> setup, e.g. as a map, using the distributed cache? Then in your map code
> you will be 100% sure that both tables are local and in memory...
>
> JM
>
>
> 2013/9/11 Omkar Joshi <[EMAIL PROTECTED]>
>
> > I'm executing MR over HBase.
> > The business logic in the reducer heavily accesses two tables, say T1(40k
> > rows) and T2(90k rows). Currently, I'm executing the following steps :
> > 1.In the constructor of the reducer class, doing something like this :
> > HBaseCRUD hbaseCRUD = new HBaseCRUD();
> >
> > HTableInterface t1= hbaseCRUD.getTable("T1",
> >                             "CF1", null, "C1", "C2");
> > HTableInterface t2= hbaseCRUD.getTable("T2",
> >                             "CF1", null, "C1", "C2");
> > In the reduce(...)
> >  String lowercase = ....;
> >
> > /* Start : HBase code */
> > /*
> > * TRY using get(...) on the table rather than a
> > * Scan!
> > */
> > Scan scan = new Scan();
> > scan.setStartRow(lowercase.getBytes());
> > scan.setStopRow(lowercase.getBytes());
> >
> > /*scan will return a single row*/
> > ResultScanner resultScanner = t1.getScanner(scan);
> >
> > for (Result result : resultScanner) {
> > /*business logic*/
> > }
> > Though I'm not sure the above code is sensible in the first place, I
> > have a question - would a get(...) provide any performance benefit over
> > the scan?
> > Get get = new Get(lowercase.getBytes());
> > Result getResult = t1.get(get);
> > Since T1 and T2 will be (mostly) read-only, I think the performance will
> > improve if they are kept in-memory. As per the HBase doc., I will have
> > to re-create the tables T1 and T2. Please verify the correctness of my
> > understanding :
> > public void createTables(String tableName, boolean readOnly,
> >             boolean blockCacheEnabled, boolean inMemory,
> >             String... columnFamilyNames) throws IOException {
> >         // TODO Auto-generated method stub
> >
> >         HTableDescriptor tableDesc = new HTableDescriptor(tableName);
> >         /* not sure !!! */
> >         tableDesc.setReadOnly(readOnly);
> >
> >         HColumnDescriptor columnFamily = null;
> >
> >         if (columnFamilyNames != null && columnFamilyNames.length > 0) {
> >
> >             for (String columnFamilyName : columnFamilyNames) {
> >
> >                 columnFamily = new HColumnDescriptor(columnFamilyName);