HBase, mail # user - HBase : get(...) vs scan and in-memory table


RE: HBase : get(...) vs scan and in-memory table
Vladimir Rodionov 2013-09-11, 18:07
There is no guarantee that your tables are in memory, and you cannot verify this directly. HBase will do its best to keep them in memory, but it is not 100%.
The (default) block cache is divided into three zones, and for IN_MEMORY tables HBase allocates 25% of the cache. If your data does not fit into that 25%,
try increasing the block cache size.
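As a back-of-the-envelope check of whether the tables fit in that 25% bucket, here is a small sketch in plain Java (no HBase classes). The 4 GB heap, the `hfile.block.cache.size` fraction of 0.25, and the 25% in-memory bucket are assumed defaults of this era, not values from the thread:

```java
public class CacheMath {
    // Rough sizing: in-memory bucket = heap * block-cache fraction * 25%.
    static long inMemoryBucketBytes(long heapBytes, double blockCacheFraction) {
        long blockCache = (long) (heapBytes * blockCacheFraction);
        return (long) (blockCache * 0.25); // IN_MEMORY priority bucket
    }

    public static void main(String[] args) {
        long heap = 4L * 1024 * 1024 * 1024; // assumed 4 GB region server heap
        // With hfile.block.cache.size = 0.25, the in-memory bucket is ~256 MB.
        System.out.println(inMemoryBucketBytes(heap, 0.25)); // 268435456
    }
}
```

If T1 (40k rows) and T2 (90k rows) together exceed that bucket in on-disk block terms, raising `hfile.block.cache.size` is the lever Vladimir is pointing at.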

>>Is the from-memory or from-disk read transparent to the client?

Yes, absolutely transparent.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]

________________________________________
From: Omkar Joshi [[EMAIL PROTECTED]]
Sent: Wednesday, September 11, 2013 4:41 AM
To: [EMAIL PROTECTED]
Subject: RE: HBase : get(...) vs scan and in-memory table

Hi JM,

Yes, I have DistributedCache on my mind too but not sure if those tables will be read-only in future. Besides, I want to check whether with their current size, those can be kept in-memory in HBase.

Regards,
Omkar Joshi

-----Original Message-----
From: Jean-Marc Spaggiari [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 11, 2013 5:06 PM
To: user
Subject: Re: HBase : get(...) vs scan and in-memory table

Hi Omkar,

Your tables T1 and T2 are not so big. Are you 100% sure they can fit in memory?
If yes, then why not distribute them to all the nodes in your MR
setup, as a map, using the distributed cache? Then in your map
code you will be 100% sure that both tables are local and in memory...

JM
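What JM's distributed-cache approach buys you, sketched with plain Java collections; `loadT1()` and the sample rows are hypothetical stand-ins for reading an exported copy of the table from the DistributedCache in the task's setup:

```java
import java.util.HashMap;
import java.util.Map;

public class LocalLookup {
    // Hypothetical loader: in a real job this would parse T1's exported
    // rows from a DistributedCache file, once per task.
    static Map<String, String> loadT1() {
        Map<String, String> t1 = new HashMap<>();
        t1.put("alpha", "value-for-alpha");
        t1.put("beta", "value-for-beta");
        return t1;
    }

    public static void main(String[] args) {
        Map<String, String> t1 = loadT1();
        // Each reduce() call becomes a pure in-process lookup:
        // no RPC to a region server, no block-cache pressure.
        System.out.println(t1.get("alpha")); // prints "value-for-alpha"
    }
}
```

The trade-off, as Omkar notes below, is that this only works while the tables are effectively read-only for the life of the job.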
2013/9/11 Omkar Joshi <[EMAIL PROTECTED]>

> I'm executing MR over HBase.
> The business logic in the reducer heavily accesses two tables, say T1(40k
> rows) and T2(90k rows). Currently, I'm executing the following steps :
> 1.In the constructor of the reducer class, doing something like this :
> HBaseCRUD hbaseCRUD = new HBaseCRUD();
>
> HTableInterface t1= hbaseCRUD.getTable("T1",
>                             "CF1", null, "C1", "C2");
> HTableInterface t2= hbaseCRUD.getTable("T2",
>                             "CF1", null, "C1", "C2");
> In the reduce(...)
>  String lowercase = ....;
>
> /* Start : HBase code */
> /*
> * TRY using get(...) on the table rather than a
> * Scan!
> */
> Scan scan = new Scan();
> scan.setStartRow(lowercase.getBytes());
> scan.setStopRow(lowercase.getBytes());
>
> /*scan will return a single row*/
> ResultScanner resultScanner = t1.getScanner(scan);
>
> for (Result result : resultScanner) {
> /*business logic*/
> }
> Though I'm not sure if the above code is sensible in the first place, I have a
> question - would a get(...) provide any performance benefit over the scan?
> Get get = new Get(lowercase.getBytes());
> Result getResult = t1.get(get);
> Since T1 and T2 will be (mostly) read-only, I think that if they are kept
> in-memory, the performance will improve. As per the HBase docs, I will have to
> re-create the tables T1 and T2. Please verify the correctness of my understanding :
> public void createTables(String tableName, boolean readOnly,
>             boolean blockCacheEnabled, boolean inMemory,
>             String... columnFamilyNames) throws IOException {
>
>         HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>         /* not sure !!! */
>         tableDesc.setReadOnly(readOnly);
>
>         HColumnDescriptor columnFamily = null;
>
>         if (!(columnFamilyNames == null || columnFamilyNames.length == 0)) {
>
>             for (String columnFamilyName : columnFamilyNames) {
>
>                 columnFamily = new HColumnDescriptor(columnFamilyName);
>                 /*
>                  * Start : Do these steps ensure that the column
>                  * family(actually, the column data) is in-memory???
>                  */
>                 columnFamily.setBlockCacheEnabled(blockCacheEnabled);
>                 columnFamily.setInMemory(inMemory);
>                 /*
>                  * End : Do these steps ensure that the column
>                  * family (actually, the column data) is in-memory???
>                  */
>                 tableDesc.addFamily(columnFamily);
>             }
>         }
>     }