Re: HBase : get(...) vs scan and in-memory table
Hi Omkar,

Your tables T1 and T2 are not that big. Are you 100% sure they can fit in
memory? If so, why not distribute them to all the nodes in your MR setup,
e.g. in a map format, using the distributed cache? Then in your map code
you can be 100% sure that both tables are local and in memory...

JM
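
A rough, untested sketch of the approach JM describes: instead of issuing a
Get/Scan per record, the small table is loaded once per task into an
in-memory map in setup(), so each lookup in reduce() is local. Here the
table is simply pre-loaded from HBase rather than shipped through the
distributed cache, which gives the same "local and in memory" effect; the
class and field names are made up for illustration and are not code from
the thread.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class LookupReducer extends Reducer<Text, Text, Text, Text> {

    // ~40k rows of T1, keyed by row key; small enough to hold per task
    private final Map<String, Result> t1Cache = new HashMap<String, Result>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        HTable t1 = new HTable(context.getConfiguration(), "T1");
        try {
            Scan scan = new Scan();
            scan.addFamily(Bytes.toBytes("CF1"));
            scan.setCaching(1000);              // fetch rows in large batches
            ResultScanner scanner = t1.getScanner(scan);
            try {
                for (Result r : scanner) {
                    t1Cache.put(Bytes.toString(r.getRow()), r);
                }
            } finally {
                scanner.close();
            }
        } finally {
            t1.close();
        }
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Result row = t1Cache.get(key.toString()); // local, in-memory lookup; no RPC
        if (row != null) {
            // ... business logic against T1 columns ...
        }
    }
}

With ~40k and ~90k mostly read-only rows, both T1 and T2 could be cached
this way, at the cost of holding them in memory in each task.
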
2013/9/11 Omkar Joshi <[EMAIL PROTECTED]>

> I'm executing MR over HBase.
> The business logic in the reducer heavily accesses two tables, say T1 (40k
> rows) and T2 (90k rows). Currently, I'm executing the following steps:
> 1. In the constructor of the reducer class, I'm doing something like this:
> HBaseCRUD hbaseCRUD = new HBaseCRUD();
>
> HTableInterface t1 = hbaseCRUD.getTable("T1",
>                             "CF1", null, "C1", "C2");
> HTableInterface t2 = hbaseCRUD.getTable("T2",
>                             "CF1", null, "C1", "C2");
> 2. In the reduce(...), I'm doing this:
>  String lowercase = ....;
>
> /* Start : HBase code */
> /*
> * TRY using get(...) on the table rather than a
> * Scan!
> */
> Scan scan = new Scan();
> scan.setStartRow(lowercase.getBytes());
> scan.setStopRow(lowercase.getBytes());
>
> /*scan will return a single row*/
> ResultScanner resultScanner = t1.getScanner(scan);
>
> for (Result result : resultScanner) {
> /*business logic*/
> }
> Though not sure if the above code is sensible in the first place, I have a
> question - would a get(...) provide any performance benefit over the scan?
> Get get = new Get(lowercase.getBytes());
> Result getResult = t1.get(get);
> Since T1 and T2 will be read-only (mostly), I think the performance will
> improve if they are kept in-memory. As per the HBase docs, I will have to
> re-create the tables T1 and T2. Please verify the correctness of my understanding:
> public void createTables(String tableName, boolean readOnly,
>         boolean blockCacheEnabled, boolean inMemory,
>         String... columnFamilyNames) throws IOException {
>     // TODO Auto-generated method stub
>
>     HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>     /* not sure !!! */
>     tableDesc.setReadOnly(readOnly);
>
>     HColumnDescriptor columnFamily = null;
>
>     if (!(columnFamilyNames == null || columnFamilyNames.length == 0)) {
>
>         for (String columnFamilyName : columnFamilyNames) {
>
>             columnFamily = new HColumnDescriptor(columnFamilyName);
>             /*
>              * Start : Do these steps ensure that the column
>              * family (actually, the column data) is in-memory???
>              */
>             columnFamily.setBlockCacheEnabled(blockCacheEnabled);
>             columnFamily.setInMemory(inMemory);
>             /*
>              * End : Do these steps ensure that the column
>              * family (actually, the column data) is in-memory???
>              */
>
>             tableDesc.addFamily(columnFamily);
>         }
>     }
>
>     hbaseAdmin.createTable(tableDesc);
>     hbaseAdmin.close();
> }
> Once done:
>
>  1.  How do I verify that the columns are in memory and accessed from there
> and not from disk?
>  2.  Is the from-memory or from-disk read transparent to the client? In
> simple words, do I need to change the HTable access code in my reducer
> class? If yes, what are the changes?
>
>
> Regards,
> Omkar Joshi
>
>
>
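
A few notes on the questions in the quoted message, based on the same HBase
API generation used above; worth double-checking against the docs for the
exact version in use.

On get(...) vs. scan: a Get for a single, known row key is the more natural
call. As far as I know, a Scan whose start and stop rows are the same key is
treated internally as a single-row "get scan", so the result is the same,
but the Get avoids opening, iterating and closing a server-side scanner and
reads more clearly.

On the in-memory flags: setInMemory(true) on a column family does not pin
the data in RAM; it only gives that family's blocks higher priority in the
block cache, so reads still go to disk on a cache miss. The caching is
transparent to the client, which answers question 2: the HTable access code
in the reducer does not need to change. For question 1, one practical check
is to watch the block cache hit/miss counters exposed by the region server
web UI and metrics while the job runs.

The tables also do not necessarily have to be re-created; an existing
column family can usually be altered instead. A minimal sketch of that,
assuming the 0.94-era HBaseAdmin API (where the table generally has to be
disabled for the schema change):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class MarkFamilyInMemory {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // reuse the existing descriptor so other CF1 settings are kept
            HColumnDescriptor cf = admin.getTableDescriptor(Bytes.toBytes("T1"))
                                        .getFamily(Bytes.toBytes("CF1"));
            cf.setInMemory(true);          // higher block cache priority, not pinned in RAM
            cf.setBlockCacheEnabled(true);
            admin.disableTable("T1");      // schema change with the table offline
            admin.modifyColumn("T1", cf);  // push the updated CF1 descriptor
            admin.enableTable("T1");
        } finally {
            admin.close();
        }
    }
}

After re-enabling the table, the flag can be confirmed with describe 'T1'
in the HBase shell.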