HBase >> mail # user >> Optimizing Multi Gets in hbase


Varun Sharma 2013-02-18, 09:57
Anoop Sam John 2013-02-18, 10:49
Viral Bajaria 2013-02-18, 10:49
Nicolas Liochon 2013-02-18, 10:56
ramkrishna vasudevan 2013-02-18, 11:07

Re: Optimizing Multi Gets in hbase
So you'd have to do a little bit of homework up front.

Suppose you have to pull some data from 30K rows out of 10 million.
If the row keys are in sorted order, you could determine which regions they fall in and then think about doing a couple of scans in parallel.
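
To make the "determine the regions" step concrete, here is a rough sketch in plain Java with no HBase dependency. The class RegionGrouping, the method groupByRegion and the region start keys are all hypothetical, made up for illustration; in a real client the boundaries would come from the cluster, and keys would be compared as bytes rather than as Java strings. It assumes the first region starts at the empty key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: not part of the HBase client API.
final class RegionGrouping {

    /**
     * Groups row keys by the start key of the region that would contain them.
     * Assumes regionStartKeys includes "" as the start of the first region.
     */
    static Map<String, List<String>> groupByRegion(List<String> regionStartKeys,
                                                   List<String> rowKeys) {
        TreeMap<String, List<String>> byRegion = new TreeMap<>();
        for (String start : regionStartKeys) {
            byRegion.put(start, new ArrayList<>());
        }
        for (String row : rowKeys) {
            // floorKey gives the greatest start key <= row,
            // i.e. the region whose key range contains this row.
            String start = byRegion.floorKey(row);
            if (start == null) {
                throw new IllegalArgumentException("no region covers row " + row);
            }
            byRegion.get(start).add(row);
        }
        // Drop regions that received no keys; they need no request at all.
        byRegion.values().removeIf(List::isEmpty);
        return byRegion;
    }

    public static void main(String[] args) {
        List<String> starts = Arrays.asList("", "g", "p");   // assumed boundaries
        List<String> rows = Arrays.asList("apple", "grape", "kiwi", "zebra");
        System.out.println(groupByRegion(starts, rows));
    }
}
```

Each group could then be served by its own scan, running from the first to the last key in that group, and the groups could be submitted in parallel.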

But that may be more work than just doing the set of gets.

It would be interesting to benchmark the performance....

I wonder if a coprocessor could help speed this up?
I mean use the coprocessor to do all the gets for each region, rather than doing a full region scan and then filtering against the list of rows for that region.
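
The trade-off between the two approaches can be sketched with an in-memory stand-in for a region. This is only an illustration under stated assumptions: a TreeMap plays the role of one region's sorted rows, and RegionLookupSketch, pointGets and scanAndFilter are hypothetical names, not HBase or coprocessor APIs. It counts rows examined and says nothing about real costs such as seeks, caching or RPC overhead, which is why a benchmark would still be needed.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of one region; not an HBase API.
final class RegionLookupSketch {

    /** One probe per requested key. */
    static Map<String, String> pointGets(NavigableMap<String, String> region,
                                         TreeSet<String> wanted) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String key : wanted) {
            String value = region.get(key);
            if (value != null) {
                out.put(key, value);
            }
        }
        return out;
    }

    /** One range scan from the smallest to the largest wanted key, filtered against the list. */
    static Map<String, String> scanAndFilter(NavigableMap<String, String> region,
                                             TreeSet<String> wanted,
                                             int[] rowsExamined) {
        Map<String, String> out = new LinkedHashMap<>();
        if (wanted.isEmpty()) {
            return out;
        }
        NavigableMap<String, String> range =
            region.subMap(wanted.first(), true, wanted.last(), true);
        for (Map.Entry<String, String> e : range.entrySet()) {
            rowsExamined[0]++;                       // every row in the range is visited
            if (wanted.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> region = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            region.put(String.format("row%04d", i), "v" + i);
        }
        TreeSet<String> wanted =
            new TreeSet<>(Arrays.asList("row0010", "row0500", "row0990"));
        int[] examined = {0};
        Map<String, String> viaScan = scanAndFilter(region, wanted, examined);
        Map<String, String> viaGets = pointGets(region, wanted);
        System.out.println("same result: " + viaScan.equals(viaGets));
        System.out.println("rows examined by scan: " + examined[0]
            + ", probes by gets: " + wanted.size());
    }
}
```

When the wanted rows are sparse within the range, the scan visits many rows it then discards; when they are dense, a single scan may well be the cheaper option.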

Again this would be for a very specific type of query....
On Feb 18, 2013, at 5:07 AM, ramkrishna vasudevan <[EMAIL PROTECTED]> wrote:

> If the scan is happening on the same region then going for Scan would be a
> better option.
>
> Regards
> RAm
>
> On Mon, Feb 18, 2013 at 4:26 PM, Nicolas Liochon <[EMAIL PROTECTED]> wrote:
>
>> i) Yes, or, at least, often yes.
>> ii) You're right. It's difficult to guess how much it would improve
>> performance (there is a lot of caching effect), but using a single scan
>> could be an interesting optimisation imho.
>>
>> Nicolas
>>
>>
>> On Mon, Feb 18, 2013 at 10:57 AM, Varun Sharma <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Hi,
>>>
>>> I am trying to do batched get(s) on a cluster. Here is the code:
>>>
>>> List<Get> gets = ...
>>> // Prepare my gets with the rows i need
>>> myHTable.get(gets);
>>>
>>> I have two questions about the above scenario:
>>> i) Is this the most optimal way to do this ?
>>> ii) I have a feeling that if there are multiple gets in this case, on the
>>> same region, then each one of those shall instantiate separate scan(s) over
>>> the region even though a single scan is sufficient. Am I mistaken here ?
>>>
>>> Thanks
>>> Varun
>>>
>>

lars hofhansl 2013-02-19, 01:48
Varun Sharma 2013-02-19, 06:45
lars hofhansl 2013-02-19, 08:02
Nicolas Liochon 2013-02-19, 08:37
Varun Sharma 2013-02-19, 15:52
Nicolas Liochon 2013-02-19, 17:28
Varun Sharma 2013-02-19, 18:19
lars hofhansl 2013-02-19, 18:27
Nicolas Liochon 2013-02-19, 18:42
Nicolas Liochon 2013-02-19, 18:46