HBase >> mail # user >> Online/Realtime query with filter and join?

Re: Online/Realtime query with filter and join?

You are going to want to figure out a rowkey (or a set of tables with
rowkeys) that restricts the number of I/Os. If you just slap Impala in front
of HBase (or even Phoenix, for that matter), you could write SQL against it,
but if it winds up doing a full scan of an HBase table underneath, you
won't get your < 100 ms response time.

Note: I'm not saying you can't do this with Impala or Phoenix; I'm just
saying start with the rowkeys first so that you limit the I/O. Then start
adding frameworks as needed (and/or build a schema with Phoenix in the
same rowkey exercise).
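
A minimal sketch of the rowkey idea, under stated assumptions: it is written
in Python, a sorted in-memory list stands in for an HBase table (HBase keeps
rows sorted by rowkey bytes), and the composite key of customer id plus
timestamp is a hypothetical schema. The names make_rowkey, prefix_stop, and
prefix_scan are illustrative helpers, not HBase APIs.

```python
import bisect


def make_rowkey(customer_id: str, timestamp: int) -> bytes:
    """Hypothetical composite key: customer id, a zero-byte separator,
    then an 8-byte big-endian timestamp so rows sort chronologically
    within one customer."""
    return customer_id.encode("utf-8") + b"\x00" + timestamp.to_bytes(8, "big")


def prefix_stop(prefix: bytes) -> bytes:
    """Smallest key strictly greater than every key that starts with prefix.
    An empty result means there is no upper bound."""
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def prefix_scan(sorted_keys: list, prefix: bytes) -> list:
    """Return only the keys in [prefix, prefix_stop(prefix)).
    Two binary searches locate the range; nothing outside it is read."""
    start = bisect.bisect_left(sorted_keys, prefix)
    stop = prefix_stop(prefix)
    end = bisect.bisect_left(sorted_keys, stop) if stop else len(sorted_keys)
    return sorted_keys[start:end]


if __name__ == "__main__":
    # Three customers with 1000 rows each, stored in rowkey order.
    table = sorted(
        make_rowkey(customer, ts)
        for customer in ("cust1", "cust10", "cust2")
        for ts in range(1000)
    )
    hits = prefix_scan(table, b"cust1\x00")
    print(f"rows in table: {len(table)}, rows touched by the scan: {len(hits)}")
```

The separator byte keeps "cust1" from matching "cust10". With the real
client, the same two bounds would be supplied as the start and stop rows of
a scan, so the region servers read one customer's rows instead of the whole
table.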

Such response-time requirements make me think that this is for application
support, so why the requirement for SQL? Might want to start writing it as
a Java program first.

On 11/29/13 4:32 PM, "Mourad K" <[EMAIL PROTECTED]> wrote:

>You might want to consider something like Impala or Phoenix. I presume
>you are trying to run reporting queries for a dashboard or UI?
>MapReduce is certainly not adequate, as there is too much latency on
>startup. If you want to give this a try, CDH4 and Impala are a good start.
>
>Mouradk
>
>On 29 Nov 2013, at 10:33, Ramon Wang <[EMAIL PROTECTED]> wrote:
>
>> The general performance requirement for each query is less than 100 ms
>> on average. It sounds crazy, but yes, we need to find a way to do it.
>>
>> Thanks
>> Ramon
>>
>>
>> On Fri, Nov 29, 2013 at 5:01 PM, yonghu <[EMAIL PROTECTED]> wrote:
>>
>>> The question is what you mean by "real-time". What is your performance
>>> requirement? In my opinion, MapReduce is not suitable for real-time
>>> data processing.
>>>
>>>
>>> On Fri, Nov 29, 2013 at 9:55 AM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
>>>
>>>> You can try Phoenix.
>>>> On 2013-11-29 3:44 PM, "Ramon Wang" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi Folks
>>>>>
>>>>> It seems to be impossible, but I still want to check if there is a
>>>>> way we can do "complex" queries on HBase with "ORDER BY", "JOIN",
>>>>> etc., like we have with a normal RDBMS. We have been asked to provide
>>>>> such a solution. Any ideas? Thanks for your help.
>>>>>
>>>>> BTW, I think maybe Impala from CDH would be a way to go, but I haven't
>>>>> had time to check it yet.
>>>>>
>>>>> Thanks
>>>>> Ramon