Re: Online/Realtime query with filter and join?
Pradeep, correct me if I am wrong, but has PrestoDB released an HBase plugin
yet? As far as I know it has not, unless I missed the announcement.

I agree with what Doug is saying here: you can't achieve < 100 ms on every
kind of query on HBase unless you design the rowkey in a way that reduces
your I/O. A full scan of a table with billions of rows and columns can take
forever, but good indexing (via the rowkey or secondary indexes) can speed
things up.
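
A minimal sketch of why the rowkey matters, under stated assumptions: this is
not HBase client code. A sorted Python list stands in for HBase's
rowkey-ordered storage, and the "<user_id>#<date>" key layout, the data, and
the function names are all hypothetical. It only contrasts how many rows a
filter-everything scan touches with how many a rowkey-prefix range touches.

```python
import bisect

# Hypothetical rowkeys of the form "<user_id>#<event_date>".
# HBase keeps rows sorted by rowkey; a sorted list imitates that here.
ROWS = sorted(
    [
        (f"user{u:04d}#2013-11-{d:02d}", {"clicks": u + d})
        for u in range(1000)
        for d in range(1, 31)
    ],
    key=lambda row: row[0],
)
KEYS = [key for key, _ in ROWS]


def full_scan(user_id):
    """Read every row and filter afterwards (a scan with no start/stop row)."""
    touched = 0
    hits = []
    for key, value in ROWS:
        touched += 1
        if key.startswith(user_id + "#"):
            hits.append((key, value))
    return hits, touched


def prefix_scan(user_id):
    """Seek to the first key with the prefix, stop at the first key past it."""
    start = user_id + "#"
    stop = user_id + "$"  # '$' is the ASCII character right after '#'
    lo = bisect.bisect_left(KEYS, start)
    hi = bisect.bisect_left(KEYS, stop)
    return ROWS[lo:hi], hi - lo


if __name__ == "__main__":
    _, touched_full = full_scan("user0042")
    _, touched_prefix = prefix_scan("user0042")
    print(touched_full, touched_prefix)  # prints "30000 30"
```

Both functions return the same 30 rows, but the prefix range reads only those
30, while the full scan reads all 30,000. In this toy model that gap is what
decides whether a query can fit in a budget like 100 ms.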

Thanks,
Viral
On Mon, Dec 2, 2013 at 11:01 AM, Pradeep Gollakota <[EMAIL PROTECTED]> wrote:

> In addition to Impala and Phoenix, I'm going to throw PrestoDB into the
> mix. :)
>
> http://prestodb.io/
>
>
> On Mon, Dec 2, 2013 at 10:58 AM, Doug Meil <[EMAIL PROTECTED]> wrote:
>
> >
> > You are going to want to figure out a rowkey (or a set of tables with
> > rowkeys) to restrict the number of I/Os. If you just slap Impala in front
> > of HBase (or even Phoenix, for that matter) you could write SQL against it,
> > but if it winds up doing a full scan of an HBase table underneath you
> > won't get your < 100ms response time.
> >
> > Note:  I'm not saying you can't do this with Impala or Phoenix, I'm just
> > saying start with the rowkeys first so that you limit the I/O.  Then start
> > adding frameworks as needed (and/or build a schema with Phoenix in the
> > same rowkey exercise).
> >
> > Such response-time requirements make me think that this is for application
> > support, so why the requirement for SQL? Might want to start writing it as
> > a Java program first.
> >
> > On 11/29/13 4:32 PM, "Mourad K" <[EMAIL PROTECTED]> wrote:
> >
> > >You might want to consider something like Impala or Phoenix. I presume
> > >you are trying to do some report query for a dashboard or UI?
> > >MapReduce is certainly not adequate as there is too much latency on
> > >startup. If you want to give this a try, CDH4 and Impala are a good start.
> > >
> > >Mouradk
> > >
> > >On 29 Nov 2013, at 10:33, Ramon Wang <[EMAIL PROTECTED]> wrote:
> > >
> > >> The general performance requirement for each query is less than 100 ms;
> > >> that's the average level. Sounds crazy, but yes, we need to find a way
> > >> to do it.
> > >>
> > >> Thanks
> > >> Ramon
> > >>
> > >>
> > >> On Fri, Nov 29, 2013 at 5:01 PM, yonghu <[EMAIL PROTECTED]> wrote:
> > >>
> > >>> The question is what you mean by "real-time". What is your performance
> > >>> requirement? In my opinion, MapReduce is not suitable for
> > >>> real-time data processing.
> > >>>
> > >>>
> > >>> On Fri, Nov 29, 2013 at 9:55 AM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>>> You can try Phoenix.
> > >>>> On 2013-11-29 3:44 PM, "Ramon Wang" <[EMAIL PROTECTED]> wrote:
> > >>>>
> > >>>>> Hi Folks
> > >>>>>
> > >>>>> It seems to be impossible, but I still want to check whether there is a
> > >>>>> way we can run "complex" queries on HBase with "Order By", "JOIN", etc.,
> > >>>>> like we have with a normal RDBMS. We have been asked to provide such a
> > >>>>> solution. Any ideas? Thanks for your help.
> > >>>>>
> > >>>>> BTW, I think maybe Impala from CDH would be a way to go, but I haven't
> > >>>>> had time to check it yet.
> > >>>>>
> > >>>>> Thanks
> > >>>>> Ramon
> > >>>
> >
> >
>