Hive >> mail # user >> Reg: <Real Time Hive query>


Re: Reg: <Real Time Hive query>
Hi,

As Nitin and others have mentioned, there are a few points you need to
consider.

1. Hive is currently built for OLAP applications, not for OLTP (real-time
workloads of the kind served by an RDBMS such as MySQL or Oracle).

2. Although you can connect to the Hive Thrift server through its JDBC
implementation, it is still not a production-grade API, as it does not scale
for concurrent clients.
reference:  https://cwiki.apache.org/Hive/hiveserver.html

The current JDBC driver executes HQL queries through the Hive server.

3. A real-time query system also requires sophisticated locking protocols,
whereas Hive implements only the very basic locking that it needs.

4. The Hive Metastore is also not very scalable right now, as it can run
into an out-of-memory (OOM) exception once the number of partitions grows.
ref: https://issues.apache.org/jira/browse/HIVE-2907

5. The Hive Metastore client doesn't have retry logic, so when the CLI or
Hive Server is connected to the Hive Metastore and the connection drops or
there is some network issue, it cannot reconnect automatically.

ref:
http://mail-archives.apache.org/mod_mbox/hive-user/201211.mbox/%3CCA+FBdFT20nnQ5pOMcJ0ctE8RRseVFxxJO4qjAgxD1doBc+[EMAIL PROTECTED]%3E
https://issues.apache.org/jira/browse/HIVE-3400 (this looks fixed, but I
don't know which version will include the fix)

6. You need to architect the real-time query system around Hive using other
technologies such as MySQL (RDBMS), a cache, etc. (by pushing aggregated data
to the RDBMS or cache layer), and then let users write queries on top of at
least one level of aggregated / grouped data, to reduce the amount of data to
be queried.
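
The approach just described can be sketched roughly as follows. This is only
an illustration, not a real design: the class AggregateServingLayer, its
methods, and the sample keys are hypothetical names, and the in-memory map is
a stand-in for whatever RDBMS or cache layer you actually choose. The batch
side publishes rows that a Hive job has already aggregated, and the
interactive side only ever reads from the serving layer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a serving layer that holds pre-aggregated results.
class AggregateServingLayer {
    // Stand-in for the RDBMS or cache layer: group key -> aggregated value.
    private final Map<String, Long> store = new ConcurrentHashMap<>();

    // Batch side: called after a Hive job finishes, with the rows of a
    // GROUP BY style query (the query itself is not shown here).
    public void publish(Map<String, Long> aggregatedRows) {
        store.putAll(aggregatedRows);
    }

    // Interactive side: answers from the aggregated data and never calls
    // Hive. Unknown keys are reported as 0 in this sketch.
    public long lookup(String key) {
        return store.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        AggregateServingLayer layer = new AggregateServingLayer();

        // Pretend these rows came out of a nightly Hive aggregation.
        Map<String, Long> batchResult = new HashMap<>();
        batchResult.put("IN", 120L);
        batchResult.put("US", 95L);
        layer.publish(batchResult);

        System.out.println("IN -> " + layer.lookup("IN"));
    }
}
```

The point of the split is that user-facing latency depends only on the
serving layer, while Hive stays in the batch path where its job startup time
does not matter.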

There are other obvious and well-known differences between Hive and an RDBMS
as well.

So please consider the points above when making your decision and designing
the system.
Hopefully this will help.
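
On the earlier request in this thread for a Java program that runs a Hive
query, a minimal sketch of the JDBC route from point 2 might look like the
following. It assumes the original Hive Thrift server (HiveServer1) is
listening on localhost:10000 and that the Hive JDBC jars are on the classpath.
The class name HiveJdbcSketch, the helper jdbcUrl, and the table sample_table
are placeholders, not anything from your setup. Treat it as a starting point
for batch-style queries, not as a real-time API.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

class HiveJdbcSketch {
    // Driver class for the original Hive Thrift server (HiveServer1).
    // HiveServer2 uses org.apache.hive.jdbc.HiveDriver and jdbc:hive2://.
    static final String DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";

    // Builds the JDBC URL; host, port, and database are placeholders.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws Exception {
        // Fails with ClassNotFoundException if the Hive JDBC jars are missing.
        Class.forName(DRIVER);

        String url = jdbcUrl("localhost", 10000, "default");

        // sample_table is a hypothetical table; replace it with your own.
        // Expect this to take as long as the underlying Hive job takes.
        try (Connection con = DriverManager.getConnection(url, "", "");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT COUNT(*) FROM sample_table")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

In Eclipse this runs as a plain Java application once the Hive and Hadoop
client jars are added to the project build path.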

Regards,
Manish

On Tue, Dec 25, 2012 at 2:06 AM, Nitin Pawar <[EMAIL PROTECTED]> wrote:

> Hive is not like MySQL, where you just query and get the results. It will
> take time based on data size and query. You may look at Oozie if you want
> to build an application, or look at Pentaho with Hive integration.
>
> The Hive CLI is not only for testing. You can build applications using the
> Hive CLI and scripting languages.
>
> You can use the Hive Thrift server and use it like JDBC, but keep in mind
> this is never real-time.
> On Dec 25, 2012 3:24 PM, "Kshiva Kps" <[EMAIL PROTECTED]> wrote:
>
>> Many thanks for your reply.
>>
>> But if you want to develop applications (jobs) for real-time use, the CLI
>> won't help us; the CLI is for testing. Please correct me if I'm wrong, thanks.
>>
>>  Many Thanks
>> Kshiva@ +91 9940163885
>>
>> On Tue, Dec 25, 2012 at 2:09 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>>
>>> Hive comes with a Thrift server, so you can connect via JDBC.
>>>
>>> If you just want to execute queries, why don't you use the Hive CLI?
>>>
>>>
>>> On Tue, Dec 25, 2012 at 1:01 PM, Kshiva Kps <[EMAIL PROTECTED]> wrote:
>>>
>>>> Thanks... sorry to ask, but if possible could you please advise on the
>>>> points below?
>>>>
>>>>>
>>>>> In general, how do we write scripts for real-time use?
>>>>> 1. Java + Hive query: could you please, if possible, share one program
>>>>> that can be executed through the Eclipse IDE? Many thanks.
>>>>>
>>>>> Thanks
>>>>> Siva @09940163885
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>