Hive >> mail # user >> Locking in HIVE : How to use locking/unlocking features using hive java API ?

Locking in HIVE : How to use locking/unlocking features using hive java API ?
Hi,

I'm designing a backup and restore tool for Hive data, for
disaster recovery scenarios.

I'm trying to understand Hive's locking behavior, which currently
relies on ZooKeeper.

My thought process is like this (early design):

1. Backing up the metadata of Hive.
2. Backing up the data for Hive tables to S3, HDFS, or NFS.
3. Restoring table(s):
    a. Only Data
    b. Schema and data

So, to achieve the 1st task, this is the flow I'm thinking of:

a. Check whether there is an exclusive lock on the table whose metadata
needs to be backed up.
         if YES: don't do anything; wait and retry, up to a configured
number of attempts at a configured interval.
         if NO: get the metadata of the table and create the Hive DDL
statement for it, including table / partition definitions etc.
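
The wait-and-retry part of this flow could look roughly like the sketch below. `RetryUntilUnlocked`, `waitUntilFree`, and the `isExclusivelyLocked` probe are hypothetical names, not part of any Hive API; how the probe actually detects a lock is left open.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Hive: polls a lock probe a bounded number of times.
final class RetryUntilUnlocked {

    /**
     * Returns true as soon as the probe reports no exclusive lock,
     * or false if the table is still locked after maxAttempts checks
     * (or if the waiting thread is interrupted).
     */
    static boolean waitUntilFree(BooleanSupplier isExclusivelyLocked,
                                 int maxAttempts,
                                 long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (!isExclusivelyLocked.getAsBoolean()) {
                return true; // no exclusive lock seen: safe to read the metadata
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis); // wait before the next check
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
        }
        return false; // still locked: let the scheduler decide what to do next
    }
}
```

The number of attempts and the sleep interval would come from the tool's configuration.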

For the 2nd task:

a. Check whether the table has an exclusive lock.
        if NO: take a shared lock and start the copy; once done, release the
shared lock.
        if YES: wait and retry.
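
As a rough sketch of the copy step, the shared lock can be released in a `finally` block so that a failed copy does not leave the table locked. `TableLocker` and `LockedCopy` are hypothetical names introduced for illustration only; they are not Hive interfaces, and the real lock calls would sit behind `tryLock` / `unlock`.

```java
// Hypothetical abstraction over whatever ends up taking the lock; not a Hive interface.
interface TableLocker {
    // Returns false if a conflicting lock is already held on the table.
    boolean tryLock(String table, boolean exclusive);

    void unlock(String table);
}

final class LockedCopy {

    /** Runs copyJob under a shared lock; returns false if the lock was not obtained. */
    static boolean copyWithSharedLock(TableLocker locker, String table, Runnable copyJob) {
        if (!locker.tryLock(table, false)) {
            return false; // exclusive lock present: caller waits and retries
        }
        try {
            copyJob.run(); // copy the table data to the backup location
            return true;
        } finally {
            locker.unlock(table); // released even if the copy throws
        }
    }
}
```

Folding the check and the acquisition into a single `tryLock` call avoids a gap between checking for an exclusive lock and taking the shared one.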

For the 3rd task (restoring):

a. Only data: Check if there is any lock on the table.
                     if NO: take an exclusive lock, insert the data
into the table, release the lock.
                     if YES: wait and retry.

b. Schema and data:

                Check if there is any lock on the table/partition.
                      if NO: drop and re-create the table/partitions.
                      if YES: wait and retry.
                 Once the schema is created:
                      take an exclusive lock, insert the data, release the lock.

I'm going to run this kind of job from my scheduler / workflow engine.
I need input on the following questions:

a. Does this overall approach look good?
b. How can I take and release the different locks explicitly using the Hive API?
ref: https://cwiki.apache.org/confluence/display/Hive/Locking
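
Short of a dedicated Java API, one route that may be worth checking is issuing lock statements through an ordinary Hive client connection. The sketch below only builds the statement text. It assumes the deployed Hive version accepts `LOCK TABLE`, `UNLOCK TABLE`, and `SHOW LOCKS` with concurrency support enabled, which needs to be verified; `HiveLockStatements` is a hypothetical helper, not a Hive class.

```java
// Hypothetical helper that only builds statement text; it does not talk to Hive.
// Assumption: the target Hive version accepts these statements.
final class HiveLockStatements {

    enum Mode { SHARED, EXCLUSIVE }

    // Statement requesting an explicit lock of the given mode on a table.
    static String lock(String table, Mode mode) {
        return "LOCK TABLE " + table + " " + mode.name();
    }

    // Statement releasing an explicit lock on a table.
    static String unlock(String table) {
        return "UNLOCK TABLE " + table;
    }

    // Statement listing the locks currently held on a table.
    static String showLocks(String table) {
        return "SHOW LOCKS " + table;
    }
}
```

The resulting strings would be executed like any other statement by whichever client the tool uses.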

If I understood correctly, according to this page Hive still doesn't support
explicit locking at the API level.
Is there any plan or patch to get this done?

I saw some classes like ZooKeeperHiveLock etc., but I need to dig further to
see whether I can use these classes for locking.

Thanks for your time and effort.

Regards,
Manish