Hive >> mail # dev >> new feature in hive: links


Re: new feature in hive: links
I added the comments/questions to the wiki (
https://cwiki.apache.org/confluence/display/Hive/Links). I'm also copying
them here:

The first draft of this proposal is very hard to decipher because it relies
on terms that aren't well defined. For example, here's the second sentence
from the motivations section:

bq. Growth beyond a single warehouse (or) separation of capacity usage and
allocation requires the creation of multiple physical warehouses, i.e.,
separate Hive instances.

What's the difference between a warehouse and a physical warehouse? How do
you define a Hive instance? In the requirements section the term virtual
warehouse is introduced and equated to a namespace, but clearly it's more
than that because otherwise DBs/Schemas would suffice. Can you please
update the proposal to include definitions for these terms?

bq. Prevent access using two-part name syntax (Y.T) if the namespaces feature
is "on" in a Hive instance. This ensures the database is self-contained.

The cross-namespace HiveConf ACL proposed in HIVE-3016 doesn't prevent
anyone from doing anything because there is no way to keep users from
disabling it. I'm surprised to see this ticket mentioned here since three
committers have already gone on record saying that this is the wrong
approach, and one committer even -1'd it. If preventing cross-db references
in queries is a requirement for this project, then I think Hive's
authorization mechanism will need to be extended to support this
privilege/restriction.
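To make the discussion concrete, a restriction of the kind described might behave roughly like the following Python sketch. This is purely illustrative: the function name, the regex-based "parsing", and the flag are all hypothetical stand-ins for what a real parser hook plus an extended authorization layer inside Hive would do.

```python
import re

# Crude stand-in for a parser/authorization hook: find two-part
# (db.table) name references in a query string. A real implementation
# would walk the query AST, not use a regex.
TWO_PART_NAME = re.compile(r'\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b')

def check_query(query, current_db, namespaces_on):
    """Raise PermissionError if the query references a db other than
    current_db while the (hypothetical) namespaces feature is on."""
    if not namespaces_on:
        return
    for db, _table in TWO_PART_NAME.findall(query):
        if db != current_db:
            raise PermissionError(
                f"cross-database reference to '{db}' not allowed "
                f"while namespaces are on (current db: {current_db})")

check_query("SELECT * FROM sales.orders", "sales", namespaces_on=True)  # ok
try:
    check_query("SELECT * FROM hr.people", "sales", namespaces_on=True)
except PermissionError as e:
    print("blocked:", e)
```

The point of the sketch is that the check has to live server-side in the authorization path; a client-side or HiveConf-based switch, as noted above, can simply be turned off by the user.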

From the design section:

bq. We are building a namespace service external to Hive that has metadata
on namespace location across the Hive instances, and allows importing data
across Hive instances using replication.

Does the work proposed in HIVE-2989 also include adding this Db/Table
replication infrastructure to Hive? If so, what is the timeline for adding
it?

Thanks.

Carl

On Tue, May 22, 2012 at 9:18 AM, Ashutosh Chauhan <[EMAIL PROTECTED]> wrote:

> To kickstart the review, I did a quick review of the doc. A few questions
> popped out to me, which I asked. Sambavi was kind enough to come back with
> replies for them. I am continuing to look into it, and will encourage other
> folks to look into it as well.
>
>
> Thanks,
>
> Ashutosh
>
>
> <Begin Forward Message>
>
>
> Hi Ashutosh,
>
> Thanks for looking through the design and providing your feedback!
>
> Responses below:
>
> * What exactly is contained in tracking capacity usage? One is disk space.
> That, I presume, you are going to track by summing sizes under the database
> directory. Are you also thinking of tracking resource usage in terms of
> CPU/memory/network utilization for different teams?
>
> Right now the capacity usage in Hive we will track is disk space
> (managed tables that belong to the namespace + imported tables). We will
> track the mappers and reducers that the namespace utilizes directly from
> Hadoop.
>
> * Each namespace (ns) will have exactly one database. If so, then users are
> not allowed to create/use databases in such a deployment? Not necessarily a
> problem, just trying to understand the design.
>
> Yes, you are correct – this is a limitation of the design. Introducing a
> new concept seemed heavyweight, so you can instead think of this as
> “self-contained” databases. But it means that a given namespace cannot have
> sub-databases in it.
>
> * How are you going to keep metadata consistent across two ns? If metadata
> gets updated in the remote ns, will it get automatically updated in the
> user's local ns? If yes, how will this be implemented? If no, then every
> time the user needs to use data from a remote ns, she has to bring the
> metadata up to date in her ns. How will she do it?
>
> Metadata will be kept in sync for linked tables. We will make an alter
> table on the remote table (the source of the link) cause an update to the
> target of the link. Note that from a Hive perspective, the metadata for the
> source and target of a link is in the same metastore.
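The propagation described in that last answer can be modeled with a small Python toy, under the stated assumption that the source of a link and its targets live in one shared metastore. Every class and name here is hypothetical, invented only to illustrate the alter-propagates-to-links behavior; it is not Hive's actual metastore API.

```python
# Toy single-process "metastore": source tables and their link targets
# share one store, so an ALTER on the source is applied to every link.
class Metastore:
    def __init__(self):
        self.tables = {}   # (namespace, table_name) -> metadata dict
        self.links = {}    # source key -> set of link-target keys

    def create_table(self, ns, name, schema):
        self.tables[(ns, name)] = {"schema": list(schema)}

    def create_link(self, src, dst_ns):
        # Import src into dst_ns as a linked table with the same metadata.
        dst = (dst_ns, src[1])
        self.tables[dst] = {"schema": list(self.tables[src]["schema"])}
        self.links.setdefault(src, set()).add(dst)

    def alter_table_add_column(self, src, column):
        # Alter the source, then keep every link target in sync.
        self.tables[src]["schema"].append(column)
        for dst in self.links.get(src, ()):
            self.tables[dst]["schema"].append(column)

ms = Metastore()
ms.create_table("team_a", "clicks", ["ts", "url"])
ms.create_link(("team_a", "clicks"), "team_b")
ms.alter_table_add_column(("team_a", "clicks"), "user_id")
print(ms.tables[("team_b", "clicks")]["schema"])  # ['ts', 'url', 'user_id']
```

The design question Carl raises still applies to this toy: the sync step is trivial only because both keys sit in one store, which is exactly the "same metastore" assumption in the answer above.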