Hive >> mail # user >> Managed vs external tables in hive


Re: Managed vs external tables in hive
Hi Ranjith,
I use buckets with external tables, no problem.

I concur with other people on the thread. Having an external table vs. a managed table on HDFS should have minimal impact on what operations you can perform on those tables.

Mark
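
As David Kulp notes further down the thread, the data files look the same whether a table is managed or external; the table type lives only in the metastore. The sketch below is a minimal illustration of that point, assuming Hive's default delimited-text row format (fields separated by Ctrl-A, `\x01`, one row per line). The function name `parse_default_text_rows` is hypothetical, and other SerDes, NULL markers such as `\N`, and nested collection delimiters are deliberately ignored.

```python
# Assumption: Hive's default text row format, where fields are separated
# by Ctrl-A ('\x01') and each row ends with a newline.
FIELD_SEP = "\x01"


def parse_default_text_rows(data: str) -> list[list[str]]:
    """Split delimited text into rows of string fields.

    Blank lines are skipped (a simplification). Nothing here depends on
    whether the file belongs to a managed or an external table.
    """
    return [line.split(FIELD_SEP) for line in data.splitlines() if line]


# The same file content, under a managed table's directory or an
# external table's LOCATION, parses identically.
file_text = "1\x01alice\n2\x01bob\n"
rows = parse_default_text_rows(file_text)
print(rows)  # [['1', 'alice'], ['2', 'bob']]
```

Because the reader only sees bytes, any cost difference has to come from elsewhere (file count, compression, SerDe choice), not from the managed/external flag itself.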

----- Original Message -----
From: "Ranjith" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Sunday, May 13, 2012 4:07:48 PM
Subject: Re: Managed vs external tables in hive
Edward,
Did you confirm this through the explain plan or through the execution of the DDL alone? And have you tried buckets with external tables?

Thanks,
Ranjith

On May 13, 2012, at 2:33 PM, Edward Capriolo < [EMAIL PROTECTED] > wrote:

The original design docs say you cannot build indexes on external tables, but I tried it in 0.8.x and confirmed that you can.

On Sunday, May 13, 2012, Ranjith <ranjith.raghunat [EMAIL PROTECTED] > wrote:
> Indexes can be built on tables managed by Hive. For external tables I do not believe that to be true. Please feel free to correct me if I am wrong.
>
> Thanks,
> Ranjith
> On May 12, 2012, at 9:24 PM, Nanda Vijaydev < [EMAIL PROTECTED] > wrote:
>
> In Hive, the raw data is in HDFS and there is a metadata layer that defines the structure of the raw data. A table is usually a reference to metadata, probably stored in a MySQL server, and it contains a reference to the location of the data in HDFS, the type of delimiter or SerDe to use, and so on.
> 1. With Hive managed tables, when you drop a table, both the metadata in MySQL and the raw data on the cluster get deleted.
> 2. With external tables, when you drop a table, only the metadata gets deleted and the raw data continues to exist on the cluster.
>
> On Thu, May 10, 2012 at 3:02 PM, David Kulp < [EMAIL PROTECTED] > wrote:
>>
>> It's simpler than this. All files look the same -- and are often very simple delimited text -- whether managed or external. The only difference is that the files associated with a managed table are dropped when the table is dropped, and files that are loaded into a managed table are moved into Hive's private path. External tables never move or remove files. Performance is the same.
>>
>> On May 10, 2012, at 5:52 PM, [EMAIL PROTECTED] wrote:
>>
>> > I am pretty new to Hive and was trying to clearly understand the difference between a managed and an external table.
>> >
>> > As my current understanding stands, a managed table is a table whose data is completely owned by Hive, whereas an external table is usually created to have a Hive frontend for data managed in external systems. I would suppose this means that a query on an external table goes out to fetch data from the given external table, deserializes it according to the given/suitable SerDe, and then shows the output of the query in Hive format.
>> >
>> > So does this mean that the cost of using external tables is much higher than that of native ones? Or is there some caching that comes into play that I am not seeing right now?
>> >
>> > Thanks for the help.
>> >
>> > --
>> > Swarnim
>>
>
>
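
The DROP TABLE behavior Nanda and David describe can be modeled in a few lines. This is a toy sketch, not Hive's implementation: the `ToyHive` and `TableMeta` names are hypothetical, the metastore and HDFS are plain Python dicts, and `/user/hive/warehouse` is used only as an example warehouse path (in real Hive the default comes from `hive.metastore.warehouse.dir`). LOAD DATA and other file movement are not modeled.

```python
from dataclasses import dataclass


@dataclass
class TableMeta:
    """Toy metastore entry: only the fields DROP TABLE needs (schema omitted)."""
    name: str
    location: str  # HDFS directory holding the table's files
    external: bool


class ToyHive:
    """Hypothetical model: a metastore (name -> TableMeta) plus a fake HDFS (path -> bytes)."""

    def __init__(self) -> None:
        self.metastore: dict[str, TableMeta] = {}
        self.hdfs: dict[str, bytes] = {}

    def create_table(self, name: str, location: str, external: bool = False) -> None:
        # Creating a table only writes metadata; no data files are touched.
        self.metastore[name] = TableMeta(name, location, external)

    def drop_table(self, name: str) -> None:
        meta = self.metastore.pop(name)  # metadata is removed for both table types
        if not meta.external:
            # Managed table: also delete every file under the table's directory.
            prefix = meta.location.rstrip("/") + "/"
            for path in [p for p in self.hdfs if p.startswith(prefix)]:
                del self.hdfs[path]
        # External table: data files are left where they are.


hive = ToyHive()
hive.hdfs["/data/logs/part-00000"] = b"1\x01alice\n"
hive.hdfs["/user/hive/warehouse/managed_logs/part-00000"] = b"1\x01alice\n"
hive.create_table("ext_logs", "/data/logs", external=True)
hive.create_table("managed_logs", "/user/hive/warehouse/managed_logs")

hive.drop_table("ext_logs")
hive.drop_table("managed_logs")
print(sorted(hive.hdfs))  # ['/data/logs/part-00000']
```

After both drops the metastore is empty, but only the external table's file survives, which matches points 1 and 2 in Nanda's message.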