Hive >> mail # user >> removing hdfs table data directory does not throw error in hive


Re: removing hdfs table data directory does not throw error in hive
Looks like a good use case.

I created an improvement request:

https://issues.apache.org/jira/browse/HIVE-2980

On Tue, Apr 24, 2012 at 9:10 AM, Sukhendu Chakraborty <
[EMAIL PROTECTED]> wrote:

> Thanks Nitin. I am aware of what Hive is doing. The question is whether it
> is okay not to return an error/warning when no data is found, since the
> metadata for the table also contains the data location when you create
> the table (which creates the HDFS directory as well). So, if somebody
> erroneously removes the Hive directory corresponding to the table, at
> least a warning on select might be a good idea.
>
> -Sukhendu
>
> On Mon, Apr 23, 2012 at 8:28 PM, Nitin Pawar <[EMAIL PROTECTED]>
> wrote:
> > Hive table metadata is stored in a metastore, which retains the table
> > structure and other meta info even if you delete the HDFS table
> > directory, because it lives in the metastore DB.
> >
> > When you do a select * from table;
> > 1) Hive checks that the table exists in the metastore
> > 2) if the table exists, it checks the location of the data
> > 3) if data is available at the location, it processes the data; otherwise
> > it returns OK without doing anything
> >
> > It is not an error case because the Hive job did not fail.
> >
> >
> > On Tue, Apr 24, 2012 at 6:25 AM, Sukhendu Chakraborty
> > <[EMAIL PROTECTED]> wrote:
> >>
> >> I have a hive table tab3 with two columns (c1 int, c2 int)
> >>
> >> hive> load data local inpath '/tmp/orhc466fb981' into table tab3;
> >> Copying data from file:/tmp/orhc466fb981
> >> Copying file: file:/tmp/orhc466fb981
> >> Loading data to table default.tab3
> >> OK
> >> Time taken: 3.907 seconds
> >> hive> select * from tab3;
> >> OK
> >> 4       2
> >> 4       10
> >> 7       4
> >> 7       22
> >> .....
> >> //remove the tab3 directory from hdfs
> >> [schakrab@diy-1-2 orch]$ hadoop fs -rmr /user/hive/warehouse/tab3;
> >> Deleted hdfs://localhost:9000/user/hive/warehouse/tab3
> >> [schakrab@diy-1-2 orch]$ hive
> >> Hive history file=/tmp/schakrab/hive_job_log_schakrab_201204231748_1985146177.txt
> >> //no error thrown!
> >> hive> select * from tab3;
> >> OK
> >> Time taken: 3.68 seconds
> >> // of course. metadata still exists.
> >> hive> desc tab3;
> >> OK
> >> c1      int
> >> c2      int
> >> Time taken: 0.127 seconds
> >>
> >> // doing another load recreates the directory tab3
> >>
> >> Shouldn't the select * query return an error when the underlying table
> >> directory is removed?
> >>
> >> -Sukhendu
> >
> >
> >
> >
> > --
> > Nitin Pawar
> >
>

--
Nitin Pawar
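
The three-step lookup described in the thread, plus the warning proposed for the missing-directory case, can be sketched as a toy model. This is not Hive's code: the dict-based `metastore`, the `select_all` function, and the `demo` helper are hypothetical names, a local temp directory stands in for the HDFS warehouse path, and the warning is the behavior requested in the thread rather than what Hive does.

```python
import os
import shutil
import tempfile
import warnings


def select_all(metastore, table):
    """Toy model of 'select * from table' as described in the thread."""
    # Step 1: the table must exist in the metastore (hypothetical dict here).
    if table not in metastore:
        raise LookupError("Table not found: " + table)
    # Step 2: read the data location recorded in the table metadata.
    location = metastore[table]["location"]
    # Step 3: if the location is gone, return no rows. The thread says Hive
    # just returns OK; the warning is the proposed improvement, an assumption.
    if not os.path.isdir(location):
        warnings.warn("Data location for table %s is missing: %s" % (table, location))
        return []
    rows = []
    for name in sorted(os.listdir(location)):
        with open(os.path.join(location, name)) as f:
            for line in f:
                rows.append(tuple(int(x) for x in line.split()))
    return rows


def demo():
    """Load two rows, remove the table directory, and select again."""
    warehouse = tempfile.mkdtemp()
    location = os.path.join(warehouse, "tab3")
    os.mkdir(location)
    with open(os.path.join(location, "data"), "w") as f:
        f.write("4 2\n4 10\n")
    # The metadata survives even after the data directory is deleted.
    metastore = {"tab3": {"columns": ["c1", "c2"], "location": location}}
    before = select_all(metastore, "tab3")
    shutil.rmtree(location)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        after = select_all(metastore, "tab3")
    shutil.rmtree(warehouse)
    return before, after, len(caught)
```

Calling `demo()` runs the same sequence as the session quoted above: the first select returns the loaded rows, and the select after the directory is removed returns no rows, but in this sketch it also emits a warning instead of passing silently.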