Hive, mail # user - removing hdfs table data directory does not throw error in hive

Re: removing hdfs table data directory does not throw error in hive
Sukhendu Chakraborty 2012-04-24, 03:40
Thanks Nitin. I am aware of what Hive is doing. The question is whether it
is okay not to return an error or warning when no data is found, since the
table metadata also records the data location when you create the table
(and creating the table creates the HDFS directory as well). So, if somebody
erroneously removes the Hive directory corresponding to the table, at least
a warning on select might be a good idea.

-Sukhendu
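A minimal Python sketch of the behavior discussed in this thread, assuming a toy in-memory model rather than Hive's real code: the metastore is a dict from table name to a hypothetical `TableMeta` (schema plus location), and HDFS is a dict from directory path to rows. `select_all` follows the three steps Nitin lists in the quoted reply (table lookup, location lookup, process or return empty), and the hypothetical `warn_if_missing` flag models the warning proposed above. None of these names are Hive APIs.

```python
import warnings
from dataclasses import dataclass


@dataclass
class TableMeta:
    # Hypothetical metastore entry: column schema plus recorded data location.
    columns: list[tuple[str, str]]
    location: str


class TableNotFoundError(Exception):
    """Raised when the table is not in the (toy) metastore."""


def select_all(table, metastore, filesystem, warn_if_missing=False):
    """Toy model of `select * from table` as described in the thread.

    metastore:  dict of table name -> TableMeta (hypothetical)
    filesystem: dict of directory path -> list of rows (hypothetical)
    """
    # 1) The table must exist in the metastore; otherwise the query fails.
    meta = metastore.get(table)
    if meta is None:
        raise TableNotFoundError(table)

    # 2) Look up the data location recorded in the metastore.
    rows = filesystem.get(meta.location)

    # 3) If the directory is gone, return an empty result ("OK") ...
    if rows is None:
        # ... optionally emitting the warning proposed in this thread.
        if warn_if_missing:
            warnings.warn(
                f"location {meta.location} for table {table} does not exist"
            )
        return []

    # Otherwise "process" the data: here, just return every row.
    return list(rows)
```

Returning an empty result while still warning keeps the query successful (matching "the hive job did not fail") while surfacing the missing directory; raising an exception instead would turn the situation into a hard error.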

On Mon, Apr 23, 2012 at 8:28 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
> Hive table metadata is stored in a metastore, which retains the table
> structure and other metadata even if you delete the HDFS table directory,
> because it is stored in the metastore database.
>
> When you run select * from table;
> 1) Hive checks whether the table exists in the metastore
> 2) if the table exists, it checks the location of the data
> 3) if data is available at that location, it processes the data; otherwise
> it returns OK without doing anything
>
> It is not an error case because the Hive job did not fail.
>
>
> On Tue, Apr 24, 2012 at 6:25 AM, Sukhendu Chakraborty
> <[EMAIL PROTECTED]> wrote:
>>
>> I have a Hive table tab3 with two columns (c1 int, c2 int):
>>
>> hive> load data local inpath '/tmp/orhc466fb981' into table tab3;
>> Copying data from file:/tmp/orhc466fb981
>> Copying file: file:/tmp/orhc466fb981
>> Loading data to table default.tab3
>> OK
>> Time taken: 3.907 seconds
>> hive> select * from tab3;
>> OK
>> 4       2
>> 4       10
>> 7       4
>> 7       22
>> .....
>> //remove the tab3 directory from hdfs
>> [schakrab@diy-1-2 orch]$ hadoop fs -rmr /user/hive/warehouse/tab3;
>> Deleted hdfs://localhost:9000/user/hive/warehouse/tab3
>> [schakrab@diy-1-2 orch]$ hive
>> Hive history
>> file=/tmp/schakrab/hive_job_log_schakrab_201204231748_1985146177.txt
>> //no error thrown!
>> hive> select * from tab3;
>> OK
>> Time taken: 3.68 seconds
>> // of course. metadata still exists.
>> hive> desc tab3;
>> OK
>> c1      int
>> c2      int
>> Time taken: 0.127 seconds
>>
>> // doing another load recreates the directory tab3
>>
>> Shouldn't the select * query return an error when the underlying table
>> directory is removed?
>>
>> -Sukhendu
>
>
>
>
> --
> Nitin Pawar
>