Hive >> mail # user >> Hive scratch dir not cleaning up

Re: Hive scratch dir not cleaning up
Forgot the link.

github.com/edwardcapriolo/filecrush

On 6/1/12, Edward Capriolo <[EMAIL PROTECTED]> wrote:
> The filecrush tool has a small utility called Clean that accepts an
> age argument and deletes all the files in a directory older than a
> certain time.
>
> We use Clean to clean up the tmp HDFS directories that applications
> leave remnants in.
>
> Edward
>
> On 6/1/12, Vinod Singh <[EMAIL PROTECTED]> wrote:
>> Yes, that is how I do it. Though 1 month is too long; I keep just 2 days.
>>
>> Thanks,
>> Vinod
>>
>> http://blog.vinodsingh.com/
>>
>> On Fri, Jun 1, 2012 at 2:15 PM, Ruben de Vries
>> <[EMAIL PROTECTED]> wrote:
>>
>>> So I should write a job which cleans up 1 month old results or something
>>> like that?
>>>
>>> From: Vinod Singh [mailto:[EMAIL PROTECTED]]
>>> Sent: Friday, June 01, 2012 10:35 AM
>>> To: [EMAIL PROTECTED]
>>> Subject: Re: Hive scratch dir not cleaning up
>>>
>>> Hive deletes job contents from the scratch directory on completion of
>>> the job. Failed / killed jobs, though, leave data there, which needs to
>>> be removed manually.
>>>
>>> Thanks,
>>> Vinod
>>>
>>> http://blog.vinodsingh.com/
>>> On Fri, Jun 1, 2012 at 1:58 PM, Ruben de Vries <[EMAIL PROTECTED]>
>>> wrote:
>>> Hey Hivers,
>>>
>>> I’m almost ready to replace our old Hadoop implementation with an
>>> implementation using Hive.
>>>
>>> Now I’ve run into (hopefully) my last problem: my /tmp/hive-hduser dir
>>> is getting kinda big!
>>> It doesn’t seem to clean up these tmp files. Googling for it, I ran into
>>> some tickets about a cleanup setting; should I enable it with the
>>> setting below?
>>> Why doesn’t it do that by default? Am I the only one somehow racking up
>>> a lot of space with tmp files?
>>>
>>>
>>>
>>>
>>> <property>
>>>   <name>hive.start.cleanup.scratchdir</name>
>>>   <value>true</value>
>>> </property>
>>>
>>>
>>
>
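
The age-based cleanup discussed in this thread can be sketched roughly as follows. This is not the filecrush Clean utility, just a minimal Python illustration of the same idea: read a directory listing, pick the entries older than a cutoff, and hand those to a delete command. The function name `stale_paths` and the 2-day default are hypothetical choices for illustration (the 2 days mirrors what Vinod mentions), and the parser assumes the usual eight-column output of `hadoop fs -ls` (permissions, replication, owner, group, size, date, time, path).

```python
from datetime import datetime, timedelta


def stale_paths(ls_output, now, max_age_days=2):
    """Return paths from a `hadoop fs -ls` style listing whose modification
    time is more than max_age_days before `now`.

    Hypothetical helper; assumes eight whitespace-separated columns per entry:
    permissions, replication, owner, group, size, date, time, path.
    """
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for line in ls_output.splitlines():
        # Split into at most 8 fields so a path containing spaces stays whole.
        fields = line.split(None, 7)
        if len(fields) != 8:
            continue  # skips the "Found N items" header and blank lines
        try:
            modified = datetime.strptime(
                fields[5] + " " + fields[6], "%Y-%m-%d %H:%M"
            )
        except ValueError:
            continue  # not an entry line in the assumed format
        if modified < cutoff:
            stale.append(fields[7])
    return stale
```

The listing text would typically come from running `hadoop fs -ls /tmp/hive-hduser` (for example via `subprocess`), and each returned path could then be passed to a recursive delete. Whatever threshold is chosen should be longer than the longest-running query, since a running job's scratch files would otherwise also qualify as old.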