Re: where distributed cache start working
Hi,
> Thanks, Arun. Changing the mtime is a good idea. However, given a file (path
> A/B/C/D/file) distributed to all the nodes, if I just change the mtime of the
> file to an earlier timestamp, it will not be replaced next time. Should I also
> change the mtime of all the directories along the path (A, B, C and D)? Whose
> timestamp does DistributedCache use?

It is the timestamp of the file itself on DFS. So if you modify the file's
timestamp on DFS, it should be re-distributed to all the nodes.
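
To illustrate which timestamp matters, here is a minimal local sketch in Java.
It is not Hadoop's DistributedCache code: the class MtimeCacheSketch and its
methods are hypothetical names, it uses java.nio on the local filesystem
instead of DFS, and it assumes a cached copy counts as stale whenever the
file's mtime differs from the one recorded at localization time. On HDFS
itself, the file's mtime can be changed with
FileSystem.setTimes(path, mtime, atime).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

/**
 * Hypothetical local model of a timestamp-based cache check.
 * This is NOT Hadoop's DistributedCache implementation.
 */
public class MtimeCacheSketch {

    /** mtime (in milliseconds) recorded when the file was last "localized". */
    private long recordedMtime;

    /** Records the source file's current mtime, as if it had just been copied. */
    public void localize(Path source) throws IOException {
        recordedMtime = Files.getLastModifiedTime(source).toMillis();
    }

    /**
     * Assumption: the cached copy is stale whenever the source file's mtime
     * differs from the recorded one. Only the file itself is consulted;
     * parent directories are never looked at.
     */
    public boolean isStale(Path source) throws IOException {
        return Files.getLastModifiedTime(source).toMillis() != recordedMtime;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("A");
        Path file = Files.createTempFile(dir, "file", ".txt");

        MtimeCacheSketch cache = new MtimeCacheSketch();
        cache.localize(file);
        System.out.println("stale after localize: " + cache.isStale(file));

        // Changing only the parent directory's mtime does not affect the check.
        Files.setLastModifiedTime(dir, FileTime.fromMillis(1_000_000_000_000L));
        System.out.println("stale after touching dir: " + cache.isStale(file));

        // Changing the file's own mtime does. On HDFS the analogous call
        // would be FileSystem.setTimes(path, mtime, atime).
        long old = Files.getLastModifiedTime(file).toMillis();
        Files.setLastModifiedTime(file, FileTime.fromMillis(old + 60_000L));
        System.out.println("stale after touching file: " + cache.isStale(file));
    }
}
```

In this model only the last step should report the copy as stale, which
mirrors the answer above: the directories along the path play no role.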

Thanks
Hemanth
>
> Thanks.
> -Gang
>
>
>
>
> ----- Original Message -----
> From: Arun C Murthy <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: 2010/8/22 (Sun) 9:38:02 PM
> Subject: Re: where distributed cache start working
>
> Moving to mapreduce-user@, bcc common-dev@. Please use the project specific
> lists.
>
> DistributedCache.purgeCache isn't a public API. You shouldn't be calling it from
> the task.
>
> A simple way of doing what you want is to change the mtime of the cache files on
> HDFS.
>
> Arun
>
> On Aug 22, 2010, at 9:48 AM, Gang Luo wrote:
>
>> Thanks, Jeff.
>>
>> However, are you sure TaskRunner.run() is also used in the new API? I used
>> btrace to trace the function calls but didn't find this function being called
>> anywhere.
>>
>> One more question about the distributed cache. After I call
>> DistributedCache.purgeCache, I think the local cached files should be deleted
>> or invalidated. However, when I run the same job multiple times with the purge
>> operation at the end, I find the local files are never deleted and their
>> modification time is that of the first job run. How can I force my job to
>> re-distribute the cache anyway?
>>
>> Thanks,
>> -Gang
>>
>>
>>
>>
>> ----- Original Message -----
>> From: Jeff Zhang <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Sent: 2010/8/20 (Fri) 11:22:49 AM
>> Subject: Re: where distributed cache start working
>>
>> Hi Gang,
>>
>> In TaskRunner's run() method, Hadoop downloads the cache files that you
>> set on the client side to local disk; the forked child JVM can then
>> use these cache files locally.
>>
>>
>>
>> On Fri, Aug 20, 2010 at 8:08 AM, Gang Luo <[EMAIL PROTECTED]> wrote:
>>> Hi all,
>>> I went through the code, but couldn't find the place where the distributed
>>> cache starts working. I want to know, between DistributedCache.addCacheFile
>>> at the master node and DistributedCache.getLocalCacheFiles at the client
>>> side, when and where the files get distributed.
>>>
>>>
>>> Thanks,
>>> -Gang
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --Best Regards
>>
>> Jeff Zhang
>>
>>
>>
>>
>
>
>
>
>