-
Is it necessary to cache metadata in client side?
Jeff Zhang 2010-06-11, 06:43
Hi all,
According to the GFS paper, GFS caches metadata on the client. But when I checked the Hadoop source code, it seems that Hadoop does not cache it on the client side. I just want to make sure I'm right, and I'm also wondering whether anyone is working on this. One advantage of client-side metadata caching I can think of: the TaskTracker fetches job.xml from HDFS, and most of the time we run multiple tasks on one node, so if the TaskTracker cached the metadata, it could reduce the number of requests sent to the NameNode.
-- Best Regards
Jeff Zhang
-
Re: Is it necessary to cache metadata in client side?
Todd Lipcon 2010-06-11, 09:02
It is cached per input stream - see DFSInputStream.locatedBlocks, prefetchSize, etc.
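Todd's pointer can be illustrated with a minimal Python sketch. This is not HDFS code: `FakeNameNode`, `get_block_locations`, `CachingInputStream` and `block_at` are hypothetical names. Only the idea of a per-stream list of located blocks that is refilled `prefetchSize` bytes at a time comes from the message above. The usage at the end also shows the limitation of a per-stream cache: a second stream over the same file starts empty and has to ask the NameNode again.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class LocatedBlock:
    offset: int   # start offset of the block within the file
    length: int   # block length in bytes
    hosts: list   # hypothetical DataNode names holding a replica


class FakeNameNode:
    """Hypothetical stand-in for the NameNode; counts block-location RPCs."""

    def __init__(self, file_length, block_size):
        self.file_length = file_length
        self.block_size = block_size
        self.rpc_count = 0

    def get_block_locations(self, offset, length):
        """Return the blocks overlapping [offset, offset + length)."""
        self.rpc_count += 1
        blocks = []
        start = (offset // self.block_size) * self.block_size
        while start < min(offset + length, self.file_length):
            size = min(self.block_size, self.file_length - start)
            blocks.append(LocatedBlock(start, size, [f"dn{start // self.block_size % 3}"]))
            start += self.block_size
        return blocks


class CachingInputStream:
    """Per-stream cache of block locations, refilled prefetch_size bytes at a time."""

    def __init__(self, namenode, prefetch_size):
        self.namenode = namenode
        self.prefetch_size = prefetch_size
        self.located_blocks = []  # sorted by offset; lives only as long as this stream

    def block_at(self, pos):
        block = self._lookup(pos)
        if block is None:
            # Cache miss: fetch locations for a prefetch window starting at pos.
            fetched = self.namenode.get_block_locations(pos, self.prefetch_size)
            known = {b.offset for b in self.located_blocks}
            self.located_blocks.extend(b for b in fetched if b.offset not in known)
            self.located_blocks.sort(key=lambda b: b.offset)
            block = self._lookup(pos)  # still None if pos is past end of file
        return block

    def _lookup(self, pos):
        offsets = [b.offset for b in self.located_blocks]
        i = bisect_right(offsets, pos) - 1
        if i >= 0 and pos < self.located_blocks[i].offset + self.located_blocks[i].length:
            return self.located_blocks[i]
        return None


nn = FakeNameNode(file_length=640, block_size=64)  # a 10-block file
stream = CachingInputStream(nn, prefetch_size=256)  # prefetch 4 blocks per RPC
for pos in range(0, 640, 64):
    stream.block_at(pos)
print(nn.rpc_count)  # 3
second = CachingInputStream(nn, prefetch_size=256)
second.block_at(0)
print(nn.rpc_count)  # 4: the new stream's cache starts empty
```

Within one stream, a larger prefetch window means fewer NameNode round trips for a sequential read; across streams nothing is shared.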
-Todd
-- Todd Lipcon
Software Engineer, Cloudera
-
Re: Is it necessary to cache metadata in client side?
Jeff Zhang 2010-06-11, 09:17
"Per input stream" means the cache can only be used within the scope of one file. I think it would be better to have a cache in DFSClient.
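A DFSClient-level cache along the lines suggested here might look roughly like the Python sketch below. It is an illustration under stated assumptions, not Hadoop code: `SharedLocationCache`, the `fetch` callable standing in for the NameNode lookup, and the job.xml path are all hypothetical. It assumes a time-to-live to bound staleness; the GFS paper likewise has clients reuse cached chunk locations only until the cached information expires or the file is reopened. The usage at the end models several tasks on one node reading the same job.xml.

```python
import time
from threading import Lock


class SharedLocationCache:
    """Hypothetical client-wide cache: path -> block locations, shared by all streams.

    Entries expire after ttl_seconds so that the client eventually notices
    files that were rewritten, re-replicated, or deleted on the cluster.
    """

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch      # callable(path) -> locations; stands in for a NameNode RPC
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}       # path -> (expiry_time, locations)
        self._lock = Lock()      # many task threads may share one client

    def get(self, path):
        now = self._clock()
        with self._lock:
            entry = self._entries.get(path)
            if entry is not None and entry[0] > now:
                return entry[1]  # fresh hit: no NameNode round trip
        # Miss or expired entry. Two threads missing at once may both fetch;
        # that only costs an extra RPC, never a wrong answer.
        locations = self._fetch(path)
        with self._lock:
            self._entries[path] = (now + self._ttl, locations)
        return locations

    def invalidate(self, path):
        """Drop an entry, e.g. after a read fails because a DataNode lost the block."""
        with self._lock:
            self._entries.pop(path, None)


calls = []


def fake_namenode_lookup(path):
    calls.append(path)
    return [("blk_1", ["dn1", "dn2", "dn3"])]


now = [0.0]
cache = SharedLocationCache(fake_namenode_lookup, ttl_seconds=30, clock=lambda: now[0])
for task in range(8):  # eight tasks on one node read the same job.xml
    cache.get("/jobs/job_0001/job.xml")
print(len(calls))  # 1
now[0] = 31.0          # past the TTL
cache.get("/jobs/job_0001/job.xml")
print(len(calls))  # 2
```

The TTL is the main trade-off: a longer TTL saves more NameNode requests but lets a client keep using out-of-date locations, which is why the sketch also exposes `invalidate` for the read-failure path.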
-- Best Regards
Jeff Zhang