HDFS - MapReduce coupling
Matthew John 2011-05-02, 06:48
Hi all,
1) I wanted to know how strong the coupling is between HDFS and MapReduce (the programming abstraction) in Hadoop. Can someone shed some light on the protocols used in the interactions between HDFS and the JobTracker, TaskTracker, and NameNode? Any pointers on this would be of great help!
2) Does the Hadoop system utilize the local storage directly for any purpose (without going through the HDFS) in clustered mode?
Thanks, Matthew John
Re: HDFS - MapReduce coupling
Ted Dunning 2011-05-02, 06:56
Yes. There is quite a bit of need for the local file system in clustered mode.
For one thing, all of the shuffle intermediate files are on local disk. For another, the distributed cache is actually stored on local disk.
HDFS is a frail vessel that cannot cope with all the needs.
On Sun, May 1, 2011 at 11:48 PM, Matthew John <[EMAIL PROTECTED]> wrote:
> ...
> 2) Does the Hadoop system utilize the local storage directly for any purpose
> (without going through the HDFS) in clustered mode?
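To make the point about shuffle files concrete, here is a toy sketch in Python. It is not Hadoop code: `run_map_task`, `run_reduce_task`, `run_job`, the file names, and the byte-sum partitioner are all hypothetical, and two temporary directories merely stand in for a node's local disk and for HDFS. It only illustrates the data flow described above, in which map output is spilled to node-local files (in Hadoop of this era, the directories named by `mapred.local.dir`) and only the final reduce output lands in the distributed file system.

```python
import os
import tempfile
from collections import defaultdict


def run_map_task(task_id, lines, local_dir, num_reducers):
    """Toy map task: count words, then spill one file per reducer to *local* disk."""
    partitions = defaultdict(lambda: defaultdict(int))
    for line in lines:
        for word in line.split():
            # Hypothetical deterministic partitioner (Hadoop hashes the key instead).
            part = sum(word.encode("utf-8")) % num_reducers
            partitions[part][word] += 1
    spill_files = {}
    for part in range(num_reducers):
        path = os.path.join(local_dir, f"map_{task_id}_part_{part}.out")
        with open(path, "w", encoding="utf-8") as f:
            for word, count in sorted(partitions[part].items()):
                f.write(f"{word}\t{count}\n")
        spill_files[part] = path
    return spill_files


def run_reduce_task(part, spill_paths, dfs_dir):
    """Toy reduce task: merge local spill files, write the result to the stand-in DFS."""
    totals = defaultdict(int)
    for path in spill_paths:
        with open(path, encoding="utf-8") as f:
            for row in f:
                word, count = row.rstrip("\n").split("\t")
                totals[word] += int(count)
    out_path = os.path.join(dfs_dir, f"part-{part:05d}")
    with open(out_path, "w", encoding="utf-8") as f:
        for word, total in sorted(totals.items()):
            f.write(f"{word}\t{total}\n")
    return dict(totals)


def run_job(splits, num_reducers=2):
    """Run the toy job; return (word counts, local file names, stand-in DFS file names)."""
    with tempfile.TemporaryDirectory() as local_dir, \
            tempfile.TemporaryDirectory() as dfs_dir:
        spills = [run_map_task(i, split, local_dir, num_reducers)
                  for i, split in enumerate(splits)]
        counts = {}
        for part in range(num_reducers):
            paths = [spill[part] for spill in spills]
            counts.update(run_reduce_task(part, paths, dfs_dir))
        return counts, sorted(os.listdir(local_dir)), sorted(os.listdir(dfs_dir))
```

Calling `run_job([["a b a"], ["b c"]])` leaves four intermediate files in the "local" directory (one per map task and reducer) and two output files in the stand-in DFS directory. The real shuffle additionally sorts, merges, and serves these files to reducers over the network, none of which is modelled here.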
Re: HDFS - MapReduce coupling
Matthew John 2011-05-02, 07:16
Is there any documentation on how the different daemons read and write on HDFS and on the local file system (directly)? I mean the different protocols used in those interactions. I basically want to figure out how intricate the coupling is between the storage (HDFS + local) and the other processes in the Hadoop infrastructure.
Re: HDFS - MapReduce coupling
Matthew John 2011-05-02, 11:23
Could someone kindly give some pointers on this?
Re: HDFS - MapReduce coupling
James Seigel 2011-05-02, 11:32
If you are pressed for time, you could look at the source code. I believe a huge proportion of the people who could answer your question (and it isn't a small one) are sleeping right now. :)
Source code is probably your best answer.
James
Sent from my mobile. Please excuse the typos.