Ivan.Novick@... 2011-10-25, 18:49
Hello,
I am trying to understand how data locality works in Hadoop.
If you run a MapReduce job, do the mappers only read data from the host on which they are running?
Is there a communication protocol between the map reduce layer and HDFS layer so that the mapper gets optimized to read data locally?
Any pointers on which layer of the stack handles this?
Cheers, Ivan
We pray to $deity that the MapReduce split size is about the same as (or smaller than) the HDFS block size. We also pray that file format synchronization points are frequent when compared to block boundaries.
The JobClient finds the location of each block of each file. It splits the job's input into FileSplit(s), with one per block.
Each FileSplit is processed by a task. The split records the locations in which the task should best be run.
The last block may be very short. It is then subsumed into the split for the preceding block.
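To make the splitting steps above concrete, here is a minimal sketch in Python. It is a simulation, not Hadoop code: `Split`, `compute_splits` and `block_hosts` are hypothetical names, and the 1.1 tolerance in `SPLIT_SLOP` is an assumption about how a short tail gets folded into the preceding split.

```python
from dataclasses import dataclass

# Assumed tolerance: the final split may be up to 10% larger than a block.
SPLIT_SLOP = 1.1


@dataclass
class Split:
    start: int    # byte offset of the split within the file
    length: int   # number of bytes in the split
    hosts: list   # nodes holding the block in which the split starts


def compute_splits(file_length, block_size, block_hosts):
    """Cut a file into one split per block, folding a short tail into the last split.

    block_hosts[i] is the (hypothetical) list of hosts storing block i.
    """
    splits = []
    offset = 0
    remaining = file_length
    # Emit block-sized splits while clearly more than one block remains.
    while remaining / block_size > SPLIT_SLOP:
        splits.append(Split(offset, block_size, block_hosts[offset // block_size]))
        offset += block_size
        remaining -= block_size
    # Whatever is left (at most 1.1 blocks) becomes the final split.
    if remaining > 0:
        splits.append(Split(offset, remaining, block_hosts[offset // block_size]))
    return splits


# Made-up layout: two full 100-byte blocks plus a 5-byte tail in a third block.
hosts = [["node1", "node2"], ["node2", "node3"], ["node1", "node3"]]
splits = compute_splits(205, 100, hosts)
for s in splits:
    print(s)
```

With these made-up inputs the 5-byte tail does not get its own split. It is folded into the second split, whose preferred hosts are those of the second block, so those last few bytes may have to come over the network.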
Some data is transferred between nodes when the synchronization point for the file format is not at a block boundary. (It basically never is, but we hope it's close, or the purpose of MR locality is defeated.)
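The boundary effect can be illustrated with a sketch for newline-delimited text, where the newline plays the role of the synchronization point. This is an assumed convention for illustration (`read_records` is a hypothetical helper, not a Hadoop API): a split skips the line it lands in the middle of, and keeps reading past its own end to finish the last record it started.

```python
def read_records(data: bytes, start: int, length: int):
    """Return the newline-terminated records owned by the split [start, start + length)."""
    end = start + length
    pos = start
    if start != 0:
        # Skip the line at the front of the split: the previous split reads it.
        nl = data.find(b"\n", pos)
        if nl == -1:
            return []
        pos = nl + 1
    records = []
    # A record belongs to this split if it starts at or before the split end,
    # even when finishing it means reading into the next block.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        records.append(data[pos:stop])
        pos = stop + 1
    return records


data = b"alpha\nbravo\ncharlie\ndelta\n"
# Three splits of at most 10 bytes over a 26-byte file.
parts = [read_records(data, start, 10) for start in (0, 10, 20)]
for p in parts:
    print(p)
```

Every record is read exactly once, but the first split reads "bravo" past its 10-byte boundary. In a cluster that overshoot is the part that may be fetched from another node, which is why frequent synchronization points keep the remote traffic small.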
To answer your questions specifically: under the above assumptions, most of the data should be read from the local HDFS node. The communication layer between MapReduce and HDFS is not special.
S.
Ivan.Novick@... 2011-10-26, 00:36
So I guess the job tracker is the one reading the HDFS meta-data and then optimizing the scheduling of map jobs based on that?
Mapred Learn 2011-10-26, 00:42
Yes, that's right!
Eugene Kirpichov 2011-10-26, 04:22
But I guess it isn't always possible to achieve optimal scheduling, right? What's done then? Is network topology taken into account, perhaps?
Steve Loughran 2011-10-26, 09:14
On 26/10/11 05:22, Eugene Kirpichov wrote:
> But I guess it isn't always possible to achieve optimal scheduling, right?
> What's done then; any account for network topology perhaps?
I'd recommend this paper if you are curious; it explains the Fair Scheduler:
http://www.cs.berkeley.edu/~matei/papers/2010/eurosys_delay_scheduling.pdf
You can plug in different schedulers with different policies if you want.
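As a rough illustration of the delay scheduling idea from that paper, here is a simplified sketch. It is not the Fair Scheduler's code: `Job`, `assign_task` and the `MAX_SKIPS` threshold are hypothetical, and the fairness ordering is reduced to "fewest running tasks first". A job with no local data on the requesting node is skipped a few times before it is allowed to run a non-local task.

```python
from dataclasses import dataclass

# Assumed threshold: how many times a job may be skipped before it is
# allowed to launch a task on a node that does not hold its data.
MAX_SKIPS = 2


@dataclass
class Job:
    name: str
    pending: dict        # task id -> set of hosts holding that task's input
    running: int = 0
    skip_count: int = 0


def assign_task(jobs, node):
    """Pick (job name, task id, is_local) for a free slot on `node`, or None."""
    # Stand-in for fair sharing: offer the slot to the job with the fewest running tasks.
    for job in sorted(jobs, key=lambda j: j.running):
        if not job.pending:
            continue
        local = [t for t, task_hosts in job.pending.items() if node in task_hosts]
        if local:
            task, is_local = local[0], True
            job.skip_count = 0
        else:
            job.skip_count += 1
            if job.skip_count < MAX_SKIPS:
                continue  # wait a little: let another job use this slot
            task, is_local = next(iter(job.pending)), False
        del job.pending[task]
        job.running += 1
        return job.name, task, is_local
    return None


jobs = [Job("a", {"a1": {"node1"}}), Job("b", {"b1": {"node2"}})]
first = assign_task(jobs, "node2")   # job a is skipped, job b runs locally
second = assign_task(jobs, "node3")  # job a has waited long enough, runs non-locally
third = assign_task(jobs, "node1")   # nothing left to launch
print(first, second, third)
```

The trade-off is visible in the threshold: a larger `MAX_SKIPS` gives more local tasks at the cost of slots sitting idle for a job, while a value of 1 degenerates into ignoring locality whenever no local task is available.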
Eugene Kirpichov 2011-10-26, 09:22
Thanks!
--
Eugene Kirpichov
Principal Engineer, Mirantis Inc. http://www.mirantis.com/
Editor, http://fprog.ru/
On 25 October 2011 17:36, <[EMAIL PROTECTED]> wrote:
> So I guess the job tracker is the one reading the HDFS meta-data and then optimizing the scheduling of map jobs based on that?
Currently no, it's the JobClient that does it. Note, though, that you use the term "scheduling" in a way that may confuse some other participants in the thread, since your original question wasn't about what Hadoop people usually call the "scheduler".
S.
Michel Segel 2011-10-28, 09:24
Umm... You can override the scheduler and force it to ignore data locality. We did this as an experiment to better balance cluster CPU utilization. It wasn't pretty and we started to look at the delay scheduler... Wouldn't recommend it though.
Mike Segel