Harsh J 2012-12-26, 08:49
Hi,
What do you mean by "shuffled bytes [to] the mappers"? If you mean "from", the counter you are looking for is "Reduce shuffle bytes"; otherwise, you may be looking for the per-map counter "Map output bytes".
Per-partition counters can be constructed on the user side if needed: pre-compute each record's partition before emitting it (using the same partitioner the job uses) and add the byte size of the emitted objects to that partition's counter.
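A minimal sketch of the per-partition idea, written in plain Java so it runs without Hadoop on the classpath. `PartitionByteCounter` and its methods are hypothetical names; the partition formula is the one Hadoop's default HashPartitioner applies to `key.hashCode()`. In a real mapper you would instead call the job's configured partitioner (`Partitioner.getPartition(key, value, numPartitions)`) on the actual Writable key, since a key such as `Text` need not hash the same way as the equivalent `String`, and you would report the totals with `context.getCounter(group, name).increment(n)` (new API) or `reporter.incrCounter(group, name, n)` (old API).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: computes each record's partition *before* it is
// emitted and accumulates that record's size under the partition number.
class PartitionByteCounter {
    private final int numPartitions;
    private final Map<Integer, Long> bytesPerPartition = new TreeMap<>();

    PartitionByteCounter(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Same formula as Hadoop's default HashPartitioner; the mask keeps the
    // result non-negative when hashCode() is negative.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Call right before emitting a record; recordBytes is the caller's
    // measure of the record's size (e.g. its serialized length).
    int record(Object key, long recordBytes) {
        int partition = partitionFor(key, numPartitions);
        bytesPerPartition.merge(partition, recordBytes, Long::sum);
        return partition;
    }

    // Bytes counted so far for one partition (0 if nothing went to it).
    long bytesFor(int partition) {
        return bytesPerPartition.getOrDefault(partition, 0L);
    }
}
```

Keeping the totals in a map and flushing them to counters once (for example in `cleanup()`/`close()`) avoids one counter lookup per record; incrementing the counter directly on each emit works just as well.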
On Tue, Dec 25, 2012 at 6:03 PM, Eduard Skaley <[EMAIL PROTECTED]> wrote:
> Hello guys,
>
> I need a counter for shuffled bytes to the mappers.
> Is there existing one or should I define one myself ?
> How can I implement such a counter?
>
> Thank you and happy Christmas time,
> Eduard
-- Harsh J
Eduard Skaley 2012-12-26, 11:46
Hi,
I mean TO the mappers. I'm using CompositeInputFormat in my application to compute map-side joins. I want to join two datasets, A and B: one is stored on node 1 and the other on node 2. If, for example, the join is computed on node 2, then the input split of the dataset stored on node 1 has to be transferred to node 2. I want to count the bytes that are shuffled (transferred) TO the mapper on node 2.
Harsh J 2012-12-26, 11:50
This isn't called 'shuffle' (it is rather a plain remote read), so your original question was confusing; thanks for clarifying!
In that case, you could count the bytes coming in from the relevant record reader. For example, the record reader behind TextInputFormat (LineRecordReader) uses a LongWritable key that holds the current byte offset in the file, which you could use as a simple, progressing counter of the bytes read so far.
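A minimal sketch of turning those offset keys into a byte count, in plain Java so it runs without Hadoop. `OffsetByteTracker` is a hypothetical name. It assumes a line-oriented reader whose key is the byte offset of each line's start (as with TextInputFormat); in a mapper you might feed it `key.get()` and `value.getLength()` (the `Text` value's byte length) and report `bytesRead()` to a counter of your own naming when the task finishes. Note that with CompositeInputFormat the mapper sees join keys rather than file offsets, so this bookkeeping would have to live in, or wrap, the underlying per-source record readers.

```java
// Hypothetical helper: converts the byte-offset keys of a line-oriented
// record reader into a running count of bytes read within one split.
class OffsetByteTracker {
    private long firstOffset = -1;   // offset of the first record seen, -1 if none yet
    private long lastOffset = -1;    // offset of the most recent record
    private long lastValueBytes = 0; // byte length of the most recent line's content

    // offset: the record key (byte position where the line starts in the file)
    // valueBytes: byte length of the line's content, without its terminator
    void onRecord(long offset, long valueBytes) {
        if (firstOffset < 0) {
            firstOffset = offset;
        }
        lastOffset = offset;
        lastValueBytes = valueBytes;
    }

    // Bytes from the start of the first record through the end of the last
    // record's content; the last line's terminator is not included.
    long bytesRead() {
        if (firstOffset < 0) {
            return 0;
        }
        return (lastOffset - firstOffset) + lastValueBytes;
    }
}
```

For a file containing `ab\ncde\n`, the reader would produce offsets 0 and 3, and the tracker reports 6 bytes (7 minus the final newline).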
-- Harsh J
Eduard Skaley 2012-12-26, 12:56
For this, I need to know where an input split is located and where the join is computed. How can I do this programmatically?
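A minimal sketch of the bookkeeping, in plain Java so it runs without Hadoop; `SplitLocality` and its methods are hypothetical. It assumes the location hints come from `InputSplit.getLocations()` (for a CompositeInputSplit, from each of its child splits), the sizes from `InputSplit.getLength()`, and the host running the map task (where the join is computed) from `InetAddress.getLocalHost().getHostName()`. In the old API a mapper can obtain its split via `Reporter.getInputSplit()`. Location hints name hosts that hold the data's blocks, and both names must use the same form (short or fully qualified) to compare equal, so treat the result as an estimate.

```java
// Hypothetical helper: estimates how many bytes of a (composite) split are
// not local to the host running the map task.
class SplitLocality {
    // True if localHost appears among a split's location hints
    // (case-insensitive exact hostname match).
    static boolean isLocal(String[] splitLocations, String localHost) {
        for (String location : splitLocations) {
            if (location.equalsIgnoreCase(localHost)) {
                return true;
            }
        }
        return false;
    }

    // Sums the lengths of child splits whose location hints do not include
    // localHost; lengths[i] and locations[i] describe the same child split.
    static long remoteBytes(long[] lengths, String[][] locations, String localHost) {
        long total = 0;
        for (int i = 0; i < lengths.length; i++) {
            if (!isLocal(locations[i], localHost)) {
                total += lengths[i];
            }
        }
        return total;
    }
}
```

For the two-dataset case in the question, a map task on node 2 whose composite split has children located on `node1` and `node2` would count only the `node1` child's length as remote.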