Chris Nauroth 2012-10-15, 18:39
Another advantage of relying on the shuffle/sort is that your sorted list
can grow beyond the size of memory. The risk in packing this data into a
sorted ArrayWritable is that the list could grow too large to fit in
memory.
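To make the key-restructuring idea concrete, here is a minimal sketch of a map output key that carries the item's priority. The class name PriorityKey and its fields (priority, itemId) are hypothetical, not taken from any real application. In an actual job the class would declare "implements org.apache.hadoop.io.WritableComparable<PriorityKey>"; the Hadoop interface is left out here only so that the sketch compiles with the JDK alone, while write and readFields keep the same signatures as the Writable methods.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Objects;

// Hypothetical composite key: ordered by priority first, then by item id.
// In a real job, declare: implements org.apache.hadoop.io.WritableComparable<PriorityKey>
class PriorityKey implements Comparable<PriorityKey> {
    private int priority;
    private String itemId = "";

    // Hadoop creates key instances reflectively, so a no-arg constructor is needed.
    PriorityKey() {}

    PriorityKey(int priority, String itemId) {
        this.priority = priority;
        this.itemId = itemId;
    }

    int getPriority() { return priority; }
    String getItemId() { return itemId; }

    // Same signature as Writable.write: serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(priority);
        out.writeUTF(itemId);
    }

    // Same signature as Writable.readFields: read the fields back in the same order.
    public void readFields(DataInput in) throws IOException {
        priority = in.readInt();
        itemId = in.readUTF();
    }

    // Defines the order in which a reducer would see the keys.
    @Override
    public int compareTo(PriorityKey other) {
        int byPriority = Integer.compare(priority, other.priority);
        return byPriority != 0 ? byPriority : itemId.compareTo(other.itemId);
    }

    // equals and hashCode should agree with compareTo; the default
    // HashPartitioner uses hashCode to pick the reduce task.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PriorityKey)) return false;
        PriorityKey k = (PriorityKey) o;
        return priority == k.priority && itemId.equals(k.itemId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(priority, itemId);
    }
}
```

With a key like this, each reduce task would receive its keys in ascending priority order without holding a sorted list in memory. With more than one reduce task, each reducer still sees only its own sorted subset of the keys.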
On Mon, Oct 15, 2012 at 11:37 AM, Chris Nauroth <[EMAIL PROTECTED]> wrote:
> I think it would work, but I'm wondering if it would be easier for your
> application to restructure the keys emitted from the mapper tasks so that
> you can take advantage of the sorting inherently done during the shuffle.
> For each reduce task, your reducer code will receive keys emitted from
> mappers in sorted order. Therefore, if the keys emitted from your mapper
> contain the item's priority, then the shuffle would provide the sort order
> that you need. This might lead you down the path of writing a custom
> WritableComparable to use as the map output key, but this is usually pretty
> Also, keep in mind that if you run multiple reduce tasks, then each
> reducer receives a subset of the keys emitted from the mapper. Depending
> on your application logic, this may or may not be a problem.
> On Mon, Oct 15, 2012 at 11:07 AM, Aseem Anand <[EMAIL PROTECTED]> wrote:
>> Hi Chris,
>> I had a few PriorityQueues at the mappers that I wished to send to some
>> reducers. Each reducer (receiving PriorityQueues from each mapper) would
>> then perform some operations on them by removing the top element, and
>> hence accessing the elements in sorted order (which is essential to my
>> application). I also thought of pushing them into an ArrayWritable, but was
>> wondering whether there is an existing implementation of PriorityQueue.
>> Would it be advisable to insert the elements into an ArrayWritable in
>> sorted order and reconstruct the merged PriorityQueues at the other end?
>> On Mon, Oct 15, 2012 at 11:07 PM, Chris Nauroth <[EMAIL PROTECTED]> wrote:
>>> Hello Aseem,
>>> I'm aware of nothing in Hadoop or related projects that provides a
>>> PriorityQueueWritable. You could achieve this by taking some existing
>>> priority queue class and subclassing it or wrapping it to implement the
>>> Writable.write and Writable.readFields methods.
>>> If you could give us some additional context around what you want to
>>> solve, then we might be able to offer some other suggestions. For example,
>>> depending on the problem, maybe you could sort values and wrap them in
>>> ArrayWritable (which already exists), which would save you the trouble of
>>> coding your own custom Writable.
>>> Thank you,
>>> On Mon, Oct 15, 2012 at 9:56 AM, Aseem Anand <[EMAIL PROTECTED]> wrote:
>>>> Is anyone familiar with a PriorityQueueWritable that can be used to pass
>>>> data from mappers to reducers?
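For reference, the wrapping approach suggested in the thread (taking an existing priority queue class and implementing Writable.write and Writable.readFields around it) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name PriorityQueueWritable is the hypothetical one from the question, the elements are assumed to be plain long values, and the Hadoop interface is omitted so the sketch compiles with the JDK alone. In a real job the class would declare "implements org.apache.hadoop.io.Writable".

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.PriorityQueue;

// Hypothetical wrapper around java.util.PriorityQueue holding long values.
// In a real job, declare: implements org.apache.hadoop.io.Writable
class PriorityQueueWritable {
    private final PriorityQueue<Long> queue = new PriorityQueue<>();

    void add(long value) { queue.add(value); }

    // Removes and returns the smallest element, or null if the queue is empty.
    Long poll() { return queue.poll(); }

    int size() { return queue.size(); }

    // Same signature as Writable.write: element count, then each element.
    // Iteration follows the internal heap layout, not sorted order; that is
    // fine because readFields re-inserts every element into a fresh heap.
    public void write(DataOutput out) throws IOException {
        out.writeInt(queue.size());
        for (Long value : queue) {
            out.writeLong(value);
        }
    }

    // Same signature as Writable.readFields. Hadoop reuses Writable
    // instances, so any previous contents are cleared first.
    public void readFields(DataInput in) throws IOException {
        queue.clear();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            queue.add(in.readLong());
        }
    }
}
```

As with the sorted ArrayWritable, the whole queue has to fit in memory on both the map and reduce side, which is the limitation that relying on the shuffle/sort avoids.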