Re: PriorityQueueWritable
Another advantage of using the shuffle/sort is that your sorted list can
grow beyond the size of memory.  A risk in packing this data into a sorted
ArrayWritable is that the list could grow too large to fit in memory.
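
As a rough sketch of the composite-key idea from my earlier reply (quoted
below), something like the following could serve as the map output key.  The
class name PriorityKey and its two fields are hypothetical, not anything from
your application.  I've written it against plain JDK interfaces so it compiles
on its own; in a real job it would declare
"implements org.apache.hadoop.io.WritableComparable<PriorityKey>", whose
write, readFields and compareTo methods have these same signatures.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Objects;

// Hypothetical composite key: priority first, then an item id as tie-breaker.
// Assumption: in Hadoop this would implement WritableComparable<PriorityKey>.
class PriorityKey implements Comparable<PriorityKey> {
    private int priority;
    private String itemId = "";

    // No-arg constructor so the framework can create an empty instance
    // and then fill it in through readFields.
    PriorityKey() {}

    PriorityKey(int priority, String itemId) {
        this.priority = priority;
        this.itemId = itemId;
    }

    int getPriority() { return priority; }
    String getItemId() { return itemId; }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(priority);
        out.writeUTF(itemId);
    }

    // Deserialize the fields in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        priority = in.readInt();
        itemId = in.readUTF();
    }

    // Ascending by priority, then by item id. Flip the first comparison
    // if larger numbers should reach the reducer first.
    @Override
    public int compareTo(PriorityKey other) {
        int c = Integer.compare(priority, other.priority);
        return c != 0 ? c : itemId.compareTo(other.itemId);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PriorityKey)) return false;
        PriorityKey k = (PriorityKey) o;
        return priority == k.priority && itemId.equals(k.itemId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(priority, itemId);
    }
}
```

With a key like this, each reduce task would see its keys in compareTo order,
so the reducer never has to hold the whole sorted list in memory.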

Thanks,
--Chris

On Mon, Oct 15, 2012 at 11:37 AM, Chris Nauroth <[EMAIL PROTECTED]> wrote:

> I think it would work, but I'm wondering if it would be easier for your
> application to restructure the keys emitted from the mapper tasks so that
> you can take advantage of the sorting inherently done during the shuffle.
>
> For each reduce task, your reducer code will receive keys emitted from
> mappers in sorted order.  Therefore, if the keys emitted from your mapper
> contain the item's priority, then the shuffle would provide the sort order
> that you need.  This might lead you down the path of writing a custom
> WritableComparable to use as the map output key, but this is usually pretty
> trivial.
>
> Also, keep in mind that if you run multiple reduce tasks, then each
> reducer receives a subset of the keys emitted from the mapper.  Depending
> on your application logic, this may or may not be a problem.
>
> Thanks,
> --Chris
>
>
> On Mon, Oct 15, 2012 at 11:07 AM, Aseem Anand <[EMAIL PROTECTED]> wrote:
>
>> Hi Chris,
>> I had a few PriorityQueues at the mappers which I wished to send to some
>> reducers. Each reducer (receiving PriorityQueues from each mapper) would
>> then perform some operations on these by removing the top element, and
>> hence accessing the elements in sorted order (which is essential to my
>> application). I also thought of pushing them into an ArrayWritable, but was
>> wondering whether there was an existing implementation of PriorityQueue.
>> Would it be advisable to insert elements into an ArrayWritable in sorted
>> order and reconstruct the merged PriorityQueues at the other end?
>>
>> Thanks,
>> Aseem
>>
>>
>> On Mon, Oct 15, 2012 at 11:07 PM, Chris Nauroth <[EMAIL PROTECTED]
>> > wrote:
>>
>>> Hello Aseem,
>>>
>>> I'm aware of nothing in Hadoop or related projects that provides a
>>> PriorityQueueWritable.  You could achieve this by taking some existing
>>> priority queue class and subclassing it or wrapping it to implement the
>>> Writable.write and Writable.readFields methods.
>>>
>>> If you could give us some additional context around what you want to
>>> solve, then we might be able to offer some other suggestions.  For example,
>>> depending on the problem, maybe you could sort values and wrap them in
>>> ArrayWritable (which already exists), which would save you the trouble of
>>> coding your own custom Writable.
>>>
>>> Thank you,
>>> --Chris
>>>
>>> On Mon, Oct 15, 2012 at 9:56 AM, Aseem Anand <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi,
>>>> Is anyone familiar with a PriorityQueueWritable that can be used to pass
>>>> data from mappers to reducers?
>>>>
>>>> Regards,
>>>> Aseem
>>>>
>>>
>>>
>>
>
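
For reference, the wrapping approach from the first reply in this thread
(wrapping an existing priority queue class and implementing Writable.write and
Writable.readFields) might look roughly like the sketch below.  The Long
element type and the add/poll/size helpers are assumptions made for
illustration.  It uses plain JDK interfaces so it compiles standalone; in a
real job the class would declare "implements org.apache.hadoop.io.Writable",
which defines these same two methods.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.PriorityQueue;

// Hypothetical wrapper around java.util.PriorityQueue<Long>.
// Assumption: in Hadoop this would implement org.apache.hadoop.io.Writable.
class PriorityQueueWritable {
    private final PriorityQueue<Long> queue = new PriorityQueue<>();

    void add(long value) { queue.add(value); }

    // Removes and returns the smallest element, or null if the queue is empty.
    Long poll() { return queue.poll(); }

    int size() { return queue.size(); }

    // Write the element count, then each element. The iterator visits
    // elements in heap order rather than sorted order, which is fine
    // because readFields re-inserts every element into a fresh heap.
    public void write(DataOutput out) throws IOException {
        out.writeInt(queue.size());
        for (Long value : queue) {
            out.writeLong(value);
        }
    }

    // Clear first: a Writable instance may be reused across records.
    public void readFields(DataInput in) throws IOException {
        queue.clear();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            queue.add(in.readLong());
        }
    }
}
```

Note that this still keeps the entire queue in memory on both the map and
reduce side, which is the risk raised at the top of this message.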