Re: Implement Binary Search in PIG
Bags can be very large and might not fit into memory, in which case some
or all of the bag has to be stored on disk. Random access on such a bag
is not efficient, which is why the DataBag interface does not support it.
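
To make the iteration-only access concrete, here is a minimal sketch of a
membership check that walks a sorted collection front to back and stops as
soon as it passes the point where the target would be. It assumes the items
are sorted in ascending order. A plain Iterable stands in for the DataBag
(which is iterable over its tuples), and the class and method names are
hypothetical, not part of Pig.

```java
final class SortedBagScan {
    // Sequential membership check over an iteration-only, ascending-sorted source.
    // The Iterable parameter is a stand-in for a DataBag; this is an assumption
    // made so the sketch compiles without Pig on the classpath.
    static <T extends Comparable<T>> boolean containsSorted(Iterable<T> items, T target) {
        for (T item : items) {
            int cmp = item.compareTo(target);
            if (cmp == 0) {
                return true;   // found the target
            }
            if (cmp > 0) {
                return false;  // passed the target's position, so it cannot appear later
            }
        }
        return false;          // reached the end without finding it
    }
}
```

This is still linear in the worst case, but it avoids copying the bag and
never needs more than one item in hand at a time.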

As Prashant suggested, storing the items in a tuple would be a good
alternative if you want random access for a binary search.
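
A rough sketch of what the search could look like once the items are
reachable by index. It assumes the fields are sorted in ascending order and
are mutually comparable. A java.util.List stands in for the tuple so the
example compiles without Pig; in a real UDF the get(i) and size() calls
would go to the Tuple instead. The class and method names are hypothetical.

```java
import java.util.List;

final class SortedFieldSearch {
    // Classic binary search over an indexable, ascending-sorted sequence.
    // Returns the index of target, or -1 if it is not present.
    // The List parameter is a stand-in for a Pig Tuple (an assumption of this sketch).
    static <T extends Comparable<T>> int binarySearch(List<T> fields, T target) {
        int lo = 0;
        int hi = fields.size() - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;  // avoids int overflow for large sizes
            int cmp = fields.get(mid).compareTo(target);
            if (cmp == 0) {
                return mid;                // found
            } else if (cmp < 0) {
                lo = mid + 1;              // target is in the upper half
            } else {
                hi = mid - 1;              // target is in the lower half
            }
        }
        return -1;                         // not present
    }
}
```

For the fields (2, 5, 8, 13, 21), searching for 13 returns index 3 and
searching for 7 returns -1. Note that a tuple is held in memory as a whole,
so this only makes sense when the sorted items fit in memory.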

On 12/12/11 7:54 PM, 唐亮 wrote:
> Hi all,
> How can I implement a binary search in Pig?
> In one relation there is a bag whose items are sorted,
> and I want to check whether a specific item exists in the bag.
> In a UDF, I can't randomly access items in a DataBag container,
> so I have to copy the items from the DataBag into an ArrayList, which is
> time-consuming.
> How can I implement the binary search efficiently in Pig?