Re: Implement Binary Search in PIG
My assumption is that 唐亮 is trying to do binary search on bags within
the tuples in a relation (i.e., the relation's schema has a bag column). I
don't think he is trying to treat the entire relation as one bag and do
binary search on that.
-Thejas
On 12/13/11 2:30 PM, Andrew Wells wrote:
> I don't think this could be done,
>
> pig is just a hadoop job, and the idea behind hadoop is to read all the
> data in a file.
>
> so by the time you put all the data into an array, you would have been
> better off just checking each element for the one you were looking for.
>
> So what you would get is O(n + lg n), which is just O(n) once you count
> the cost of copying the data into an array.
> Second, hadoop is all about large data analysis, usually more than 100GB,
> so putting this into memory is out of the question.
> Third, hadoop is efficient because it processes this large amount of data
> by splitting it up into multiple processes. To do an efficient binary
> search, you would need to do this in one mapper or one reducer.
>
> My opinion is just don't fight hadoop/pig.
>
>
>
> On Tue, Dec 13, 2011 at 1:56 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>
>> Bags can be very large and might not fit into memory, and in such cases some
>> or all of the bag might have to be stored on disk. In such cases, it is not
>> efficient to do random access on the bag. That is why the DataBag interface
>> does not support it.
>>
>> As Prashant suggested, storing it in a tuple would be a good alternative,
>> if you want to have random access to do binary search.
>>
>> -Thejas
>>
>>
>>
>> On 12/12/11 7:54 PM, 唐亮 wrote:
>>
>>> Hi all,
>>> How can I implement a binary search in pig?
>>>
>>> In one relation, there is a bag whose items are sorted,
>>> and I want to check whether a specific item exists in the bag.
>>>
>>> In a UDF, I can't randomly access items in a DataBag container,
>>> so I have to copy the items from the DataBag into an ArrayList, which is
>>> time consuming.
>>>
>>> How can I implement the binary search efficiently in pig?
>>>
>>>
>>
>
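
The trade-off discussed in this thread can be sketched in plain Java. This is only an illustration under stated assumptions, not code from any of the posters: the class and method names (SortedLookup, containsByScan, binarySearch) are hypothetical, an Iterable stands in for a DataBag (which only offers forward iteration), and a List stands in for a Pig Tuple (whose fields can be read by index with Tuple.get(int) and Tuple.size()). It assumes the items are already sorted in ascending order.

```java
import java.util.Arrays;
import java.util.List;

public class SortedLookup {

    // Forward-only scan, the only access pattern a bag-like container offers.
    // Because the input is sorted, the scan can stop once it passes the target.
    public static <T extends Comparable<T>> boolean containsByScan(Iterable<T> sorted, T target) {
        for (T item : sorted) {
            int c = item.compareTo(target);
            if (c == 0) {
                return true;   // found it
            }
            if (c > 0) {
                return false;  // already past where the target would be
            }
        }
        return false;
    }

    // Binary search over an indexable container (stand-in for a tuple).
    // Returns the index of the target, or -1 if it is absent.
    public static <T extends Comparable<T>> int binarySearch(List<T> sorted, T target) {
        int lo = 0;
        int hi = sorted.size() - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;  // avoids overflow of (lo + hi)
            int c = sorted.get(mid).compareTo(target);
            if (c == 0) {
                return mid;
            } else if (c < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(2, 3, 5, 7, 11);
        System.out.println(containsByScan(items, 5));  // true
        System.out.println(containsByScan(items, 4));  // false
        System.out.println(binarySearch(items, 7));    // 3
        System.out.println(binarySearch(items, 4));    // -1
    }
}
```

If the data arrives as a bag, copying it into an indexable structure already costs one full pass, so the early-exit scan is no slower for a single lookup. The binary search only pays off when the items are stored in an indexable form to begin with, or when many lookups are made against the same data.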