When I use the distributed cache, I find that when the file is larger than 100 MB or has more than 10 million records, it cannot be cached in memory. I tried setting io.sort.mb to 200 MB, but it still does not work. Any suggestions would be welcome. Thank you!
Sent: 2012-11-15 11:48:04
Subject: Re: Re: distributed cache
Thank you so much! Both the replicated join and a UDF that uses the
distributed cache are useful for me. I have already got it working. Thank you again.
From: Prashant Kommireddi
Sent: 2012-11-15 03:52:09
To: [EMAIL PROTECTED]
Subject: Re: distributed cache
If it's for purposes other than a Join, you could write a UDF to use
distributed cache. Look at the section "Loading the Distributed Cache"
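
To make the idea concrete, here is a minimal sketch in plain Java of the lookup pattern such a UDF would typically follow: read the cached file once, keep its keys in a HashSet, and answer membership checks from memory. The class name CachedSetLookup, its methods, and the one-key-per-line file layout are assumptions made for illustration; they are not part of Pig or Hadoop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not a Pig or Hadoop class.
class CachedSetLookup {
    // Assumption: inside a task this would be the local name of the cached file.
    private final String localPath;
    // Built lazily on the first lookup, then reused for every later call.
    private Set<String> keys;

    CachedSetLookup(String localPath) {
        this.localPath = localPath;
    }

    // Reads the file once, one key per line, skipping blank lines.
    private void load() throws IOException {
        Set<String> loaded = new HashSet<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(localPath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String key = line.trim();
                if (!key.isEmpty()) {
                    loaded.add(key);
                }
            }
        }
        keys = loaded;
    }

    // Returns true if the key appeared in the file.
    boolean contains(String key) throws IOException {
        if (keys == null) {
            load();
        }
        return keys.contains(key);
    }
}
```

In an actual Pig UDF, the same lazy load would probably sit inside exec of a class extending EvalFunc, with getCacheFiles() naming the HDFS file to ship. Note that the whole set lives on the task's JVM heap, so this only suits files that fit there.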
On Wed, Nov 14, 2012 at 11:44 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
> Maybe this is what you are looking for:
> see "Replicated join"
> On Tue, Nov 13, 2012 at 11:46 AM, yingnan.ma <[EMAIL PROTECTED]> wrote:
> > Hi,
> > I used the distributed cache in Hadoop through the "setup" method to
> > store a HashSet in memory.
> > Now I am trying to use the distributed cache in Pig, but I don't know how
> > to store a HashSet in memory; I can only cache the file in memory.
> > Any advice would be appreciated. Thank you so much!
> > Best Regards
> > Malone
> > 2012-11-13