Pig >> mail # user >> distributed cache


Re: Re: Re: distributed cache
When I use the distributed cache, I find that if the file is larger than 100 MB or has more than 10 million records, it cannot be cached in memory. I tried setting io.sort.mb to 200 MB, but it still does not work. Any suggestions would be welcome. Thank you!
 
2012-11-16
From: yingnan.ma
Sent: 2012-11-15  11:48:04
To: user
Cc:
Subject: Re: Re: distributed cache
 
Thank you so much! Both the replicated join and a UDF that uses the
distributed cache are useful for me. I have already got it working. Thank you again.
2012-11-15
yingnan.ma
From: Prashant Kommireddi
Sent: 2012-11-15  03:52:09
To: [EMAIL PROTECTED]
Cc:
Subject: Re: distributed cache

If it's for purposes other than a join, you could write a UDF that uses the
distributed cache. See the section "Loading the Distributed Cache" in:
http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html
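
For illustration, here is a minimal sketch of the "load the file into a HashSet once, then look keys up" pattern that such a UDF would typically follow. It uses only the Java standard library so it can run on its own. The class name CachedLookup and its methods are hypothetical, not part of Pig. In a real UDF, the assumption is that you would extend EvalFunc, list the file in getCacheFiles() as an "hdfs_path#symlink" entry, and call something like this from exec() with the symlink name as the local path.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper (not a Pig class): lazily loads one key per line
 * from a local file into a HashSet and answers membership lookups.
 */
class CachedLookup {
    private final String localPath;
    private Set<String> keys; // stays null until the first lookup

    CachedLookup(String localPath) {
        this.localPath = localPath;
    }

    // Read the file only once; later calls reuse the in-memory set.
    private void loadIfNeeded() throws IOException {
        if (keys != null) {
            return;
        }
        Set<String> loaded = new HashSet<>();
        try (BufferedReader reader = Files.newBufferedReader(
                Paths.get(localPath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String key = line.trim();
                if (!key.isEmpty()) { // skip blank lines
                    loaded.add(key);
                }
            }
        }
        keys = loaded;
    }

    boolean contains(String key) throws IOException {
        loadIfNeeded();
        return keys.contains(key);
    }

    int size() throws IOException {
        loadIfNeeded();
        return keys.size();
    }
}
```

Note that the whole set lives on the heap of the task's JVM, so a very large lookup file may not fit regardless of how it is distributed.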
On Wed, Nov 14, 2012 at 11:44 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
> Maybe this is what you are looking for:
> http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html
> see "Replicated join"
>
>
> On Tue, Nov 13, 2012 at 11:46 AM, yingnan.ma <[EMAIL PROTECTED]>
> wrote:
>
> > Hi ,
> >
> > I used the distributed cache in Hadoop through the "setup" method and a
> > "static" field to store a HashSet in memory.
> >
> > Now I am trying to use the distributed cache in Pig, but I don't know how to
> > store a HashSet in memory; I can only cache the file in memory.
> >
> > Any advice would be welcome. Thank you so much!
> >
> > Best Regards
> >
> > Malone
> >
> > 2012-11-13
> >
> >
> >
>