Vincent BARAT 2009-10-07, 14:54
Ashutosh Chauhan 2009-10-07, 16:33
zaki rahaman 2009-10-07, 20:08
Thejas Nair 2009-10-07, 20:16
Vincent BARAT 2009-10-08, 13:33
Alan Gates 2009-10-12, 18:50
Re: storing intermediate results ?
Vincent BARAT 2009-10-08, 09:43
Thanks for your answer.
Actually, I run PIG from Java (through a series of
registerQuery() calls). The exec you mention cannot be used in
that context (AFAIK).
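For reference, the "store the JOIN, then LOAD it" idea from the original question can be sketched from Java roughly as below. This is only an illustration under assumptions: `QuerySink` is a hypothetical stand-in with the same shape as `PigServer.registerQuery(String)`, and the file names, aliases and schemas are made up. Whether the STORE runs immediately or only when the batch is executed is assumed to depend on the Pig version and on batch mode.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class StoreThenReload {

    /** Hypothetical stand-in for the single PigServer method this sketch needs. */
    @FunctionalInterface
    public interface QuerySink {
        void registerQuery(String query) throws IOException;
    }

    /** Phase 1: compute the JOIN once and write it to a file (made-up schemas). */
    static List<String> joinAndStore(String left, String right, String joined) {
        List<String> q = new ArrayList<>();
        q.add("A = LOAD '" + left + "' AS (id:chararray, a:int);");
        q.add("B = LOAD '" + right + "' AS (id:chararray, b:int);");
        q.add("J = JOIN A BY id, B BY id;");
        q.add("STORE J INTO '" + joined + "';");
        return q;
    }

    /** Phase 2: reload the stored result; later statements would start from J2. */
    static List<String> reload(String joined) {
        List<String> q = new ArrayList<>();
        q.add("J2 = LOAD '" + joined
                + "' AS (a_id:chararray, a:int, b_id:chararray, b:int);");
        return q;
    }

    /** Sends both phases, in order, to whatever accepts the statements. */
    public static void run(QuerySink pig, String left, String right, String joined)
            throws IOException {
        for (String s : joinAndStore(left, right, joined)) {
            pig.registerQuery(s);
        }
        for (String s : reload(joined)) {
            pig.registerQuery(s);
        }
    }

    public static void main(String[] args) throws IOException {
        // Prints the statements instead of running them; no Pig install needed.
        run(System.out::println, "input_a", "input_b", "tmp/joined");
    }
}
```

With a real PigServer instance named pigServer, the call would presumably be `StoreThenReload.run(pigServer::registerQuery, ...)`, with the remaining statements registered against J2 instead of the original JOIN alias.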
Ashutosh Chauhan wrote:
> Hi Vincent,
> Pig has a multi-query optimization which, when it fires, automatically figures
> out that the join needs to be done only once, so no work is
> repeated. If Pig determines that it is not safe to apply that
> optimization, then your join may be getting computed more than
> once. If that's the case, it is better to do the join and store it.
> You can do that within the same script using "exec".
> You can read more about multi-query optimization here:
> Hope it helps,
> On Wed, Oct 7, 2009 at 10:54, Vincent BARAT <[EMAIL PROTECTED]> wrote:
>> I'm new to PIG, and I have a bunch of statements that process the same
>> input, which is actually the result of a JOIN between two very big data sets
>> (millions of entries).
>> I wonder if it is better (faster) to save the result of this JOIN into a
>> Hadoop file and then to LOAD it, instead of just relying on PIG
>> optimizations ?
>> Thanks a lot for your help.