Vincent BARAT 2009-10-07, 14:54
Ashutosh Chauhan 2009-10-07, 16:33
zaki rahaman 2009-10-07, 20:08
Thejas Nair 2009-10-07, 20:16
Vincent BARAT 2009-10-08, 13:33
Alan Gates 2009-10-12, 18:50
Re: storing intermediate results?
Vincent BARAT 2009-10-08, 09:43
Hello,
Thanks for your answer. Actually, I use PIG by running it from Java (using a series of registerQuery() calls). As far as I know, the exec command you mention cannot be used in that context.

Ashutosh Chauhan wrote:
> Hi Vincent,
>
> Pig has a multi-query optimization which, if it fires, will automatically figure
> out that the join needs to be done only once, and there will not be any
> repetition of work. If Pig determines that it's not safe to do that
> optimization, then it's possible that your join is getting computed more than
> once. If that's the case, then it will be better to do the join and store it.
> You can do that within the same script using "exec":
> http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html#exec
>
> You can read more about multi-query optimization here:
> http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html#Multi-Query+Execution
>
> Hope it helps,
> Ashutosh
>
> On Wed, Oct 7, 2009 at 10:54, Vincent BARAT <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> I'm new to PIG, and I have a bunch of statements that process the same
>> input, which is actually the result of a JOIN between two very big data sets
>> (millions of entries).
>>
>> I wonder if it is better (faster) to save the result of this JOIN into a
>> Hadoop file and then to LOAD it, instead of just relying on PIG
>> optimizations?
>>
>> Thanks a lot for your help.
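
For reference, the store-then-reload approach discussed in this thread could look roughly like the Pig Latin sketch below. It is only an illustration: the paths ('input/a', 'input/b', 'tmp/joined'), the aliases and the field names are all hypothetical and do not come from the original messages.

```
-- Hypothetical inputs; paths and schemas are assumptions for illustration only.
a = LOAD 'input/a' AS (id:chararray, x:int);
b = LOAD 'input/b' AS (id:chararray, y:int);

-- Compute the expensive join once and write it out.
j = JOIN a BY id, b BY id;
STORE j INTO 'tmp/joined';

-- Later statements read the stored result instead of recomputing the join.
-- Loaded without a schema here, so fields are referenced by position.
j2 = LOAD 'tmp/joined';
g  = GROUP j2 BY $0;
c  = FOREACH g GENERATE group, COUNT(j2);
STORE c INTO 'output/counts';
```

Whether this is faster than relying on the multi-query optimization depends on whether that optimization actually applies to the statements in question, as described in the quoted reply.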