Pig, mail # user - reuse same Tuple and ArrayList for every getNext call in LoadFunc?


Jim Donofrio 2012-09-17, 04:33
Re: reuse same Tuple and ArrayList for every getNext call in LoadFunc?
Dmitriy Ryaboy 2012-09-17, 04:44
I looked into this a while back -- trouble comes when something
downstream from the loader tries to collect inputs into a bag, and
doesn't do its own copies. One can easily argue that if someone wants
to do such collection, it should be their responsibility to ensure
they aren't just collecting the same object that keeps being
overwritten, but at this point, I think it's too late to convert
everyone who might be making the "each tuple is a new tuple"
assumption.

D

On Sun, Sep 16, 2012 at 9:33 PM, Jim Donofrio <[EMAIL PROTECTED]> wrote:
> Is it OK to reuse the same Tuple and List of inputs from the RecordReader across
> all getNext calls in a LoadFunc? I notice that PigStorage creates a new
> List, mProtoTuple, for every record, along with a new Tuple. Since PigMapBase
> just uses newTupleNoCopy to wrap the List, creating a new Tuple for every
> getNext seems unnecessary.
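
The aliasing problem described in the reply can be illustrated with a minimal sketch in plain Java. This is not Pig code: a `List<Object>` stands in for a Tuple, a list of lists stands in for a bag, and `ReusingLoader`, `collectWithoutCopy`, and `collectWithCopy` are hypothetical names made up for the illustration. It only assumes that a loader hands back the same mutable object on every `getNext` call.

```java
import java.util.ArrayList;
import java.util.List;

public class TupleReuseSketch {

    /** Hypothetical loader that overwrites one shared list on every call. */
    static class ReusingLoader {
        private final List<Object> shared = new ArrayList<>();
        private final String[] records;
        private int next = 0;

        ReusingLoader(String... records) {
            this.records = records;
        }

        /** Returns the same list instance each time, or null at end of input. */
        List<Object> getNext() {
            if (next >= records.length) {
                return null;
            }
            shared.clear();
            shared.add(records[next++]);
            return shared;
        }
    }

    /** Downstream consumer that assumes "each tuple is a new tuple". */
    static List<List<Object>> collectWithoutCopy(ReusingLoader loader) {
        List<List<Object>> bag = new ArrayList<>();
        for (List<Object> t = loader.getNext(); t != null; t = loader.getNext()) {
            bag.add(t); // stores a reference to the shared, reused list
        }
        return bag;
    }

    /** Downstream consumer that makes its own copy before collecting. */
    static List<List<Object>> collectWithCopy(ReusingLoader loader) {
        List<List<Object>> bag = new ArrayList<>();
        for (List<Object> t = loader.getNext(); t != null; t = loader.getNext()) {
            bag.add(new ArrayList<>(t)); // snapshot of the current contents
        }
        return bag;
    }

    public static void main(String[] args) {
        // Every entry aliases the same list, so all show the last record.
        System.out.println(collectWithoutCopy(new ReusingLoader("a", "b", "c")));
        // prints [[c], [c], [c]]

        // Copying preserves each record as it was when collected.
        System.out.println(collectWithCopy(new ReusingLoader("a", "b", "c")));
        // prints [[a], [b], [c]]
    }
}
```

The first consumer is the case the reply warns about: it is correct only as long as the loader allocates a fresh object per record, which is why changing the loader to reuse objects could silently break existing downstream code.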
Jim Donofrio 2012-09-17, 05:16
Dmitriy Ryaboy 2012-09-17, 05:30
Jim Donofrio 2012-09-17, 13:15