Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig >> mail # dev >> Re: Question regarding DefaultTuple(size) implementation


+
Jonathan Coveney 2012-05-27, 04:13
+
Ashutosh Chauhan 2012-05-27, 07:00
+
Jonathan Coveney 2012-05-27, 08:34
Copy link to this message
-
Re: Question regarding DefaultTuple(size) implementation
Thanks Ashutosh and Jon, it makes sense now. I wouldn't try optimizing that
at the moment, no serious concerns due to it with our scripts. Good to know
the reasons behind the implementation.

-Prashant

On Sun, May 27, 2012 at 1:34 AM, Jonathan Coveney <[EMAIL PROTECTED]>wrote:

> It's also worth noting that the speed gain is minimal. We're talking 4
> minutes per billion rows (per mapper) at most, and the speed gain only
> applies to size()-max(setIndex) i.e. if you have a tuple of size 30 and
> then set up to 15, then yeah, you wasted time nulling out 15 fields. It's
> pretty minimal, though if you want to tackle making the implementation more
> efficient and benchmark the difference, I'm all ears :)
>
> 2012/5/27 Ashutosh Chauhan <[EMAIL PROTECTED]>
>
> > I also had this question when I was playing with DefaultTuple once. I
> > removed that null-addition loop, but then it failed while setting an
> object
> > at an specific index. There is a code in Pig which sets value at a
> specific
> > index in the tuple, which will throw NPE if you dont first add null in
> the
> > list. This is one of those oddities of java.
> >
> > List<String> list = new ArrayList<String>(3);
> > list.set(1,"mystring");
> >
> > throws NPE, while following doesn't.
> >
> > List<String> list = new ArrayList<String>(3);
> > for (int i =0; i < 3; i++) {
> > list.add(null);
> > }
> > list.set(1,"mystring");
> >
> > Since client code using DefaultTuple is permissible to set value at any
> > index of tuple, we have no choice but to null-fill the list first.
> >
> > Hope it helps,
> > Ashutosh
> >
> > On Sat, May 26, 2012 at 9:13 PM, Jonathan Coveney <[EMAIL PROTECTED]
> > >wrote:
> >
> > > -user
> > > +dev
> > >
> > > Haha, I made this very same comment somewhere, and noticed the exact
> same
> > > thing (I think I mention it in my SchemaTuple benchmarking).
> > >
> > > The reason is so that tuple.size() will return the right value. Also,
> the
> > > expectation is that if you append, it goes to the end of all of the
> > nulls,
> > > not the first position. It's a little confusing, and yeah, it surprised
> > me
> > > too.
> > >
> > > You could definitely amortize the cost of creation over the sets that
> the
> > > user does by keeping an index, but when I first saw it I decided that
> the
> > > (slightly) increased memory footprint and the increase in code
> complexity
> > > wasn't worth a very minimal increase.
> > >
> > > 2012/5/26 Prashant Kommireddi <[EMAIL PROTECTED]>
> > >
> > > > I rambled across this while reviewing one of Jon's patches. Here is
> the
> > > > code from DefaultTuple
> > > >
> > > > /**
> > > >     * Construct a tuple with a known number of fields. Package level
> so
> > > > that callers cannot directly invoke it.
> > > >     * <br>Resulting tuple is filled pre-filled with null elements.
> Time
> > > > complexity: O(N), after allocation
> > > >     *
> > > >     * @param size
> > > >     *            Number of fields to allocate in the tuple.
> > > >     */
> > > >    DefaultTuple(int size) {
> > > >        mFields = new ArrayList<Object>(size);
> > > >        for (int i = 0; i < size; i++)
> > > >            mFields.add(null);
> > > >    }
> > > >
> > > >
> > > > Why are we walking through the list to add nulls? Wouldn't the
> initial
> > > > creation of ArrayList suffice?
> > > > mFields = new ArrayList<Object>(size) should be enough.
> > > >
> > > > Thanks,
> > > > Prashant
> > > >
> > >
> >
>