Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig, mail # dev - Re: Question regarding DefaultTuple(size) implementation


Copy link to this message
-
Re: Question regarding DefaultTuple(size) implementation
Ashutosh Chauhan 2012-05-27, 07:00
I also had this question when I was playing with DefaultTuple once. I
removed that null-addition loop, but then it failed while setting an object
at an specific index. There is a code in Pig which sets value at a specific
index in the tuple, which will throw NPE if you dont first add null in the
list. This is one of those oddities of java.

List<String> list = new ArrayList<String>(3);
list.set(1,"mystring");

throws NPE, while following doesn't.

List<String> list = new ArrayList<String>(3);
for (int i =0; i < 3; i++) {
list.add(null);
}
list.set(1,"mystring");

Since client code using DefaultTuple is permissible to set value at any
index of tuple, we have no choice but to null-fill the list first.

Hope it helps,
Ashutosh

On Sat, May 26, 2012 at 9:13 PM, Jonathan Coveney <[EMAIL PROTECTED]>wrote:

> -user
> +dev
>
> Haha, I made this very same comment somewhere, and noticed the exact same
> thing (I think I mention it in my SchemaTuple benchmarking).
>
> The reason is so that tuple.size() will return the right value. Also, the
> expectation is that if you append, it goes to the end of all of the nulls,
> not the first position. It's a little confusing, and yeah, it surprised me
> too.
>
> You could definitely amortize the cost of creation over the sets that the
> user does by keeping an index, but when I first saw it I decided that the
> (slightly) increased memory footprint and the increase in code complexity
> wasn't worth a very minimal increase.
>
> 2012/5/26 Prashant Kommireddi <[EMAIL PROTECTED]>
>
> > I rambled across this while reviewing one of Jon's patches. Here is the
> > code from DefaultTuple
> >
> > /**
> >     * Construct a tuple with a known number of fields. Package level so
> > that callers cannot directly invoke it.
> >     * <br>Resulting tuple is filled pre-filled with null elements. Time
> > complexity: O(N), after allocation
> >     *
> >     * @param size
> >     *            Number of fields to allocate in the tuple.
> >     */
> >    DefaultTuple(int size) {
> >        mFields = new ArrayList<Object>(size);
> >        for (int i = 0; i < size; i++)
> >            mFields.add(null);
> >    }
> >
> >
> > Why are we walking through the list to add nulls? Wouldn't the initial
> > creation of ArrayList suffice?
> > mFields = new ArrayList<Object>(size) should be enough.
> >
> > Thanks,
> > Prashant
> >
>