Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig >> mail # dev >> Re: Question regarding DefaultTuple(size) implementation


+
Jonathan Coveney 2012-05-27, 04:13
Copy link to this message
-
Re: Question regarding DefaultTuple(size) implementation
I also had this question when I was playing with DefaultTuple once. I
removed that null-addition loop, but then it failed while setting an object
at an specific index. There is a code in Pig which sets value at a specific
index in the tuple, which will throw NPE if you dont first add null in the
list. This is one of those oddities of java.

List<String> list = new ArrayList<String>(3);
list.set(1,"mystring");

throws NPE, while following doesn't.

List<String> list = new ArrayList<String>(3);
for (int i =0; i < 3; i++) {
list.add(null);
}
list.set(1,"mystring");

Since client code using DefaultTuple is permissible to set value at any
index of tuple, we have no choice but to null-fill the list first.

Hope it helps,
Ashutosh

On Sat, May 26, 2012 at 9:13 PM, Jonathan Coveney <[EMAIL PROTECTED]>wrote:

> -user
> +dev
>
> Haha, I made this very same comment somewhere, and noticed the exact same
> thing (I think I mention it in my SchemaTuple benchmarking).
>
> The reason is so that tuple.size() will return the right value. Also, the
> expectation is that if you append, it goes to the end of all of the nulls,
> not the first position. It's a little confusing, and yeah, it surprised me
> too.
>
> You could definitely amortize the cost of creation over the sets that the
> user does by keeping an index, but when I first saw it I decided that the
> (slightly) increased memory footprint and the increase in code complexity
> wasn't worth a very minimal increase.
>
> 2012/5/26 Prashant Kommireddi <[EMAIL PROTECTED]>
>
> > I rambled across this while reviewing one of Jon's patches. Here is the
> > code from DefaultTuple
> >
> > /**
> >     * Construct a tuple with a known number of fields. Package level so
> > that callers cannot directly invoke it.
> >     * <br>Resulting tuple is filled pre-filled with null elements. Time
> > complexity: O(N), after allocation
> >     *
> >     * @param size
> >     *            Number of fields to allocate in the tuple.
> >     */
> >    DefaultTuple(int size) {
> >        mFields = new ArrayList<Object>(size);
> >        for (int i = 0; i < size; i++)
> >            mFields.add(null);
> >    }
> >
> >
> > Why are we walking through the list to add nulls? Wouldn't the initial
> > creation of ArrayList suffice?
> > mFields = new ArrayList<Object>(size) should be enough.
> >
> > Thanks,
> > Prashant
> >
>
+
Jonathan Coveney 2012-05-27, 08:34
+
Prashant Kommireddi 2012-05-27, 09:55
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB