Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig >> mail # dev >> Re: Question regarding DefaultTuple(size) implementation


+
Jonathan Coveney 2012-05-27, 04:13
+
Ashutosh Chauhan 2012-05-27, 07:00
+
Jonathan Coveney 2012-05-27, 08:34
Copy link to this message
-
Re: Question regarding DefaultTuple(size) implementation
Thanks Ashutosh and Jon, it makes sense now. I wouldn't try optimizing that
at the moment, no serious concerns due to it with our scripts. Good to know
the reasons behind the implementation.

-Prashant

On Sun, May 27, 2012 at 1:34 AM, Jonathan Coveney <[EMAIL PROTECTED]>wrote:

> It's also worth noting that the speed gain is minimal. We're talking 4
> minutes per billion rows (per mapper) at most, and the speed gain only
> applies to size()-max(setIndex) i.e. if you have a tuple of size 30 and
> then set up to 15, then yeah, you wasted time nulling out 15 fields. It's
> pretty minimal, though if you want to tackle making the implementation more
> efficient and benchmark the difference, I'm all ears :)
>
> 2012/5/27 Ashutosh Chauhan <[EMAIL PROTECTED]>
>
> > I also had this question when I was playing with DefaultTuple once. I
> > removed that null-addition loop, but then it failed while setting an
> object
> > at an specific index. There is a code in Pig which sets value at a
> specific
> > index in the tuple, which will throw NPE if you dont first add null in
> the
> > list. This is one of those oddities of java.
> >
> > List<String> list = new ArrayList<String>(3);
> > list.set(1,"mystring");
> >
> > throws NPE, while following doesn't.
> >
> > List<String> list = new ArrayList<String>(3);
> > for (int i =0; i < 3; i++) {
> > list.add(null);
> > }
> > list.set(1,"mystring");
> >
> > Since client code using DefaultTuple is permissible to set value at any
> > index of tuple, we have no choice but to null-fill the list first.
> >
> > Hope it helps,
> > Ashutosh
> >
> > On Sat, May 26, 2012 at 9:13 PM, Jonathan Coveney <[EMAIL PROTECTED]
> > >wrote:
> >
> > > -user
> > > +dev
> > >
> > > Haha, I made this very same comment somewhere, and noticed the exact
> same
> > > thing (I think I mention it in my SchemaTuple benchmarking).
> > >
> > > The reason is so that tuple.size() will return the right value. Also,
> the
> > > expectation is that if you append, it goes to the end of all of the
> > nulls,
> > > not the first position. It's a little confusing, and yeah, it surprised
> > me
> > > too.
> > >
> > > You could definitely amortize the cost of creation over the sets that
> the
> > > user does by keeping an index, but when I first saw it I decided that
> the
> > > (slightly) increased memory footprint and the increase in code
> complexity
> > > wasn't worth a very minimal increase.
> > >
> > > 2012/5/26 Prashant Kommireddi <[EMAIL PROTECTED]>
> > >
> > > > I rambled across this while reviewing one of Jon's patches. Here is
> the
> > > > code from DefaultTuple
> > > >
> > > > /**
> > > >     * Construct a tuple with a known number of fields. Package level
> so
> > > > that callers cannot directly invoke it.
> > > >     * <br>Resulting tuple is filled pre-filled with null elements.
> Time
> > > > complexity: O(N), after allocation
> > > >     *
> > > >     * @param size
> > > >     *            Number of fields to allocate in the tuple.
> > > >     */
> > > >    DefaultTuple(int size) {
> > > >        mFields = new ArrayList<Object>(size);
> > > >        for (int i = 0; i < size; i++)
> > > >            mFields.add(null);
> > > >    }
> > > >
> > > >
> > > > Why are we walking through the list to add nulls? Wouldn't the
> initial
> > > > creation of ArrayList suffice?
> > > > mFields = new ArrayList<Object>(size) should be enough.
> > > >
> > > > Thanks,
> > > > Prashant
> > > >
> > >
> >
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB