Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig, mail # dev - Review Request 13781: Changes to add support for streaming_python udfs.


Copy link to this message
-
Re: Review Request 13781: Changes to add support for streaming_python udfs.
Jeremy Karn 2013-09-05, 20:27


> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/builtin/StreamUDFToPig.java, line 74
> > <https://reviews.apache.org/r/13781/diff/1/?file=344447#file344447line74>
> >
> >     decoding string is actually quite costly as java strings are utf-16 and creating a string always copies the content of the byte array.
> >     Can't you just use the binary format of the type?

I'm not sure how to use the binary format without significantly rewriting the serialization logic on the python side to serialize the data in a byte format (using something like struct.pack).   Is that what you were thinking of, or were you thinking of something else?

The last profiling we did showed that the bottleneck is typically on the python side deserializing the input from Pig.  I think any performance enhancements at this point should be done in a new jira with solid profiling over a number of test cases.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/builtin/StreamUDFToPig.java, line 98
> > <https://reviews.apache.org/r/13781/diff/1/?file=344447#file344447line98>
> >
> >     would StreamTokenizer help ?

I took a look, and I don't think its worth re-writing this logic using StreamTokenizer.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/builtin/StreamUDFToPig.java, line 109
> > <https://reviews.apache.org/r/13781/diff/1/?file=344447#file344447line109>
> >
> >     if you know the size in advance you can pass it here and avoid resizing

Done.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/builtin/StreamUDFToPig.java, line 123
> > <https://reviews.apache.org/r/13781/diff/1/?file=344447#file344447line123>
> >
> >     you probably want to do getInstance() once

Done.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/builtin/StreamingDelimiters.java, line 74
> > <https://reviews.apache.org/r/13781/diff/1/?file=344448#file344448line74>
> >
> >     why do we need a hashmap to access those statically defined values?

Done.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/builtin/StreamingUDF.java, line 51
> > <https://reviews.apache.org/r/13781/diff/1/?file=344449#file344449line51>
> >
> >     enum?

An enum seems a little more verbose without much benefit in this case.  I don't feel strongly about it though.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/builtin/StreamingUDF.java, line 382
> > <https://reviews.apache.org/r/13781/diff/1/?file=344449#file344449line382>
> >
> >     if (info) log.info

Fixed.  This log statement was removed entirely.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/builtin/StreamingUDFException.java, line 23
> > <https://reviews.apache.org/r/13781/diff/1/?file=344450#file344450line23>
> >
> >     if you don't make the parent see the cause printStackTrace won't display the cause's stackTrace

Done.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/impl/util/StorageUtil.java, line 176
> > <https://reviews.apache.org/r/13781/diff/1/?file=344460#file344460line176>
> >
> >     why is that?

No good reason, so I added BigDecimal and BigInteger support.
- Jeremy
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13781/#review25913
-----------------------------------------------------------
On Sept. 5, 2013, 8:23 p.m., Jeremy Karn wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13781/
> -----------------------------------------------------------
>
> (Updated Sept. 5, 2013, 8:23 p.m.)
>
>
> Review request for pig.
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Changes for PIG-2417 (https://issues.apache.org/jira/browse/PIG-2417)
>
>
> Diffs
> -----