Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig, mail # dev - Review Request 13781: Changes to add support for streaming_python udfs.


+
Jeremy Karn 2013-08-23, 20:19
+
Julien Le Dem 2013-09-04, 23:36
+
Jeremy Karn 2013-09-05, 20:27
+
Jeremy Karn 2013-09-05, 20:23
+
Daniel Dai 2013-09-07, 00:04
Copy link to this message
-
Re: Review Request 13781: Changes to add support for streaming_python udfs.
Jeremy Karn 2013-09-07, 15:38


> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote:
> > src/org/apache/pig/builtin/StreamUDFToPig.java, line 47
> > <https://reviews.apache.org/r/13781/diff/2/?file=348274#file348274line47>
> >
> >     This might be affected by PIG-3255

Looks like it would be easy to make this patch work with PIG-3255.  If we could get that merged shortly I'd update this patch with the necessary changes.
> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote:
> > src/org/apache/pig/builtin/StreamUDFToPig.java, line 109
> > <https://reviews.apache.org/r/13781/diff/2/?file=348274#file348274line109>
> >
> >     How do we handle delimit escaping? Seems we don't handle it right now?

We don't handle it now.  We use a 3 character delimiter which avoids collisions in practice.  I would actually like to switch the serialization to a 'real' serialization library but we did a little bit of prototyping a few months ago and there wasn't anything out there that worked really well and had significant performance improvements.  Avro was the most promising but on the Python side it was missing a number of features we needed.
> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote:
> > src/org/apache/pig/builtin/StreamingUDF.java, line 169
> > <https://reviews.apache.org/r/13781/diff/2/?file=348276#file348276line169>
> >
> >     Are those files only for illustrate? Why not just use a random tmp file? Why do we distinguish exectype in illustrate? It only run locally.

I renamed the class to remove Illustrate from the name.  This is actually used both for illustrate (where we capture the output only from the last run) and in real runs (where we write the output to the user logs).  By putting it in the user logs you can get the logs from the task tracker web interface.  We could use temp files but I like the more human readable names.
> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote:
> > src/org/apache/pig/builtin/StreamingUDF.java, line 198
> > <https://reviews.apache.org/r/13781/diff/2/?file=348276#file348276line198>
> >
> >     Any reason not to create a StreamingUDFInputHandler?

I didn't really think it was necessary but I guess its more consistent.
> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote:
> > src/org/apache/pig/builtin/StreamingUDF.java, line 366
> > <https://reviews.apache.org/r/13781/diff/2/?file=348276#file348276line366>
> >
> >     Seems flush() is more proper to put inside inputHandler

I'm not sure if it would be unnecessary for some clients of InputHandler.  For the StreamingUDF we need to make sure we always flush or else we can get blocked on the reading process not getting the full input (and then never writing the output) but I'm not sure if its always necessary.
> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote:
> > src/org/apache/pig/impl/streaming/InputHandler.java, line 66
> > <https://reviews.apache.org/r/13781/diff/2/?file=348282#file348282line66>
> >
> >     Why we need this change?

Probably just left over from debugging where I wanted to know the value of the byte[].
> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote:
> > test/unit-tests, line 73
> > <https://reviews.apache.org/r/13781/diff/2/?file=348302#file348302line73>
> >
> >     This is not true unit test, should remove from the list

Can it go in commit-tests?
- Jeremy
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13781/#review25964
-----------------------------------------------------------
On Sept. 5, 2013, 8:23 p.m., Jeremy Karn wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13781/
> -----------------------------------------------------------
>
> (Updated Sept. 5, 2013, 8:23 p.m.)
>
>
> Review request for pig.
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Changes for PIG-2417 (https://issues.apache.org/jira/browse/PIG-2417)
>
>
+
Daniel Dai 2013-09-16, 04:18
+
Jeremy Karn 2013-09-07, 15:39