Yes, I don't like the extra copies either. That's why I didn't mark the Jira
'patch available'. A static helper method would also be useful.
But I don't see how it breaks existing StoreFuncs or output
formats. Is there an example? There are very few StoreFuncs that extend
On Fri, Jul 22, 2011 at 1:37 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
> At this point I'm -1 on this. I don't want to break existing output
> formats or store functions. And I don't see that much value here. You can
> accomplish the same thing by putting the logic in a static method of
> PigTextOutputFormat and letting other users use it. Also, the cost of an
> extra copy of the output is bad. We don't want to slow down storing data.
> On Jul 22, 2011, at 12:24 PM, Raghu Angadi wrote:
> > attached a patch to https://issues.apache.org/jira/browse/PIG-2187
> > Only drawback is extra copies required to make a Text().
> > On Thu, Jul 21, 2011 at 1:21 PM, Daniel Dai <[EMAIL PROTECTED]> wrote:
> >> I agree the tuple -> text conversion had better be in StoreFunc. Users
> >> would then have a better chance of reusing an OutputFormat.
> >> For backward compatibility, the signature of StoreFunc.getOutputFormat
> >> returns a generic OutputFormat object, so this is fine. However, existing
> >> StoreFuncs that use PigOutputFormat would need to change.
> > You mean existing classes that override PigStorage.getOutputFormat() and
> > PigStorage.putNext()?
> > Yes, they would be affected, but fixing them is very simple: they just
> > need to extend putNext().
> > As such, there is no contract regarding getOutputFormat() for us to break :)
> > Raghu.
> >> I don't know how much impact
> >> that will have, but we need to be careful. We need to make a clear
> >> announcement and document it as an incompatible change if we do so.
> >> Daniel
> >> On Thu, Jul 21, 2011 at 11:12 AM, Raghu Angadi <[EMAIL PROTECTED]> wrote:
> >>> The expectation from PigStorage.getInputFormat() is that it is an
> >>> InputFormat<Writable, Text>, and PigStorage handles converting Text to
> >>> Tuple.
> >>> This is very useful and makes it easy for users to plug in some other
> >>> input format.
> >>> But the same is not true for PigStorage().getOutputFormat(). Here it
> >>> expects OutputFormat<Writable, Tuple>, so the output format needs to
> >>> convert Tuple to Text().
> >>> Not sure if this is intentional or not. I can submit a patch to move
> >>> Tuple handling into PigStorage. Then PigTextOutputFormat would be as
> >>> thin as PigTextInputFormat.
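
For illustration only, here is a minimal sketch of the static-helper idea raised in this thread: the tuple -> text conversion lives in one reusable static method, so either a store function or an output format could call it. It deliberately avoids the Pig and Hadoop classes so it runs on its own. The class name TupleTextSketch and the method toDelimitedText are hypothetical, a List<?> stands in for a Pig Tuple, a String stands in for a Hadoop Text, and writing a null field as an empty string is an assumption, not something stated in the thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a shared helper; not part of Pig's API.
class TupleTextSketch {

    // Joins the fields of a tuple-like list into one delimited line.
    // Assumption: a null field is written as an empty string.
    public static String toDelimitedText(List<?> fields, char delimiter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(delimiter); // delimiter goes between fields only
            }
            Object field = fields.get(i);
            if (field != null) {
                sb.append(field); // uses the field's toString()
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A store function's putNext() could call the helper and hand the
        // resulting text to any text-based output format.
        List<Object> tuple = Arrays.asList("alice", 42, null, 3.5);
        System.out.println(toDelimitedText(tuple, '\t'));
    }
}
```

With a helper like this, whichever side owns the conversion becomes a one-line call, which is the reuse both proposals in the thread are after. It does not address the extra-copy concern; that depends on how the text object is built in the real code.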