Pig user mailing list

Re: PigStorage's handling of InputFormat and OutputFormat
On Fri, Jul 22, 2011 at 1:29 PM, Daniel Dai <[EMAIL PROTECTED]> wrote:

> I mean StoreFuncs that delegate their OutputFormat to PigOutputFormat.
> Though PigOutputFormat is not in package org.apache.pig, it is the
> OutputFormat of PigStorage,
There is no reference to PigOutputFormat in PigStorage. Did you mean
PigTextOutputFormat?

Raghu.
> which many users will use as a reference implementation for a
> StoreFunc.
>
> Daniel
>
> On Fri, Jul 22, 2011 at 12:24 PM, Raghu Angadi <[EMAIL PROTECTED]> wrote:
>
> > I attached a patch to https://issues.apache.org/jira/browse/PIG-2187
> >
> > The only drawback is the extra copying required to build a Text object.
> >
> >
> >
> > On Thu, Jul 21, 2011 at 1:21 PM, Daniel Dai <[EMAIL PROTECTED]> wrote:
> >
> > > I agree the Tuple -> Text conversion should be in the StoreFunc. Users
> > > would then have a better chance to reuse the OutputFormat.
> > >
> > > For backward compatibility, StoreFunc.getOutputFormat() keeps its
> > > signature, which returns a generic OutputFormat object, so that part is
> > > fine. However, existing StoreFuncs that use PigOutputFormat will need to
> > > change.
> >
> >
> > You mean existing classes that override PigStorage.getOutputFormat() but
> > not PigStorage.putNext()?
> > Yes, they would be affected, but fixing them is very simple: they just
> > need to override putNext().
> > As such, there is no contract regarding getOutputFormat() for us to break :)
> >
> > Raghu.
> >
> > > I don't know how much impact that will have, but we need to be careful.
> > > We need to make a clear announcement and document it as an incompatible
> > > change if we do so.
> > >
> > > Daniel
> > >
> > > On Thu, Jul 21, 2011 at 11:12 AM, Raghu Angadi <[EMAIL PROTECTED]> wrote:
> > >
> > > > The expectation from PigStorage.getInputFormat() is that it returns an
> > > > InputFormat<Writable, Text>, and PigStorage handles converting Text to
> > > > Tuple. This makes it very easy for users to plug in some other input
> > > > format.
> > > >
> > > > But the same is not true for PigStorage.getOutputFormat(). There it
> > > > expects an OutputFormat<Writable, Tuple>, so the output format needs to
> > > > convert Tuple to Text.
> > > >
> > > > I'm not sure whether this is intentional. I can submit a patch to move
> > > > Tuple handling into PigStorage. Then PigTextOutputFormat would be as
> > > > thin as PigTextInputFormat.
> > > >
> > >
> >
>
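To make the input-side contract described in the thread concrete, here is a minimal, self-contained Java sketch. It deliberately does not use the real Pig or Hadoop APIs: `LineSource`, `DelimitedLoader` and this `getNext()` are hypothetical stand-ins, with a `String` playing the role of Hadoop's `Text` and a `List<String>` playing the role of a Pig `Tuple`. It only illustrates the division of labor: the source yields raw lines, and the loader alone owns the delimiter and the line-to-tuple conversion, so swapping in a different line source requires no parsing changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of the input-side split of work; none of these types are
// Pig or Hadoop classes. A String stands in for Text, List<String> for Tuple.
public class LoaderSketch {

    /** Stand-in for a thin InputFormat/RecordReader: yields raw lines only. */
    interface LineSource {
        /** Returns the next raw line, or null when the input is exhausted. */
        String nextLine();
    }

    /** Stand-in for PigStorage: owns the delimiter and the line -> tuple step. */
    static final class DelimitedLoader {
        private final Pattern delimiter;
        private final LineSource source;

        DelimitedLoader(String delimiter, LineSource source) {
            // quote() so a delimiter such as "|" is matched literally, not as a regex
            this.delimiter = Pattern.compile(Pattern.quote(delimiter));
            this.source = source;
        }

        /** Loosely analogous to a loader's getNext(): next tuple, or null at end. */
        List<String> getNext() {
            String line = source.nextLine();
            if (line == null) {
                return null;
            }
            // limit -1 keeps trailing empty fields: "b\t2\t" -> [b, 2, ""]
            return new ArrayList<>(Arrays.asList(delimiter.split(line, -1)));
        }
    }

    public static void main(String[] args) {
        // Any LineSource works here; the parsing code does not change.
        Iterator<String> lines = List.of("a\t1", "b\t2\t").iterator();
        DelimitedLoader loader =
                new DelimitedLoader("\t", () -> lines.hasNext() ? lines.next() : null);
        for (List<String> t = loader.getNext(); t != null; t = loader.getNext()) {
            System.out.println(t);
        }
    }
}
```

Keeping the conversion in the loader means a user-supplied source only has to produce lines, which is the property the thread values on the input side.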
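A similarly hedged sketch of the output side as proposed in the thread, where the StoreFunc's putNext() does the Tuple -> Text conversion and the output writer stays thin. The thread does not show the actual PIG-2187 patch, so this is not that patch; `LineSink`, `DelimitedStorer`, `UpperCaseStorer` and this `putNext(List<?>)` signature are hypothetical, with a `List<?>` standing in for a Tuple and a `byte[]` line standing in for a Text. The `toByteArray()` call copies the serialization buffer, illustrating one plausible source of the extra copying mentioned above. `UpperCaseStorer` shows how a subclass that used to customize output through its OutputFormat would instead override `putNext()`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the proposed output side; none of these types are Pig or
// Hadoop classes. A List<?> stands in for Tuple, a byte[] line for Text.
public class StorerSketch {

    /** Stand-in for a thin OutputFormat's RecordWriter: it only writes lines. */
    interface LineSink {
        void write(byte[] line);
    }

    /** Stand-in for PigStorage after the change: putNext() does Tuple -> line. */
    static class DelimitedStorer {
        private final byte[] delimiter;
        private final LineSink sink;
        // Reused serialization buffer for one tuple at a time.
        private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

        DelimitedStorer(String delimiter, LineSink sink) {
            this.delimiter = delimiter.getBytes(StandardCharsets.UTF_8);
            this.sink = sink;
        }

        void putNext(List<?> tuple) {
            buf.reset();
            for (int i = 0; i < tuple.size(); i++) {
                if (i > 0) {
                    buf.write(delimiter, 0, delimiter.length);
                }
                Object field = tuple.get(i);
                if (field != null) { // this sketch writes nulls as empty fields
                    byte[] b = field.toString().getBytes(StandardCharsets.UTF_8);
                    buf.write(b, 0, b.length);
                }
            }
            // toByteArray() copies the buffer: an extra copy of the serialized
            // tuple, similar in spirit to the cost of building a Text.
            sink.write(buf.toByteArray());
        }
    }

    /** Hypothetical subclass: customizes output by overriding putNext(). */
    static class UpperCaseStorer extends DelimitedStorer {
        UpperCaseStorer(String delimiter, LineSink sink) {
            super(delimiter, sink);
        }

        @Override
        void putNext(List<?> tuple) {
            List<String> upper = new ArrayList<>();
            for (Object f : tuple) {
                upper.add(f == null ? null : f.toString().toUpperCase(Locale.ROOT));
            }
            super.putNext(upper);
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        LineSink sink = line -> out.add(new String(line, StandardCharsets.UTF_8));
        DelimitedStorer storer = new DelimitedStorer("\t", sink);
        storer.putNext(List.of("a", 1));
        storer.putNext(Arrays.asList("b", null, 2.5));
        new UpperCaseStorer("\t", sink).putNext(List.of("c", "d"));
        System.out.println(out);
    }
}
```

With this shape the sink is as thin as the line source on the input side; the trade-off is that the serialized bytes are copied once more before being handed off.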