Pig, mail # user - PigStorage's handling of InputFormat and OutputFormat


Re: PigStorage's handling of InputFormat and OutputFormat
Raghu Angadi 2011-07-22, 23:47
Thanks guys. Updated PIG-2187 with a new patch.

On Fri, Jul 22, 2011 at 3:44 PM, Daniel Dai <[EMAIL PROTECTED]> wrote:

> Yes, I am talking about PigTextOutputFormat.
>
> On Fri, Jul 22, 2011 at 2:51 PM, Raghu Angadi <[EMAIL PROTECTED]> wrote:
>
> > On Fri, Jul 22, 2011 at 1:29 PM, Daniel Dai <[EMAIL PROTECTED]>
> > wrote:
> >
> > > I mean a StoreFunc that delegates its output format to PigOutputFormat.
> >
> > > Though PigOutputFormat is not in package org.apache.pig, it is the
> > > OutputFormat of PigStorage,
> >
> > There is no reference to PigOutputFormat in PigStorage. Did you mean
> > PigTextOutputFormat?
> >
> > Raghu.
> >
> >
> > > which many users will use as a reference implementation for a
> > > StoreFunc.
> > >
> > > Daniel
> > >
> > > On Fri, Jul 22, 2011 at 12:24 PM, Raghu Angadi <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > Attached a patch to https://issues.apache.org/jira/browse/PIG-2187
> > > >
> > > > The only drawback is the extra copies required to make a Text().
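To illustrate where intermediate copies can come from when a tuple ends up as text, here is a minimal, self-contained Java sketch. It is not the PIG-2187 patch: the class and method names (TextCopySketch, toUtf8Line) are hypothetical, a tuple is modeled as a plain List<Object>, and the tab delimiter is just an example. It relies only on the documented fact that Hadoop's Text stores UTF-8 encoded bytes, so some encoding step from a rendered line to bytes is needed somewhere.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a "tuple" is a List<Object> stand-in, not Pig's Tuple.
class TextCopySketch {

    /** Renders the tuple as one delimited line and returns it as UTF-8 bytes. */
    static byte[] toUtf8Line(List<Object> tuple, char delimiter) {
        // Copy 1: each field is rendered into the StringBuilder's buffer.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tuple.size(); i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            sb.append(tuple.get(i));
        }
        // Copy 2: toString() copies the builder's contents into a new String.
        String line = sb.toString();
        // Copy 3: encoding produces a fresh UTF-8 byte array, the representation
        // a Text object holds (this sketch stops before creating any Text).
        return line.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf8Line(Arrays.asList("caf\u00e9", 3), '\t');
        System.out.println(bytes.length); // prints 7 ("\u00e9" takes two bytes)
    }
}
```

Each intermediate buffer is short-lived, but on a hot store path the allocations add up, which is the kind of cost the thread calls a drawback.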
> > > >
> > > >
> > > >
> > > > On Thu, Jul 21, 2011 at 1:21 PM, Daniel Dai <[EMAIL PROTECTED]>
> > > > wrote:
> > > >
> > > > > I agree the tuple -> text conversion is better placed in the
> > > > > StoreFunc. Users then have a better chance to reuse the OutputFormat.
> > > > >
> > > > > As for backward compatibility, StoreFunc.getOutputFormat returns a
> > > > > generic OutputFormat object, so its signature is fine. However,
> > > > > existing StoreFuncs that use PigOutputFormat need to change.
> > > >
> > > >
> > > > You mean existing classes that override PigStorage.getOutputFormat()
> > > > and not PigStorage.putNext()?
> > > > Yes, they would be affected, but fixing them is very simple: they just
> > > > need to extend putNext().
> > > > As such, there is no contract regarding getOutputFormat() for us to
> > > > break :)
> > > >
> > > > Raghu.
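The suggested fix for affected subclasses, extending putNext() instead of supplying their own output format, might look like the following minimal Java sketch. All names (ExtendPutNextSketch, BaseStorer, TaggedStorer) are hypothetical stand-ins rather than Pig classes; a tuple is modeled as a List<Object>, and a List<String> stands in for whatever the record writer would receive. The subclass adjusts each tuple and then reuses the base class's tuple -> text conversion.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, self-contained sketch (not Pig's real API).
class ExtendPutNextSketch {

    /** Hypothetical base store function: turns each tuple into a tab-delimited line. */
    static class BaseStorer {
        private final List<String> written = new ArrayList<>();

        void putNext(List<Object> tuple) throws IOException {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < tuple.size(); i++) {
                if (i > 0) {
                    sb.append('\t');
                }
                sb.append(tuple.get(i));
            }
            written.add(sb.toString()); // stands in for handing a line to the writer
        }

        List<String> written() {
            return written;
        }
    }

    /**
     * Instead of overriding getOutputFormat(), the subclass extends putNext():
     * it adjusts the tuple, then delegates to the base conversion.
     */
    static class TaggedStorer extends BaseStorer {
        private final String tag;

        TaggedStorer(String tag) {
            this.tag = tag;
        }

        @Override
        void putNext(List<Object> tuple) throws IOException {
            List<Object> tagged = new ArrayList<>(tuple.size() + 1);
            tagged.add(tag);       // prepend a constant field
            tagged.addAll(tuple);
            super.putNext(tagged); // reuse the base tuple -> text conversion
        }
    }

    public static void main(String[] args) throws IOException {
        TaggedStorer storer = new TaggedStorer("web");
        storer.putNext(Arrays.asList("alice", 42));
        System.out.println(storer.written());
    }
}
```

Delegating to super.putNext() keeps the formatting logic in one place, so the subclass only carries its own customization.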
> > > >
> > > > > I don't know how much impact that will have, but we need to be
> > > > > careful. We need to make a clear announcement and document it as an
> > > > > incompatible change if we do so.
> > > > >
> > > > > Daniel
> > > > >
> > > > > On Thu, Jul 21, 2011 at 11:12 AM, Raghu Angadi <[EMAIL PROTECTED]>
> > > > > wrote:
> > > > >
> > > > > > The expectation from PigStorage.getInputFormat() is that it is an
> > > > > > InputFormat<Writable, Text>, and PigStorage handles converting Text
> > > > > > to Tuple. This makes it very useful and easy for users to plug in
> > > > > > some other input format.
> > > > > >
> > > > > > But the same is not true for PigStorage.getOutputFormat(). Here it
> > > > > > expects an OutputFormat<Writable, Tuple>, so the output format needs
> > > > > > to convert Tuple to Text.
> > > > > >
> > > > > > I'm not sure whether this is intentional. I can submit a patch to
> > > > > > move Tuple handling into PigStorage. Then PigTextOutputFormat would
> > > > > > be as thin as PigTextInputFormat.
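The split proposed here can be sketched in a few lines of self-contained Java: the store function owns the tuple -> text conversion in putNext(), and the output-format side only writes lines, mirroring how the input side hands Text to PigStorage. This is a sketch under stated assumptions, not the PIG-2187 patch. Every name (ThinTextOutputSketch, LineWriter, ThinTextWriter, DelimitedStorer) is a hypothetical stand-in for Pig/Hadoop classes, a tuple is modeled as a List<Object>, and writing null fields as empty strings is an assumption of the sketch.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only: LineWriter plays the role of a
// RecordWriter<Writable, Text>, and a "tuple" is a List<Object>.
class ThinTextOutputSketch {

    /** Stand-in for the record writer of a thin text output format. */
    interface LineWriter {
        void write(String line) throws IOException;
    }

    /** Writes each line plus a newline; it knows nothing about tuples. */
    static final class ThinTextWriter implements LineWriter {
        private final StringWriter out;

        ThinTextWriter(StringWriter out) {
            this.out = out;
        }

        @Override
        public void write(String line) {
            out.write(line);
            out.write('\n');
        }
    }

    /** PigStorage-like store function that owns the tuple -> text conversion. */
    static class DelimitedStorer {
        private final char delimiter;
        private LineWriter writer;

        DelimitedStorer(char delimiter) {
            this.delimiter = delimiter;
        }

        void prepareToWrite(LineWriter writer) {
            this.writer = writer;
        }

        /** Converts the tuple to one delimited line, then hands plain text to the writer. */
        void putNext(List<Object> tuple) throws IOException {
            writer.write(toLine(tuple));
        }

        String toLine(List<Object> tuple) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < tuple.size(); i++) {
                if (i > 0) {
                    sb.append(delimiter);
                }
                Object field = tuple.get(i);
                // Sketch assumption: null fields are written as empty strings.
                if (field != null) {
                    sb.append(field);
                }
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        DelimitedStorer storer = new DelimitedStorer('\t');
        storer.prepareToWrite(new ThinTextWriter(sink));
        storer.putNext(Arrays.asList("alice", 42, null));
        storer.putNext(Arrays.asList("bob", 7, 3.5));
        System.out.print(sink);
    }
}
```

Because the writer only sees strings, any line-oriented output format could be swapped in without touching the tuple formatting, which is the reuse argument made later in the thread.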