Pig >> mail # user >> PigStorage's handling of InputFormat and OutputFormat


Re: PigStorage's handling of InputFormat and OutputFormat
Thanks guys. Updated PIG-2187 with a new patch.

On Fri, Jul 22, 2011 at 3:44 PM, Daniel Dai <[EMAIL PROTECTED]> wrote:

> Yes, I am talking about PigTextOutputFormat.
>
> On Fri, Jul 22, 2011 at 2:51 PM, Raghu Angadi <[EMAIL PROTECTED]> wrote:
>
> > On Fri, Jul 22, 2011 at 1:29 PM, Daniel Dai <[EMAIL PROTECTED]> wrote:
> >
> > > I mean StoreFuncs that delegate the OutputFormat to PigOutputFormat.
> >
> > > Though PigOutputFormat is not in package org.apache.pig, it is the
> > > OutputFormat of PigStorage,
> >
> >
> > There is no reference to PigOutputFormat in PigStorage. Did you mean
> > PigTextOutputFormat?
> >
> > Raghu.
> >
> >
> > > which many users will use as a reference implementation for a
> > > StoreFunc.
> > >
> > > Daniel
> > >
> > > On Fri, Jul 22, 2011 at 12:24 PM, Raghu Angadi <[EMAIL PROTECTED]> wrote:
> > >
> > > > Attached a patch to https://issues.apache.org/jira/browse/PIG-2187
> > > >
> > > > The only drawback is the extra copies required to make a Text().
> > > >
> > > >
> > > >
> > > > On Thu, Jul 21, 2011 at 1:21 PM, Daniel Dai <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > I agree the tuple -> text conversion had better be in the StoreFunc.
> > > > > Users may have a better chance to reuse the OutputFormat.
> > > > >
> > > > > For backward compatibility, the signature of StoreFunc.getOutputFormat
> > > > > is fine, since it returns a generic OutputFormat object. However,
> > > > > existing StoreFuncs that use PigOutputFormat need to change.
> > > >
> > > >
> > > > You mean existing classes that override PigStorage.getOutputFormat()
> > > > and not PigStorage.putNext()?
> > > > Yes, they would be affected, but fixing them is very simple: they just
> > > > need to extend putNext().
> > > > As such, there is no contract regarding getOutputFormat() for us to
> > > > break :)
> > > >
> > > > Raghu.
> > > >
> > > > > I don't know how much impact that will have, but we need to be
> > > > > careful. We need to make a clear announcement and document it as an
> > > > > incompatible change if we do so.
> > > > >
> > > > > Daniel
> > > > >
> > > > > On Thu, Jul 21, 2011 at 11:12 AM, Raghu Angadi <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > The expectation from PigStorage.getInputFormat() is that it is an
> > > > > > InputFormat<Writable, Text>, and PigStorage handles converting Text
> > > > > > to Tuple. This makes it very easy for users to plug in some other
> > > > > > input format.
> > > > > >
> > > > > > But the same is not true for PigStorage.getOutputFormat(). Here it
> > > > > > expects an OutputFormat<Writable, Tuple>, so the output format needs
> > > > > > to convert Tuple to Text.
> > > > > >
> > > > > > Not sure if this is intentional or not. I can submit a patch to move
> > > > > > Tuple handling into PigStorage. Then PigTextOutputFormat would be as
> > > > > > thin as PigTextInputFormat.
> > > > > >
> > > > >
> > > >
> > >
> >
>
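To make the proposal in this thread concrete, here is a minimal, stdlib-only Java sketch of the split being discussed: the StoreFunc side turns a tuple into a delimited line of text, and the output format only writes the bytes it is handed. This is not the PIG-2187 patch. All class names (`DelimitedStorageSketch`, `ThinTextOutput`, `LineWriter`, `UpperCaseStorageSketch`) are hypothetical stand-ins; `prepareToWrite` and `putNext` borrow the method names of Pig's `StoreFunc`, but their signatures here are simplified, tuples are modelled as `List<Object>`, and a Hadoop `RecordWriter` is modelled as a one-method interface. The subclass at the end illustrates the migration path Raghu describes: customizing output by extending `putNext()` instead of replacing the output format.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Hadoop RecordWriter: it only accepts encoded lines.
interface LineWriter {
    void write(byte[] line);
}

// Hypothetical "thin" text output: writes whatever bytes it receives and knows
// nothing about tuples (the role proposed for PigTextOutputFormat).
class ThinTextOutput implements LineWriter {
    private final List<String> lines = new ArrayList<>();

    @Override
    public void write(byte[] line) {
        // Decoded here only so the sketch can inspect what was written.
        lines.add(new String(line, StandardCharsets.UTF_8));
    }

    List<String> lines() {
        return lines;
    }
}

// Hypothetical StoreFunc-like class: the Tuple -> Text conversion lives here,
// mirroring how the load side already turns Text into Tuple.
class DelimitedStorageSketch {
    private final char delimiter;
    private LineWriter writer;

    DelimitedStorageSketch(char delimiter) {
        this.delimiter = delimiter;
    }

    void prepareToWrite(LineWriter writer) {
        this.writer = writer;
    }

    // Serializes one tuple (modelled as List<Object>) into a delimited line.
    public void putNext(List<Object> tuple) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tuple.size(); i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            Object field = tuple.get(i);
            if (field != null) { // a null field is written as an empty string
                sb.append(field);
            }
        }
        // String -> UTF-8 bytes is an extra copy, analogous to the cost of
        // building a Text() mentioned in the thread.
        writer.write(sb.toString().getBytes(StandardCharsets.UTF_8));
    }
}

// Hypothetical subclass: customizes output by extending putNext().
class UpperCaseStorageSketch extends DelimitedStorageSketch {
    UpperCaseStorageSketch(char delimiter) {
        super(delimiter);
    }

    @Override
    public void putNext(List<Object> tuple) {
        List<Object> upper = new ArrayList<>();
        for (Object field : tuple) {
            upper.add(field == null ? null : field.toString().toUpperCase());
        }
        super.putNext(upper);
    }
}

public class StoreFuncSplitSketch {
    public static void main(String[] args) {
        ThinTextOutput out = new ThinTextOutput();

        DelimitedStorageSketch plain = new DelimitedStorageSketch('\t');
        plain.prepareToWrite(out);
        plain.putNext(Arrays.<Object>asList("alice", 42, null));

        UpperCaseStorageSketch upper = new UpperCaseStorageSketch('\t');
        upper.prepareToWrite(out);
        upper.putNext(Arrays.<Object>asList("bob", 7));

        // Print tabs visibly so the delimiters can be seen.
        for (String line : out.lines()) {
            System.out.println(line.replace("\t", "\\t"));
        }
    }
}
```

Running `main` prints `alice\t42\t` and `BOB\t7`. Because the output side only ever sees bytes, any text-writing output format could be reused unchanged, which is the reuse benefit Daniel points out; the trade-off is one extra encoding copy per record.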