Pig, mail # user - PigStorage


pablomar 2012-11-16, 20:48
Dmitriy Ryaboy 2012-11-16, 22:15
Bill Graham 2012-11-19, 17:16
Re: PigStorage
pablomar 2012-11-19, 17:24
Sure. My initial (and dirty) idea changed only 2 lines. I completely agree
with you.
On Mon, Nov 19, 2012 at 12:16 PM, Bill Graham <[EMAIL PROTECTED]> wrote:

> +1 as well, but I'd suggest we do the following:
>
> - Keep mProtoTuple private and add protected getters/setters instead with
> javadocs describing expected usage.
> - Rename mProtoTuple and the getters/setters to something more descriptive
> than mProtoTuple.
>
>
> On Fri, Nov 16, 2012 at 2:15 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]>
> wrote:
>
> > That sounds reasonable, I've run into the same problem. Do you mind
> > submitting a patch?
> >
> > On Fri, Nov 16, 2012 at 12:48 PM, pablomar
> > <[EMAIL PROTECTED]> wrote:
> > > hi all,
> > >
> > > I'm using Pig 0.9.2 (Apache Pig version 0.9.2-cdh4.0.1, precisely)
> > > Today I hit a case where I needed to clean up some fields before
> > > processing, and I will need to do the same in all my scripts. So
> > > instead of doing it inside the scripts, I thought of extending
> > > PigStorage and doing it inside my own Loader. My scripts will be
> > > shorter and cleaner.
> > >
> > > In fact, the only method that I needed to override was:
> > > void *readField*(byte[] buf, int start, int end)
> > >
> > >
> > > Everything was OK and it is working. The problem was that I had to
> > > copy/paste a lot just because of private declarations,
> > > for example:
> > >   private byte fieldDel = '\t';
> > >   private ArrayList<Object> mProtoTuple = null;
> > >   private TupleFactory mTupleFactory = TupleFactory.getInstance();
> > >   private boolean mRequiredColumnsInitialized = false;
> > >
> > > and of course:
> > > *private* void readField(byte[] buf, int start, int end)
> > >
> > > so I had to copy/paste:
> > > public Tuple getNext() and all the aforementioned variables just to be
> > > able to write my own *readField*
> > >
> > >
> > > Would it be possible in future versions of Pig to make *readField*
> > > protected, as well as *mProtoTuple*? I think it could be useful in
> > > cases like mine.
> > > I'm asking because I don't know the reasoning behind the decision to
> > > make them private.
> > >
> > > thanks a lot,
> >
>
>
>
> --
> *Note that I'm no longer using my Yahoo! email address. Please email me at
> [EMAIL PROTECTED] going forward.*
>
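
For readers following the thread, here is a minimal sketch of the pattern being proposed: the tuple buffer stays private, subclasses reach it through protected accessors, and readField becomes a protected hook. This is not Pig's code. The class names DelimitedLoaderSketch and TrimmingLoaderSketch, the methods getPendingFields, addField and parseLine, and the use of plain String fields are all hypothetical stand-ins chosen so the example runs without Pig on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a PigStorage-like loader. NOT Pig's actual class.
class DelimitedLoaderSketch {
    private final byte fieldDelimiter;
    // Stays private, as suggested in the thread; subclasses use the accessors.
    private List<Object> pendingFields = new ArrayList<>();

    DelimitedLoaderSketch(char delimiter) {
        this.fieldDelimiter = (byte) delimiter;
    }

    /** Fields collected so far for the record being built (for subclasses). */
    protected List<Object> getPendingFields() {
        return pendingFields;
    }

    /** Appends one parsed field to the record under construction. */
    protected void addField(Object value) {
        pendingFields.add(value);
    }

    /** Hook: turns buf[start, end) into one field. Subclasses may override. */
    protected void readField(byte[] buf, int start, int end) {
        if (start == end) {
            addField(null); // empty field becomes null
        } else {
            addField(new String(buf, start, end - start, StandardCharsets.UTF_8));
        }
    }

    /** Splits one line on the delimiter, calling readField once per field. */
    public List<Object> parseLine(String line) {
        pendingFields = new ArrayList<>();
        byte[] buf = line.getBytes(StandardCharsets.UTF_8);
        int start = 0;
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == fieldDelimiter) {
                readField(buf, start, i);
                start = i + 1;
            }
        }
        readField(buf, start, buf.length); // trailing field
        return pendingFields;
    }
}

// A custom loader only has to override the hook; nothing is copy/pasted.
class TrimmingLoaderSketch extends DelimitedLoaderSketch {
    TrimmingLoaderSketch(char delimiter) {
        super(delimiter);
    }

    @Override
    protected void readField(byte[] buf, int start, int end) {
        String cleaned = new String(buf, start, end - start, StandardCharsets.UTF_8).trim();
        addField(cleaned.isEmpty() ? null : cleaned);
    }
}

class LoaderSketchDemo {
    public static void main(String[] args) {
        List<Object> fields = new TrimmingLoaderSketch('\t').parseLine(" a \t\tb ");
        System.out.println(fields); // prints [a, null, b]
    }
}
```

With this shape, the cleanup logic lives in one overridden method, and the base class remains free to change how it stores the pending fields as long as the protected accessors keep their contract.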
pablomar 2012-11-19, 21:17
Jonathan Coveney 2012-11-19, 23:32
pablomar 2012-11-20, 00:38