Pig, mail # user - PigStorage


PigStorage
pablomar 2012-11-16, 20:48
hi all,

I'm using Pig 0.9.2 (Apache Pig version 0.9.2-cdh4.0.1, to be precise).
Today I ran into a case where I needed to clean up some fields before
processing them. I will need to do the same in all my scripts, so instead of
doing it inside each script, I thought of extending PigStorage and doing it
inside my own loader. That way my scripts will be shorter and cleaner.

In fact, the only method I needed to override was:
void *readField*(byte[] buf, int start, int end)
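For context on that signature: `buf` holds the raw bytes of one input line, and `start`/`end` bound a single field, with `end` exclusive (as I read PigStorage's 0.9 source, its split loop passes the index of the delimiter as `end`). A minimal sketch of the kind of cleanup such an override might perform, assuming "clean up" means trimming spaces, carriage returns and double quotes (the post does not say which rules); `FieldCleaner` and `cleanField` are hypothetical names, not Pig API:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Pig. It follows the same
// (buf, start, end) convention as readField, where end is exclusive.
public class FieldCleaner {

    // Returns the cleaned field, or null if nothing is left
    // (an empty field is treated as null). The cleanup rule is an assumption.
    public static String cleanField(byte[] buf, int start, int end) {
        // Skip leading spaces, quotes and carriage returns
        while (start < end && isJunk(buf[start])) start++;
        // Skip trailing spaces, quotes and carriage returns
        while (end > start && isJunk(buf[end - 1])) end--;
        if (start == end) {
            return null;
        }
        return new String(buf, start, end - start, StandardCharsets.UTF_8);
    }

    private static boolean isJunk(byte b) {
        return b == ' ' || b == '"' || b == '\r';
    }

    public static void main(String[] args) {
        byte[] line = "  \"abc\" \tx".getBytes(StandardCharsets.UTF_8);
        // The tab sits at index 8, so the first field is bytes [0, 8)
        System.out.println(cleanField(line, 0, 8)); // abc
    }
}
```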
Everything went fine and it works. The problem was that I had to copy and
paste a lot of code just because of private declarations,
for example:
  private byte fieldDel = '\t';
  private ArrayList<Object> mProtoTuple = null;
  private TupleFactory mTupleFactory = TupleFactory.getInstance();
  private boolean mRequiredColumnsInitialized = false;

And, of course:
*private* void readField(byte[] buf, int start, int end)
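Why the copying was unavoidable: in Java a private method is not inherited, so a subclass method with the same signature does not override it, and the base class keeps calling its own version. The self-contained sketch below uses hypothetical `BaseLoader`/`CleaningLoader` stand-ins (not Pig's classes) to show the effect; putting `@Override` on the subclass method would be a compile error.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a loader, NOT Pig's PigStorage: it only shows
// why a private readField cannot be replaced by a subclass.
public class PrivateHookDemo {

    static class BaseLoader {
        protected final List<String> fields = new ArrayList<>();

        // Splits a line on tabs and hands each field to readField
        public List<String> parse(String line) {
            fields.clear();
            int start = 0;
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == '\t') {
                    readField(line, start, i);
                    start = i + 1;
                }
            }
            readField(line, start, line.length());
            return new ArrayList<>(fields);
        }

        // private: not inherited, so this is always the method parse() calls
        private void readField(String line, int start, int end) {
            fields.add(line.substring(start, end));
        }
    }

    static class CleaningLoader extends BaseLoader {
        // Does NOT override BaseLoader.readField; it is an unrelated
        // method that parse() never calls.
        private void readField(String line, int start, int end) {
            fields.add(line.substring(start, end).trim());
        }
    }

    public static void main(String[] args) {
        // The fields come back untrimmed because the base method ran
        System.out.println(new CleaningLoader().parse(" a \t b "));
    }
}
```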

So I had to copy and paste public Tuple getNext() and all the variables
above just to be able to write my own *readField*.
Would it be possible, in future versions of Pig, to make *readField*
protected, as well as *mProtoTuple*? I think it could be useful in cases
like mine.
I'm asking because I don't know the reasoning behind the decision to make
them private.
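A sketch of what the request would allow, again with hypothetical stand-in classes rather than Pig's real ones: the splitting loop (the `getNext()` equivalent) lives once in the base class, and a subclass overrides only a protected `readField` hook that appends to a protected list playing the role of `mProtoTuple`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the requested shape, NOT Pig's actual code:
// subclasses override only the protected readField hook.
public class ProtectedHookDemo {

    static class BaseLoader {
        // protected, so subclasses can append to it (like mProtoTuple)
        protected final List<String> protoTuple = new ArrayList<>();
        private final char fieldDel;

        BaseLoader(char fieldDel) {
            this.fieldDel = fieldDel;
        }

        // Stand-in for getNext(): written once, never copied
        public List<String> parse(String line) {
            protoTuple.clear();
            int start = 0;
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == fieldDel) {
                    readField(line, start, i);
                    start = i + 1;
                }
            }
            readField(line, start, line.length());
            return new ArrayList<>(protoTuple);
        }

        // Default: an empty field becomes null, otherwise kept as-is
        protected void readField(String line, int start, int end) {
            protoTuple.add(start == end ? null : line.substring(start, end));
        }
    }

    // Only the hook is overridden; parse() and fieldDel are inherited
    static class CleaningLoader extends BaseLoader {
        CleaningLoader() {
            super('\t');
        }

        @Override
        protected void readField(String line, int start, int end) {
            String cleaned = line.substring(start, end).trim();
            protoTuple.add(cleaned.isEmpty() ? null : cleaned);
        }
    }

    public static void main(String[] args) {
        System.out.println(new CleaningLoader().parse(" a \t b \t  "));
    }
}
```

This is the template-method pattern: the subclass shrinks to a few lines. The trade-off is that the hook's signature and the list's type become something subclasses rely on, so they are harder to change later.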

thanks a lot,
Dmitriy Ryaboy 2012-11-16, 22:15
Bill Graham 2012-11-19, 17:16
pablomar 2012-11-19, 17:24
pablomar 2012-11-19, 21:17
Jonathan Coveney 2012-11-19, 23:32
pablomar 2012-11-20, 00:38