Re: Schema access while writing
Ah. Thanks Scott, I think I should've looked at PigStorage much more
closely earlier. It does seem to cover everything.

I stored the schema through UDFContext, keyed by the signature, and
retrieved it the same way where I needed it. Thank you :)
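
For anyone finding this in the archive, here is a minimal sketch of the
pattern, not the actual code from my UDF. To keep it runnable without Pig
on the classpath it uses only java.util: FakeUdfContext, SketchStorer and
the "sketch.schema" key are hypothetical stand-ins. In a real store
function the property bag would, as far as I understand it, come from
UDFContext.getUDFContext().getUDFProperties(getClass(), new String[] {signature}),
with the signature delivered through setStoreFuncUDFContextSignature(String).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Stand-in model of the schema hand-off; no Pig classes are used here.
class SchemaHandoffSketch {

    // Hypothetical stand-in for Pig's UDFContext: one Properties bag per
    // (UDF class, signature) pair. In a real job Pig ships these bags to
    // the map/reduce side; here a shared map plays that role.
    static final class FakeUdfContext {
        private final Map<String, Properties> bags = new HashMap<>();

        Properties getUdfProperties(Class<?> udfClass, String signature) {
            String key = udfClass.getName() + "#" + signature;
            return bags.computeIfAbsent(key, k -> new Properties());
        }
    }

    // Hypothetical store function. The schema is modelled as a
    // comma-separated list of field names instead of a ResourceSchema.
    static final class SketchStorer {
        private static final String SCHEMA_KEY = "sketch.schema"; // assumed key name
        private final FakeUdfContext context;
        private String signature;
        private String[] fieldNames; // filled lazily on the back end

        SketchStorer(FakeUdfContext context) {
            this.context = context;
        }

        // Mirrors the role of setStoreFuncUDFContextSignature(String).
        void setSignature(String signature) {
            this.signature = signature;
        }

        // Front end only: serialize what putNext will need later.
        void checkSchema(String schemaText) {
            context.getUdfProperties(getClass(), signature)
                   .setProperty(SCHEMA_KEY, schemaText);
        }

        // Back end: this instance never saw checkSchema(), so it reads the
        // schema back from the property bag on first use.
        String putNext(String[] values) {
            if (fieldNames == null) {
                String schemaText = context.getUdfProperties(getClass(), signature)
                                           .getProperty(SCHEMA_KEY);
                if (schemaText == null) {
                    throw new IllegalStateException(
                        "no schema stored for signature " + signature);
                }
                fieldNames = schemaText.split(",");
            }
            StringBuilder row = new StringBuilder();
            int n = Math.min(fieldNames.length, values.length);
            for (int i = 0; i < n; i++) {
                if (i > 0) {
                    row.append(';');
                }
                row.append(fieldNames[i]).append('=').append(values[i]);
            }
            return row.toString();
        }
    }

    public static void main(String[] args) {
        FakeUdfContext ctx = new FakeUdfContext();

        SketchStorer frontEnd = new SketchStorer(ctx);
        frontEnd.setSignature("store-1");
        frontEnd.checkSchema("id,name");

        // A fresh instance, like the one created on the mapper side.
        SketchStorer backEnd = new SketchStorer(ctx);
        backEnd.setSignature("store-1");
        System.out.println(backEnd.putNext(new String[] {"7", "pig"}));
    }
}
```

Keying the bag by signature is what keeps two STORE statements that use
the same UDF class from overwriting each other's schema.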

On Mon, Jun 28, 2010 at 8:24 PM, Scott Carey <[EMAIL PROTECTED]> wrote:
> UDFContext is what you need to push data to the back end.  It would be nice if something internal to Pig like the schema could be made available without custom serialization though.
> ----- Reply message -----
> From: "Harsh J" <[EMAIL PROTECTED]>
> Date: Mon, Jun 28, 2010 3:08 am
> Subject: Schema access while writing
> Hello,
> I'm implementing a custom Store UDF using StoreFuncInterface.
> I need access to the ResourceSchema object each time I do a putNext
> operation, but am unable to do this since checkSchema() [which carries
> what I require] is only called once and that's during job init or so.
> If I try to store a reference/copy of that object, it does not work
> since the mapper-side instances of my UDF don't get the checkSchema()
> call.
> What I've tried is to process and store required parts of the
> ResourceSchema into the Job's Configuration using the fact that
> setStoreLocation(String, Job) is called in the init AFTER
> checkSchema(), and tried retrieving that to no avail. It looks like
> the changes I make to the Job object given to me are lost, as I get
> a null at the map/reduce side for the configuration name I've stored
> it as.
> What do I do to access either the ResourceSchema or even a
> Job-Configuration variable that I wish to set post processing the
> ResourceSchema?
> --
> Harsh J
> www.harshj.com

Harsh J