Re: Get ResourceSchema during putNext in StoreFunc
I remember facing this problem when trying to implement a Load/Store
function quite a while ago.

The issue (not really an issue, I guess) is that checkSchema is a
front-end method. Pig's front-end code may call it, perhaps several
times, but the back-end code that runs on a given platform (Local or
Hadoop) never calls it. So a field you set in checkSchema is not set
on the instance whose putNext runs on the back end.

To persist your schema, put it onto the 'JobConf' (in loose terms).
Pig lets UDFs do this through the UDFContext class. Get the UDFContext
for your UDF, then set a property on it whose key identifies your
schema (or other data) and whose value is that data in string form.
Retrieve it the same way in the other methods, wherever you need it
(getOutputFormat, putNext, etc.).
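
Here is a rough, self-contained sketch of that hand-off. It is not Pig
code: FieldNameStoreSketch, SCHEMA_KEY and the comma-joined encoding
are made up for illustration, and a plain java.util.Properties stands
in for the per-UDF properties. In a real StoreFunc I would expect the
Properties to come from
UDFContext.getUDFContext().getUDFProperties(getClass(), new String[]{signature}),
with the signature being the one Pig hands to
setStoreFuncUDFContextSignature, and the schema to arrive as a
ResourceSchema rather than a String array.

```java
import java.util.Properties;

// Stand-in for a StoreFunc; a real one would extend org.apache.pig.StoreFunc.
class FieldNameStoreSketch {
    // Hypothetical property key; any string unique to your UDF would do.
    static final String SCHEMA_KEY = "sketch.store.fieldnames";

    private final Properties udfProps; // stands in for the UDFContext properties
    private String[] fieldNames;       // instance state, not shared across instances

    FieldNameStoreSketch(Properties udfProps) {
        this.udfProps = udfProps;
    }

    // Front end: plays the role of checkSchema. Store the names as a
    // property instead of only keeping them in a field.
    void checkSchema(String[] names) {
        // Naive encoding: assumes field names contain no commas.
        udfProps.setProperty(SCHEMA_KEY, String.join(",", names));
    }

    // Back end: plays the role of putNext. Read the names back lazily.
    String putNext(Object[] tuple) {
        if (fieldNames == null) {
            String stored = udfProps.getProperty(SCHEMA_KEY);
            if (stored == null) {
                throw new IllegalStateException("schema was not stored on the front end");
            }
            fieldNames = stored.split(",");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tuple.length; i++) {
            if (i > 0) {
                sb.append('\t');
            }
            sb.append(fieldNames[i]).append('=').append(tuple[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One Properties object plays the role of the job configuration
        // that travels from the front end to the back end.
        Properties jobProps = new Properties();

        // Front-end instance: only checkSchema is called on it.
        new FieldNameStoreSketch(jobProps).checkSchema(new String[] {"name", "age"});

        // Fresh back-end instance: its fieldNames field starts out null.
        FieldNameStoreSketch backEnd = new FieldNameStoreSketch(jobProps);
        System.out.println(backEnd.putNext(new Object[] {"alice", 30}));
    }
}
```

The point of the two separate instances in main is that the second one
never saw checkSchema, which matches the null you are seeing in
putNext. Only what went into the properties survives.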

On Tue, Feb 1, 2011 at 10:16 AM, Jacob Perkins wrote:
> Trying to write a simple storefunc that makes use of the input data's
> field names. Is there a way to gain access to this inside of the call to
> putNext? Ostensibly you could set a variable with the schema during the
> call to checkSchema, eg. in HBaseStorage, but as far as I can tell this
> is null by the time putNext is called. Is there some other way or am I
> missing something obvious?
> --jacob
> @thedatachef

Harsh J