Pig user mailing list: Get ResourceSchema during putNext in StoreFunc


Jacob Perkins 2011-02-01, 04:46
Dan Harvey 2011-02-01, 15:23
Jacob Perkins 2011-02-01, 15:45
Re: Get ResourceSchema during putNext in StoreFunc
I remember facing this problem when trying to implement a Load/Store
function quite a while ago.

The issue (not really an issue, I guess) is that checkSchema is a
front-end method. It is called, perhaps multiple times, by Pig's
front-end code, but never by the back-end code that runs on a given
platform (Local or Hadoop). Anything you store in an instance variable
during checkSchema therefore never reaches the instance that runs
putNext.

To persist your schema, put it onto the 'JobConf' (loosely speaking).
Pig lets UDFs do this through the UDFContext class: get the UDFContext
for your UDF and set a property on it, with a key that identifies your
schema (or other data) and the serialized schema as the value. Then
read the property back the same way in whichever back-end methods need
it (getOutputFormat, putNext, etc.).
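
Here is a minimal sketch of that hand-off. So that it runs without Pig
on the classpath, a plain java.util.Properties stands in for the
properties object you would normally get from
UDFContext.getUDFContext().getUDFProperties(...). The class
SketchStoreFunc, the key name, and the comma-joined field names are all
made up for illustration; a real StoreFunc would serialize the
ResourceSchema it receives in checkSchema.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class SchemaHandoff {
    // Hypothetical property key; any name unique to your UDF would do.
    static final String SCHEMA_KEY = "example.storefunc.field.names";

    /** Stand-in for a StoreFunc. This is NOT the Pig class. */
    static class SketchStoreFunc {
        // Stand-in for the Properties returned by UDFContext in real Pig code.
        private final Properties udfProps;
        // Instance state: set lazily on the back end, never copied from the front end.
        private List<String> fieldNames;

        SketchStoreFunc(Properties udfProps) {
            this.udfProps = udfProps;
        }

        // Front end: mirrors checkSchema. Store the schema as a property
        // instead of relying on an instance variable.
        void checkSchema(List<String> names) {
            udfProps.setProperty(SCHEMA_KEY, String.join(",", names));
        }

        // Back end: mirrors what putNext would do before using field names.
        List<String> fieldNamesForPutNext() {
            if (fieldNames == null) {
                String stored = udfProps.getProperty(SCHEMA_KEY);
                if (stored == null) {
                    throw new IllegalStateException("schema was never stored");
                }
                fieldNames = Arrays.asList(stored.split(","));
            }
            return fieldNames;
        }
    }

    public static void main(String[] args) {
        // The properties travel with the job; the StoreFunc instance does not.
        Properties shipped = new Properties();

        // Front-end instance sees the schema and stores it.
        new SketchStoreFunc(shipped).checkSchema(Arrays.asList("id", "name", "score"));

        // Back-end instance is a fresh object and reads the schema back.
        SketchStoreFunc backEnd = new SketchStoreFunc(shipped);
        System.out.println(backEnd.fieldNamesForPutNext()); // prints [id, name, score]
    }
}
```

In a real StoreFunc you would typically also keep the signature Pig
passes to setStoreFuncUDFContextSignature and use it when fetching the
properties, so that two store statements using the same class do not
overwrite each other's schema.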

On Tue, Feb 1, 2011 at 10:16 AM, Jacob Perkins
<[EMAIL PROTECTED]> wrote:
> Trying to write a simple storefunc that makes use of the input data's
> field names. Is there a way to gain access to this inside of the call to
> putNext? Ostensibly you could set a variable with the schema during the
> call to checkSchema, eg. in HBaseStorage, but as far as I can tell this
> is null by the time putNext is called. Is there some other way or am I
> missing something obvious?
>
> --jacob
> @thedatachef
>
>

--
Harsh J
www.harshj.com
jacob 2011-02-01, 16:28
Dan Harvey 2011-02-01, 16:42