
Re: Properties Configuration in Custom Load Function
Hi Markus,

That's correct: the loaders get instantiated both on the client side
(so you can do any setup you need) and on the MR side (to actually do
the loading). There are a couple of ways to get your properties over
to the MR side:

1) Add your file to the "tmpfiles" property of the jobconf that gets
passed in via setLocation. This can be error-prone, since two of your
loaders with different properties may end up being processed in the
same MR job (for a join, for example).

2) Serialize your properties straight into the UDFContext, namespaced
using the signature you get via setUDFContextSignature, and
deserialize them on the backend.
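
To make option 2 a bit more concrete, here is a rough sketch of the
round trip in plain Java. It is not a real loader: the map called
FAKE_UDF_CONTEXT is a hypothetical stand-in for the per-signature
Properties object you would get from Pig (as far as I know, via
UDFContext.getUDFContext().getUDFProperties(getClass(), new String[] { signature })),
and the class name, method names, and the "loader.props" key are made
up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Sketch only: shows the serialize / namespace-by-signature / deserialize
// round trip. All names here are hypothetical, not part of Pig.
class LoaderPropsSketch {
    // Stand-in for the Properties Pig keeps per UDF signature.
    // In a real loader this would come from Pig's UDFContext instead.
    static final Map<String, Properties> FAKE_UDF_CONTEXT = new HashMap<>();

    // Hypothetical key under which the serialized file contents are stored.
    static final String KEY = "loader.props";

    static Properties udfProps(String signature) {
        return FAKE_UDF_CONTEXT.computeIfAbsent(signature, k -> new Properties());
    }

    // Turn the loaded .properties content into a single string.
    static String serialize(Properties p) {
        try {
            StringWriter w = new StringWriter();
            p.store(w, null);
            return w.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuild a Properties object from the serialized string.
    static Properties deserialize(String s) {
        try {
            Properties p = new Properties();
            p.load(new StringReader(s));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Client side: the local .properties file is readable here, so stash it
    // under this loader's signature.
    static void stashOnFrontend(String signature, Properties local) {
        udfProps(signature).setProperty(KEY, serialize(local));
    }

    // MR side: no local file, so read back what the client stored.
    static Properties restoreOnBackend(String signature) {
        String s = udfProps(signature).getProperty(KEY);
        if (s == null) {
            throw new IllegalStateException("no properties stored for signature " + signature);
        }
        return deserialize(s);
    }
}
```

Because each loader instance writes under its own signature, two
loaders with different properties in the same MR job do not overwrite
each other, which is the problem with option 1.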

On Mon, Apr 23, 2012 at 7:09 AM, Markus Resch <[EMAIL PROTECTED]> wrote:
> Hey Folks,
> We've created our own LOAD function by extending the default AVRO
> Storage (basically we're processing a set of paths to glob by the Avro
> Storage).
> Our algorithm needs some basic configuration which we're reading out of
> a .properties file which is located right beside the pig script.
> Our algorithm works great. According to the output directly after
> starting the pig script everything is just fine. But after the job runs
> for a while we're getting an error message which says it can't find the
> properties file. We're assuming that the load gets started on each data
> node and we don't have that config there. Is that assumption true? And
> if so, is there a way to work around this issue?
> Thanks
> Markus