Pig >> mail # user >> Properties Configuration in Custom Load Function


Re: Properties Configuration in Custom Load Function
Hi Markus,

That's correct -- the loaders will get instantiated both on the
client-side (to allow you to do any setup you need to do), and on the
MR side (to actually do the loading). You can do a couple of things to
get your properties over to the MR side:

1) add your file to the "tmpfiles" property of the jobconf that gets
passed in via setLocation. This may be error-prone since you might be
in a situation where two of your loaders, with different properties,
are processed in the same MR job (for a join, for example).
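A minimal sketch of the "tmpfiles" bookkeeping: Hadoop treats "tmpfiles" as a comma-separated list of file URIs to ship to the task nodes, so from setLocation you would append your properties file to whatever is already there. The helper below shows only that append logic as plain Java; the class name, method name, and the example URI are illustrative, not part of the Pig or Hadoop API.

```java
public class TmpFilesSketch {
    // Returns the new value for the "tmpfiles" property: the existing
    // comma-separated list with the new file URI appended, or just the
    // URI on its own when the list was empty.
    static String appendTmpFile(String existing, String fileUri) {
        return (existing == null || existing.isEmpty())
                ? fileUri
                : existing + "," + fileUri;
    }

    public static void main(String[] args) {
        // In a real LoadFunc.setLocation you would read and write this
        // value via job.getConfiguration() instead of local strings.
        String before = null;
        String after = appendTmpFile(before, "file:///tmp/loader.properties");
        System.out.println(after);
    }
}
```

In a real loader the caveat above still applies: two loader instances writing different files into the same job's "tmpfiles" list can collide if they use the same file name.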

2) Serialize your properties straight into the UDFContext, namespaced
using the signature you get via setUDFContextSignature, and
deserialize them on the backend.
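The serialize/deserialize round trip for option 2 can be sketched with java.util.Properties alone: flatten the configuration to a String on the front-end, and rebuild it on the backend. In an actual Pig LoadFunc the String would live in the Properties object returned by UDFContext.getUDFContext().getUDFProperties(getClass(), new String[]{signature}); the class and method names below are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

public class PropsRoundTrip {
    // Front-end: flatten the loader's configuration into one String
    // that is small enough to stash in the UDFContext.
    static String serialize(Properties props) throws IOException {
        StringWriter out = new StringWriter();
        props.store(out, null);
        return out.toString();
    }

    // Backend: rebuild the Properties from that String.
    static Properties deserialize(String s) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(s));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties p = new Properties();
        p.setProperty("glob.paths", "/data/2012/*");   // hypothetical setting
        Properties restored = deserialize(serialize(p));
        System.out.println(restored.getProperty("glob.paths"));
    }
}
```

Because the UDFContext entries are keyed by the signature string, two loaders with different configurations in the same job stay separate, which avoids the collision risk of option 1.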

D
On Mon, Apr 23, 2012 at 7:09 AM, Markus Resch <[EMAIL PROTECTED]> wrote:
> Hey Folks,
>
> We've created our own LOAD function by extending the default
> AvroStorage (basically we're passing a set of paths to the AvroStorage
> to glob).
>
> Our algorithm needs some basic configuration which we're reading out of
> a .properties file which is located right beside the pig script.
> Our algorithm works great. According to the output directly after
> starting the pig script, everything is just fine. But after the job
> runs for a while we get an error message saying it can't find the
> properties file. We're assuming that the load gets started on each data
> node and we don't have that config there. Is that assumption true? And
> if so: is there a way to work around this issue?
>
> Thanks
>
> Markus
>