Pig >> mail # user >> UDF property passing


Re: UDF property passing
On Fri, Jul 8, 2011 at 2:48 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> But are you keeping member variables or do you put everything in the
> context?
Anything that you want to remember needs to be put in the context.
Pig makes sure that the constructor is called with the same arguments on
the front end and the back end. In addition, for loaders and storage, the
setContext API is invoked with the same context on the front end and the
back end.

Anything else needs to go in the context. If something is derived from
constructor arguments, you don't need to put it into the context, for e.g.

Not sure if I understood the question correctly, but Pig does not transfer
your object, so what you store in the member variables does not matter.

Raghu.
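
[Illustration] A minimal sketch of the point above, not Pig code: FakeUdfContext and SketchLoader are hypothetical stand-ins written for this example, and the only assumption is that the context (unlike the UDF object) reaches both sides. In real Pig the corresponding pieces would be UDFContext and the signature setters quoted further down the thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

class ContextSketch {
    public static void main(String[] args) {
        // Assumption: this context is what gets shipped to both sides.
        FakeUdfContext ctx = new FakeUdfContext();

        // Front end: the instance computes something and remembers it.
        SketchLoader frontend = new SketchLoader("users");
        frontend.setSignature("sig-1");
        frontend.planOnFrontend(ctx);

        // Back end: a NEW instance, built with the same constructor
        // argument and given the same signature. The object itself
        // was never transferred.
        SketchLoader backend = new SketchLoader("users");
        backend.setSignature("sig-1");

        System.out.println(backend.memberSchemaHint());   // field is unset here
        System.out.println(backend.schemaOnBackend(ctx)); // value from the context
    }
}

/** Hypothetical stand-in for a UDF context: Properties keyed by class and signature. */
class FakeUdfContext {
    private final Map<String, Properties> store = new HashMap<>();

    Properties getUdfProperties(Class<?> udfClass, String signature) {
        String key = udfClass.getName() + "#" + signature;
        return store.computeIfAbsent(key, k -> new Properties());
    }
}

/** Hypothetical loader used only to illustrate where state should live. */
class SketchLoader {
    private final String columnFamily; // from constructor args: rebuilt on both sides
    private String signature;          // supplied again on both sides
    private String schemaHint;         // plain member: lost between sides

    SketchLoader(String columnFamily) {
        this.columnFamily = columnFamily;
    }

    void setSignature(String signature) {
        this.signature = signature;
    }

    /** Front-end step: store the derived value in the context under the signature. */
    void planOnFrontend(FakeUdfContext ctx) {
        schemaHint = "schema-for-" + columnFamily;
        ctx.getUdfProperties(SketchLoader.class, signature)
           .setProperty("schema", schemaHint);
    }

    /** Back-end step: read the value back from the context, not from a field. */
    String schemaOnBackend(FakeUdfContext ctx) {
        return ctx.getUdfProperties(SketchLoader.class, signature)
                  .getProperty("schema");
    }

    String memberSchemaHint() {
        return schemaHint;
    }
}
```

The back-end instance still knows its column family because that came from the constructor argument, but anything computed on the front end is only visible through the context.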

On Jul 8, 2011, at 3:21 PM, Raghu Angadi wrote:
>
> > yes. that is exactly how HBaseStorage uses context.
> >
> > On Fri, Jul 8, 2011 at 10:19 AM, Jeremy Hanna <[EMAIL PROTECTED]> wrote:
> >
> >> In CassandraStorage, we had been using some load/store URL specific
> >> information (keyspace, column family names) to make the
> >> UDFContext.properties key unique, but with what Grant said was in the
> >> docs, we just wrote a patch to instead use the udf context signatures
> >> for those keys when setting and getting those property values.  Is that
> >> the way to go then?  I'm setting those as member variables and then
> >> using them later.
> >>
> >>   @Override
> >>   public void setUDFContextSignature(String signature)
> >>   {
> >>       this.loadSignature = signature;
> >>   }
> >>
> >>   /* StoreFunc methods */
> >>   public void setStoreFuncUDFContextSignature(String signature)
> >>   {
> >>       this.storeSignature = signature;
> >>   }
> >>
> >>
> >> On Jul 8, 2011, at 7:24 AM, Grant Ingersoll wrote:
> >>
> >>> What is the guidance here on using member variables when implementing
> >>> UDFs and passing properties?  That is, what are the semantics for using
> >>> them to store properties for a UDF instance?  The docs talk a lot about
> >>> making sure that no side effects happen from multiple calls to a UDF
> >>> instance, but it is not clear whether that means it's doing things like
> >>> changing the Location for a given instance of a UDF or just calling it
> >>> multiple times.  PigStorage suggests not (since it keeps a member var
> >>> location), but the UDFContext docs suggest that one keep all state in
> >>> the UDFContext under an appropriate signature.
> >>>
> >>> See also https://issues.apache.org/jira/browse/CASSANDRA-2869 for
> >>> another case where this has reared its head in an improper
> >>> implementation.
> >>>
> >>> -Grant
> >>>
> >>> On Jul 7, 2011, at 3:24 AM, Jeremy Hanna wrote:
> >>>
> >>>>
> >>>> On Jul 6, 2011, at 11:10 PM, Raghu Angadi wrote:
> >>>>
> >>>>> On Wed, Jul 6, 2011 at 7:20 PM, Jeremy Hanna <[EMAIL PROTECTED]> wrote:
> >>>>>
> >>>>>>
> >>>>>> On Jul 6, 2011, at 12:47 PM, Dmitriy Ryaboy wrote:
> >>>>>>
> >>>>>>> I think this is the same problem we were having earlier:
> >>>>>>> http://hadoop.markmail.org/thread/kgxhdgw6zdmadch4
> >>>>>>>
> >>>>>>> One workaround is to use defines to explicitly create different
> >>>>>>> instances of your UDF, and use them separately.. it's ugly but it
> >>>>>>> works.
> >>>>>>
> >>>>>> Thanks Dmitriy.
> >>>>>>
> >>>>>> I tried doing something like:
> >>>>>> define ToCassandraBag1 org.pygmalion.udf.ToCassandraBag();
> >>>>>> define ToCassandraBag2 org.pygmalion.udf.ToCassandraBag();
> >>>>>>
> >>>>>
> >>>>> This still does not work since you can't distinguish the two. The way
> >>>>> I was thinking of doing this is to let the user pass in some unique
> >>>>> string as a substitute for context:
> >>>>>
> >>>>> define ToCassandraBag1 ToCassandraBag('1');
> >>>>> define ToCassandraBag2 ToCassandraBag('2');
> >>>>
> >>>> Ah yes.  I had misunderstood.  Thanks for the clarification.  Now the
> >>>> pig docs also make more sense in the Passing Configurations to UDFs
> >>>> section:
> >>>>
> >>>> http://pig.apache.org/docs/r0.8.1/udf.html#Passing+Configurations+to+UDFs
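
[Illustration] A rough sketch of why the define ... ToCassandraBag('1') / ToCassandraBag('2') suggestion quoted above helps: a unique constructor string gives each instance its own key into shared properties. TaggedUdf and the plain Map used as a context are hypothetical stand-ins for this example only, not the Pig or Pygmalion API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

class DefineSketch {
    public static void main(String[] args) {
        // Assumption: one shared property store, visible to every instance.
        Map<String, Properties> context = new HashMap<>();

        // Mirrors: define ToCassandraBag1 ToCassandraBag('1');
        //          define ToCassandraBag2 ToCassandraBag('2');
        TaggedUdf first = new TaggedUdf("1", context);
        TaggedUdf second = new TaggedUdf("2", context);

        first.remember("columns", "a,b");
        second.remember("columns", "x,y,z");

        // Each instance reads back its own value; neither overwrote the other.
        System.out.println(first.recall("columns"));
        System.out.println(second.recall("columns"));
    }
}

/** Hypothetical UDF whose constructor argument makes its property key unique. */
class TaggedUdf {
    private final String key;
    private final Map<String, Properties> context;

    TaggedUdf(String tag, Map<String, Properties> context) {
        // Class name alone would be identical for both instances,
        // so the user-supplied tag is appended to tell them apart.
        this.key = TaggedUdf.class.getName() + "#" + tag;
        this.context = context;
    }

    void remember(String name, String value) {
        context.computeIfAbsent(key, k -> new Properties()).setProperty(name, value);
    }

    String recall(String name) {
        Properties props = context.get(key);
        return props == null ? null : props.getProperty(name);
    }
}
```

Because the tag arrives through the constructor, a fresh instance built later with the same tag resolves to the same key, which is what makes the string usable as a substitute for context.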