Re: DataByteArray as Input in Load Function
Dmitriy Ryaboy 2013-09-24, 06:19
Loaders and UDFs are all initialized at the compilation phase, so you can't
pass dynamically calculated values in. (You can do some things by
pre-calculating constants like the current time, etc., and binding them
via the define keyword, but you are trying to do something far fancier.)
Moreover, Loaders only take string args.
Try using JRuby embedding to do this, or write your loader so that it
takes, for example, a path to an HDFS file and reads in the bloom filter
itself when it gets the first getNext() call. Don't do it in the constructor!
Constructors get called a lot for runtime type checking and the objects are
then thrown away, so you don't want to do anything expensive there.
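
A minimal sketch of that second suggestion, in plain Java so it runs without Pig on the classpath. It is not a real LoadFunc subclass: the class name LazyFilterLoader, the toy single-hash bit filter (standing in for a real bloom filter), the local file (standing in for an HDFS file), and the loadCount field are all assumptions made for illustration. The point is only the shape: the constructor takes a string and does no I/O, and the filter is read once, on the first read call (getNext() in Pig's LoadFunc API).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.BitSet;
import java.util.Iterator;

// Hypothetical sketch: plain Java, NOT a real Pig LoadFunc subclass.
final class LazyFilterLoader {
    static final int NUM_BITS = 1024;   // toy filter size (assumption)

    private final String filterPath;    // string argument, as Pig loaders require
    private Iterator<String> rows;      // stand-in for the record reader
    private BitSet filter;              // stays null until the first getNext()
    int loadCount = 0;                  // how many times the filter file was read

    // Cheap constructor: only remembers the path, performs no I/O.
    LazyFilterLoader(String filterPath) {
        this.filterPath = filterPath;
    }

    // Stand-in for the step where the framework hands the loader its input.
    void prepareToRead(Iterator<String> rows) {
        this.rows = rows;
    }

    // Returns the next row key that passes the filter, or null at end of input.
    String getNext() {
        if (filter == null) {           // expensive work happens here, once
            filter = readFilter(filterPath);
            loadCount++;
        }
        while (rows != null && rows.hasNext()) {
            String key = rows.next();
            if (filter.get(bitFor(key))) {
                return key;
            }
        }
        return null;
    }

    // Maps a key to one bit position (a real bloom filter uses several hashes).
    static int bitFor(String key) {
        return Math.floorMod(key.hashCode(), NUM_BITS);
    }

    // Reads the serialized bits; a local file stands in for HDFS here.
    static BitSet readFilter(String path) {
        try {
            return BitSet.valueOf(Files.readAllBytes(Paths.get(path)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for the separate job that precomputes the filter and writes it out.
    static void writeFilter(String path, String... keys) {
        BitSet bits = new BitSet(NUM_BITS);
        for (String key : keys) {
            bits.set(bitFor(key));
        }
        try {
            Files.write(Paths.get(path), bits.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, the script would pass only the path of the filter file as a string argument, and an earlier job would have to write that file before the load runs. Throwaway instances created during type checking never touch the file.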
On Tue, Sep 17, 2013 at 3:42 PM, John <[EMAIL PROTECTED]> wrote:
> Or is it only possible to execute the load function at the beginning of the
> script? Otherwise it should theoretically be possible to hand over information
> that is created while the program is running.
> 2013/9/17 John <[EMAIL PROTECTED]>
> > Hi,
> > I'm using Pig + HBase. I'm trying to create a Pig program that looks like this:
> > MY_BLOOMFILTER = load 'hbase://bloomfilterTable' using ...
> > ... // do something to transform it to a DataByteArray
> > Now I want to load data out of HBase based on the bloom filter, so
> > I've built my own LoadFunction, but how can I call its constructor from
> > the Pig program? My load constructor looks like this:
> > HBaseLoadUDF(String columnList, String optString, String rowKey,
> > DataByteArray myBloomfilter);
> > The program should look like this:
> > MY_DATA = load 'hbase://PO_S' using package.udfs.HBaseLoadUDF('mycf',
> > '','oneRowKey', MY_BLOOMFILTER.$0) as (output:map);
> > but this doesn't work.
> > Is it even possible?