Pig >> mail # user >> Issue with LoadFunc & Slicer


Vincent BARAT — 2009-09-14, 13:38
Vincent BARAT — 2009-09-14, 14:08
Dmitriy Ryaboy — 2009-09-14, 15:35
Kevin Weil — 2009-09-14, 15:58
Alan Gates — 2009-09-17, 20:07
Re: Issue with LoadFunc & Slicer
Hmmm... Either the captcha is buggy or I'm going blind: I can't
manage to sign up to your JIRA!

Dmitriy Ryaboy wrote:
> There's a ticket for that: https://issues.apache.org/jira/browse/PIG-612
>
> Vote it up so that the pig developers have a record of user interest
> in this feature.
>
> -D
>
> On Mon, Sep 14, 2009 at 10:08 AM, Vincent BARAT
> <[EMAIL PROTECTED]> wrote:
>> It seems that I got my answer: custom loader functions can only be used in
>> map reduce mode, not local mode: in local mode, the file specified must be a
>> real file.
>>
>> Vincent BARAT wrote:
>>> Hello,
>>>
>>> In the process of trying to add support for HBase 0.20.0 to PIG
>>> (trunk), I was following the tutorial from the PIG documentation:
>>>
>>> http://hadoop.apache.org/pig/docs/r0.3.0/udf.html#Custom+Slicer
>>>
>>> Unfortunately, when I try:
>>>
>>> A = LOAD '27' USING RangeSlicer();
>>> dump A;
>>>
>>> PIG reports the following error:
>>>
>>> 2009-09-14 15:33:46,395 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>> ERROR 2081: Unable to setup the load function.
>>>
>>> If I provide an existing file, instead of '27', I no longer have this
>>> error, but the output of the dump function is empty.
>>>
>>> Any ideas?
>>>
>>>
>>> Here is my RangeSlicer() code:
>>>
>>> ========================================================
>>>
>>> package com.ubikod.ermin.backend.pigudfs;
>>>
>>> import java.io.IOException;
>>>
>>> import org.apache.commons.logging.Log;
>>> import org.apache.commons.logging.LogFactory;
>>> import org.apache.pig.ExecType;
>>> import org.apache.pig.LoadFunc;
>>> import org.apache.pig.Slice;
>>> import org.apache.pig.Slicer;
>>> import org.apache.pig.backend.datastorage.DataStorage;
>>> import org.apache.pig.builtin.Utf8StorageConverter;
>>> import org.apache.pig.data.Tuple;
>>> import org.apache.pig.impl.io.BufferedPositionedInputStream;
>>> import org.apache.pig.impl.logicalLayer.schema.Schema;
>>>
>>> public class RangeSlicer extends Utf8StorageConverter implements Slicer,
>>>  LoadFunc
>>> {
>>>  private static final Log LOG = LogFactory.getLog(RangeSlicer.class);
>>>
>>>  public RangeSlicer()
>>>  {
>>>    LOG.info("RangeSlicer");
>>>  }
>>>
>>>  /**
>>>   * Expects location to be a stringified integer, and makes
>>>   * Integer.parseInt(location) slices. Each slice generates a single
>>>   * value: its index in the sequence of slices.
>>>   */
>>>  public Slice[] slice(DataStorage store, String location)
>>>    throws IOException
>>>  {
>>>    LOG.info("slice #################" + location);
>>>    // Note: validate() has already made sure that location is an integer
>>>    int numslices = Integer.parseInt(location);
>>>    LOG.info("slice #################" + numslices);
>>>    Slice[] slices = new Slice[numslices];
>>>    for (int i = 0; i < slices.length; i++)
>>>    {
>>>      slices[i] = new SingleValueSlice(i);
>>>    }
>>>    return slices;
>>>  }
>>>
>>>  public void validate(DataStorage store, String location)
>>>    throws IOException
>>>  {
>>>    try
>>>    {
>>>      LOG.info("validate #################" + location);
>>>      Integer.parseInt(location);
>>>    }
>>>    catch (NumberFormatException nfe)
>>>    {
>>>      throw new IOException(nfe.getMessage());
>>>    }
>>>  }
>>>
>>>  /**
>>>   * A Slice that returns a single value from next.
>>>   */
>>>  public static class SingleValueSlice implements Slice
>>>  {
>>>    // note this value is set by the Slicer and will get serialized and
>>>    // deserialized at the remote processing node
>>>    public int val;
>>>    // since we just have a single value, we can use a boolean
>>>    // rather than a counter
>>>    private transient boolean read;
>>>
>>>    public SingleValueSlice(int value)
>>>    {
>>>      LOG.info("SingleValueSlice #################" + value);
>>>
>>>      this.val = value;
>>>    }
>>>
>>>    public void close() throws IOException
>>>    {
>>
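The comment in SingleValueSlice about the transient `read` flag reflects standard Java serialization behavior: non-transient fields such as `val` survive the trip to the remote processing node, while transient fields are reset to their type's default on deserialization. A minimal standalone sketch of that behavior (not Pig-specific; the class and method names below are illustrative):

```java
import java.io.*;

// Illustrative stand-in for SingleValueSlice: `val` round-trips through
// serialization, the transient `read` flag does not.
public class TransientDemo implements Serializable {
    public int val;
    transient boolean read;

    public TransientDemo(int value) {
        this.val = value;
        this.read = true; // pretend the slice was already consumed locally
    }

    // Serialize to bytes and deserialize again, as would happen when a
    // Slice is shipped to a remote node.
    static TransientDemo roundTrip(TransientDemo in) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(in);
        oos.close();
        ObjectInputStream ois =
            new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return (TransientDemo) ois.readObject();
    }

    public static void main(String[] args) throws Exception {
        TransientDemo copy = roundTrip(new TransientDemo(27));
        System.out.println(copy.val);  // serialized: still 27
        System.out.println(copy.read); // transient: reset to false
    }
}
```

This is why the slicer sets `val` in the constructor on the submitting side but treats `read` purely as remote-side working state.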