Pig, mail # user - Issue with LoadFunc & Slicer


Vincent BARAT 2009-09-14, 13:38
Dmitriy Ryaboy 2009-09-14, 15:35
Kevin Weil 2009-09-14, 15:58
Alan Gates 2009-09-17, 20:07
Vincent BARAT 2009-09-14, 14:08
Re: Issue with LoadFunc & Slicer
Vincent BARAT 2009-09-14, 16:33
Hmm... Either the captcha is buggy or I'm going blind: I cannot
manage to sign up to your JIRA!
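
For reference, the validate-then-slice logic of the quoted RangeSlicer can be sketched without any Pig classes. This is only a minimal standalone sketch under stated assumptions, not the Pig Slicer API: the class name RangeSliceSketch and the helper sliceValues are hypothetical, the negative-count check is an addition, and unlike the quoted code (which hard-codes "30") it parses the location it is actually given.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch; it does not use or implement any Pig interface.
final class RangeSliceSketch {

    /** Rejects a location that is not a non-negative integer string. */
    public static void validate(String location) throws IOException {
        try {
            // Assumption: a negative slice count is treated as invalid input.
            if (Integer.parseInt(location) < 0) {
                throw new IOException("Negative slice count: " + location);
            }
        } catch (NumberFormatException nfe) {
            // Same conversion as the quoted validate(): report bad input as an IOException.
            throw new IOException(nfe.getMessage());
        }
    }

    /** Returns one value per slice: the slice's index in the sequence 0..n-1. */
    public static List<Integer> sliceValues(String location) throws IOException {
        validate(location);
        int numSlices = Integer.parseInt(location);
        List<Integer> values = new ArrayList<>(numSlices);
        for (int i = 0; i < numSlices; i++) {
            values.add(i);
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // A location of '27', as in the LOAD statement, yields 27 single-value slices.
        System.out.println(sliceValues("27").size()); // prints 27
    }
}
```

With this shape, a non-numeric location fails in validate before any slices are built, which is the behaviour the comment in the quoted slice() method relies on.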

Dmitriy Ryaboy wrote:
> There's a ticket for that: https://issues.apache.org/jira/browse/PIG-612
>
> Vote it up so that the pig developers have a record of user interest
> in this feature.
>
> -D
>
> On Mon, Sep 14, 2009 at 10:08 AM, Vincent BARAT
> <[EMAIL PROTECTED]> wrote:
>> It seems that I got my answer: custom loader functions can only be used in
>> map reduce mode, not local mode: in local mode, the file specified must be a
>> real file.
>>
>> Vincent BARAT wrote:
>>> Hello,
>>>
>>> While trying to add support for HBase 0.20.0 to PIG (trunk),
>>> I was trying the tutorial from the PIG documentation:
>>>
>>> http://hadoop.apache.org/pig/docs/r0.3.0/udf.html#Custom+Slicer
>>>
>>> Unfortunately, when I try:
>>>
>>> A = LOAD '27' USING RangeSlicer();
>>> dump A;
>>>
>>> PIG reports the following error:
>>>
>>> 2009-09-14 15:33:46,395 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>> ERROR 2081: Unable to setup the load function.
>>>
>>> If I provide an existing file, instead of '27', I no longer have this
>>> error, but the output of the dump function is empty.
>>>
>>> Any ideas?
>>>
>>>
>>> Here is my RangeSlicer() code:
>>>
>>> ========================================================
>>>
>>> package com.ubikod.ermin.backend.pigudfs;
>>>
>>> import java.io.IOException;
>>>
>>> import org.apache.commons.logging.Log;
>>> import org.apache.commons.logging.LogFactory;
>>> import org.apache.pig.ExecType;
>>> import org.apache.pig.LoadFunc;
>>> import org.apache.pig.Slice;
>>> import org.apache.pig.Slicer;
>>> import org.apache.pig.backend.datastorage.DataStorage;
>>> import org.apache.pig.builtin.Utf8StorageConverter;
>>> import org.apache.pig.data.Tuple;
>>> import org.apache.pig.impl.io.BufferedPositionedInputStream;
>>> import org.apache.pig.impl.logicalLayer.schema.Schema;
>>>
>>> public class RangeSlicer extends Utf8StorageConverter implements Slicer,
>>>  LoadFunc
>>> {
>>>  private static final Log LOG = LogFactory.getLog(RangeSlicer.class);
>>>
>>>  public RangeSlicer()
>>>  {
>>>    LOG.info("RangeSlicer");
>>>  }
>>>
>>>  /**
>>>   * Expects location to be a Stringified integer, and makes
>>>   * Integer.parseInt(location) slices. Each slice generates a single
>>>   * value, its index in the sequence of slices.
>>>   */
>>>  public Slice[] slice(DataStorage store, String location)
>>>    throws IOException
>>>  {
>>>    LOG.info("slice #################" + location);
>>>    location = "30";
>>>    // Note: validate has already made sure that location is an integer
>>>    int numslices = Integer.parseInt(location);
>>>    LOG.info("slice #################" + numslices);
>>>    Slice[] slices = new Slice[numslices];
>>>    for (int i = 0; i < slices.length; i++)
>>>    {
>>>      slices[i] = new SingleValueSlice(i);
>>>    }
>>>    return slices;
>>>  }
>>>
>>>  public void validate(DataStorage store, String location)
>>>    throws IOException
>>>  {
>>>    try
>>>    {
>>>      LOG.info("validate #################" + location);
>>>      Integer.parseInt("30");
>>>      LOG.info("validate #################" + location);
>>>    }
>>>    catch (NumberFormatException nfe)
>>>    {
>>>      throw new IOException(nfe.getMessage());
>>>    }
>>>  }
>>>
>>>  /**
>>>   * A Slice that returns a single value from next.
>>>   */
>>>  public static class SingleValueSlice implements Slice
>>>  {
>>>    // note this value is set by the Slicer and will get serialized and
>>>    // deserialized at the remote processing node
>>>    public int val;
>>>    // since we just have a single value, we can use a boolean rather
>>>    // than a counter
>>>    private transient boolean read;
>>>
>>>    public SingleValueSlice(int value)
>>>    {
>>>      LOG.info("SingleValueSlice #################" + value);
>>>
>>>      this.val = value;
>>>    }
>>>
>>>    public void close() throws IOException
>>>    {
>>