Pig user mailing list: UDF with parameterized constructor in DEFINE statement

RE: UDF with parameterized constructor in DEFINE statement
The error message is misleading: the user expected 'day' to refer to the UDF defined with DEFINE, not to a field in the schema.

-----Original Message-----
From: Jonathan Coveney [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 01, 2011 6:22 AM
Subject: Re: UDF with parameterized constructor in DEFINE statement

The error, at least going by what you posted, is different from what you think. The problem is simply that the column "day" doesn't exist. You can see in the output that the columns are {ex_time: chararray,scBytes: long,fSize: long}. If you want it to be called day, you can name it as such with "AS day" (for example, GENERATE day(ts) AS day), you can change the schema returned by the UDF's outputSchema(), or you can just group by ex_time. In general, you will see parsing errors before any errors from the UDF itself, because Pig parses the whole script first and only THEN builds the job.
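As a rough illustration of the point above, here is a toy Java model (not Pig's parser) in which a relation's schema is just an ordered list of field names. AliasSketch, resolve() and renamed() are hypothetical names, and the exception text only mimics the wording of the error in this thread. What it captures: DEFINE names the function, while the field produced by day(ts) takes its name from the UDF's outputSchema() ("ex_time"), so "day" only resolves after an explicit rename.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Pig's parser: a relation's schema as an ordered list of
// field names, e.g. [ex_time, scBytes, fSize] after GENERATE day(ts), ...
final class AliasSketch {
    private final List<String> fields;

    AliasSketch(List<String> fields) {
        this.fields = new ArrayList<>(fields);
    }

    /** Returns the position of alias, or throws if no field has that name. */
    int resolve(String alias) {
        int i = fields.indexOf(alias);
        if (i < 0) {
            // Wording loosely mimics "Invalid alias: day in {...}".
            throw new IllegalArgumentException("Invalid alias: " + alias + " in " + fields);
        }
        return i;
    }

    /** Models "expr AS newName": returns a copy with field i renamed. */
    AliasSketch renamed(int i, String newName) {
        List<String> copy = new ArrayList<>(fields);
        copy.set(i, newName);
        return new AliasSketch(copy);
    }
}
```

With fields ex_time, scBytes and fSize, resolve("day") throws while resolve("ex_time") returns 0, and renamed(0, "day").resolve("day") returns 0, mirroring the two fixes suggested above (GROUP B BY ex_time, or GENERATE day(ts) AS day).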


-----Original Message-----
From: Charles Gonçalves <[EMAIL PROTECTED]>
Date: Tue, 1 Feb 2011 12:12:30
Subject: UDF with parameterized constructor in DEFINE statement

Hi Guys,

I have a UDF to which I want to pass a long timestamp and get back a date formatted with the SimpleDateFormat class.
I pass the format string for the SimpleDateFormat object to the UDF constructor, and optionally the time zone if needed.
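To make the intended behaviour concrete, here is a minimal plain-Java sketch of the formatting step on its own, with no Pig dependencies. It assumes, as the class further down does by multiplying by 1000, that the long is a Unix timestamp in seconds. TimestampFormatSketch and format() are hypothetical names, and Locale.ROOT is an addition (not in the original) so the digits do not depend on the JVM's default locale.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of the original UDF: the core of what
// ExtractTime.exec() is meant to do, as a plain static method.
final class TimestampFormatSketch {
    private TimestampFormatSketch() {}

    /**
     * Formats a Unix timestamp given in seconds with a SimpleDateFormat
     * pattern (e.g. "dd") in the given time zone ID (e.g. "UTC", "GMT-03:00").
     */
    static String format(long epochSeconds, String pattern, String tzId) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.ROOT);
        sdf.setTimeZone(TimeZone.getTimeZone(tzId));
        // Date expects milliseconds since the epoch.
        return sdf.format(new Date(epochSeconds * 1000L));
    }
}
```

For example, 1296562350 is 2011-02-01 12:12:30 UTC, so format(1296562350L, "dd", "UTC") returns "01" and format(1296562350L, "HH", "GMT-03:00") returns "09".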

So I wrote a class to do that, but when I use it in my script I get this error:

ERROR 1000: Error during parsing. Invalid alias: day in {ex_time: chararray,scBytes: long,fSize: long}
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: day in {ex_time: chararray,scBytes: long,fSize: long}..

What is the best way to parameterize a Java UDF?
What am I doing wrong?
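On the parameterization question: as far as I know, Pig hands the quoted arguments of a DEFINE (such as ExtractTime('dd')) to the UDF's constructor as String values, so constructor-based parameterization is a reasonable approach as long as every constructor parameter is a String. The sketch below is a toy model of that matching by argument count, not Pig's actual implementation; DefineArgsSketch, FormatUdf and instantiate() are hypothetical names.

```java
// Toy model, not Pig's code: match quoted DEFINE arguments to a public
// constructor that takes the same number of String parameters.
final class DefineArgsSketch {
    private DefineArgsSketch() {}

    // Hypothetical stand-in for a UDF with an optional time-zone argument.
    static final class FormatUdf {
        final String format;
        final String tz;

        public FormatUdf(String format) {
            this(format, "UTC"); // one-argument form defaults to UTC
        }

        public FormatUdf(String format, String tz) {
            this.format = format;
            this.tz = tz;
        }
    }

    /** Calls cls's public constructor that takes args.length String parameters. */
    static <T> T instantiate(Class<T> cls, String... args) {
        Class<?>[] types = new Class<?>[args.length];
        java.util.Arrays.fill(types, String.class);
        try {
            return cls.getConstructor(types).newInstance((Object[]) args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No usable constructor with "
                    + args.length + " String argument(s) on " + cls.getName(), e);
        }
    }
}
```

Here instantiate(FormatUdf.class, "dd") picks the one-argument constructor and instantiate(FormatUdf.class, "HH", "GMT-03:00") the two-argument one, while a call with three arguments fails because no matching constructor exists.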


The script:

REGISTER MscPigUtils.jar
DEFINE EdgeLoader msc.pig.EdgeLoader();
DEFINE day msc.pig.ExtractTime('dd');
raw = LOAD
using EdgeLoader;
B = FOREACH raw GENERATE day(ts), scBytes, fSize;
C = GROUP B BY day;
clients_stats = FOREACH C {
    complete_views = FILTER B BY scBytes >= fSize;
    GENERATE FLATTEN(group), COUNT(B), COUNT(complete_views), SUM(B.scBytes);
}
STORE clients_stats into 'dateTransferday';

The class:

package msc.pig;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.TimeZone;

import org.apache.log4j.Logger;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class ExtractTime extends EvalFunc<String> {
    private static final Logger logger = Logger.getLogger(ExtractTime.class);

    // Instance fields: if these were static, every DEFINE of ExtractTime in
    // the same JVM would share (and overwrite) a single format.
    private final DateFormat df;
    private final Calendar cal;

    // One-argument form formats in UTC.
    public ExtractTime(String format) {
        this(format, "UTC");
    }

    public ExtractTime(String format, String tz) {
        df = new SimpleDateFormat(format);
        df.setTimeZone(TimeZone.getTimeZone(tz));
        cal = Calendar.getInstance();
    }

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }
        try {
            Object object = input.get(0);
            if (object == null) {
                return null;
            }
            // The timestamp is assumed to be in seconds; Calendar wants milliseconds.
            Long ts = (Long) object;
            cal.setTimeInMillis(ts * 1000);
            return df.format(cal.getTime());
        } catch (Exception e) {
            logger.error("Error parsing date!", e);
            return null;
        }
    }

    @Override
    public Schema outputSchema(Schema input) {
        // The output field is named "ex_time"; later statements must use that name.
        return new Schema(new Schema.FieldSchema("ex_time", DataType.CHARARRAY));
    }
}
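A related pitfall, separate from the alias error: if the DateFormat is kept in a static field, every ExtractTime instance in the same JVM shares it, so a second DEFINE with a different format (for instance a hypothetical DEFINE hour msc.pig.ExtractTime('HH')) would silently change the format used by day. The sketch below compares two hypothetical stand-ins, SharedFormatter (static field) and OwnFormatter (instance field), in plain Java.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical stand-ins for ExtractTime, reduced to the formatter field.
final class StaticFieldSketch {
    private StaticFieldSketch() {}

    static final class SharedFormatter {
        private static DateFormat df; // ONE copy shared by all instances

        SharedFormatter(String format) {
            df = new SimpleDateFormat(format, Locale.ROOT); // overwrites earlier instances' format
            df.setTimeZone(TimeZone.getTimeZone("UTC"));
        }

        String format(long epochSeconds) {
            return df.format(new Date(epochSeconds * 1000L));
        }
    }

    static final class OwnFormatter {
        private final DateFormat df; // one copy per instance

        OwnFormatter(String format) {
            df = new SimpleDateFormat(format, Locale.ROOT);
            df.setTimeZone(TimeZone.getTimeZone("UTC"));
        }

        String format(long epochSeconds) {
            return df.format(new Date(epochSeconds * 1000L));
        }
    }
}
```

Constructing a SharedFormatter("dd") and then a SharedFormatter("HH") makes both return "12" for 1296562350 (2011-02-01 12:12:30 UTC), whereas the two OwnFormatter instances return "01" and "12" as intended.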
Charles Ferreira Gonçalves
UFMG - ICEx - Dcc
Cel.: 55 31 87741485
Tel.:  55 31 34741485
Lab.: 55 31 34095840