Hive >> mail # user >> A GenericUDF Function to Extract a Field From an Array of Structs


RE: A GenericUDF Function to Extract a Field From an Array of Structs
Sorry, the test should be the following (changed extract_shas to extract_product_category):

import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.testng.annotations.Test;

import java.util.ArrayList;
import java.util.List;

public class TestGenericUDFExtractProductCategory
{
    ArrayList<String> fieldNames = new ArrayList<String>();
    ArrayList<ObjectInspector> fieldObjectInspectors = new ArrayList<ObjectInspector>();

    @Test
    public void simpleTest()
        throws Exception
    {
        ListObjectInspector firstInspector = new MyListObjectInspector();

        ArrayList test = new ArrayList();
        test.add("test");

        ArrayList test2 = new ArrayList();
        test2.add(test);

        StructObjectInspector soi = ObjectInspectorFactory.getStandardStructObjectInspector(test, test2);

        fieldNames.add("productCategory");
        fieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);

        GenericUDF.DeferredObject firstDeferredObject = new MyDeferredObject(test2);

        GenericUDF extract_product_category = new GenericUDFExtractProductCategory();

        extract_product_category.initialize(new ObjectInspector[]{firstInspector});

        extract_product_category.evaluate(new DeferredObject[]{firstDeferredObject});
    }

    public class MyDeferredObject implements DeferredObject
    {
        private Object value;

        public MyDeferredObject(Object value) {
            this.value = value;
        }

        @Override
        public Object get() throws HiveException
        {
            return value;
        }
    }

    private class MyListObjectInspector implements ListObjectInspector
    {
        @Override
        public ObjectInspector getListElementObjectInspector()
        {
            return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldObjectInspectors);
        }

        @Override
        public Object getListElement(Object data, int index)
        {
            List myList = (List) data;
            if (myList == null || index > myList.size()) {
                return null;
            }
            return myList.get(index);
        }

        @Override
        public int getListLength(Object data)
        {
            if (data == null) {
                return -1;
            }
            return ((List) data).size();
        }

        @Override
        public List<?> getList(Object data)
        {
            return (List) data;
        }

        @Override
        public String getTypeName()
        {
            return null;  //To change body of implemented methods use File | Settings | File Templates.
        }

        @Override
        public Category getCategory()
        {
            return Category.LIST;
        }
    }
}

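The 'java.util.ArrayList cannot be cast to ... ObjectInspector' message reported in this thread appears consistent with the test passing raw ArrayLists (test, test2) to getStandardStructObjectInspector, which takes a list of field names and a list of ObjectInspectors. The following is a minimal sketch of that mechanism in plain Java, not a reproduction of Hive's code: Inspector, describe, and RawListCastDemo are hypothetical stand-ins introduced only for illustration. With raw types the call compiles with an unchecked warning, and the cast fails only when an element is read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an inspector type; NOT a Hive interface.
interface Inspector {
    String typeName();
}

final class RawListCastDemo {

    // Stand-in for a factory that expects one inspector per field name.
    static List<String> describe(List<String> names, List<Inspector> inspectors) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            // The compiler inserts a cast to Inspector here, at read time.
            Inspector oi = inspectors.get(i);
            out.add(names.get(i) + ":" + oi.typeName());
        }
        return out;
    }

    // Same shape as the test's setup: raw lists, one nested inside the other.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String callWithRawLists() {
        ArrayList test = new ArrayList();
        test.add("test");
        ArrayList test2 = new ArrayList();
        test2.add(test); // an ArrayList, not an Inspector
        try {
            describe(test, test2); // compiles; only an unchecked warning
            return "no error";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    // Typed lists: a wrong element type would be rejected at compile time.
    static List<String> callWithTypedLists() {
        List<String> names = new ArrayList<>();
        names.add("productCategory");
        List<Inspector> inspectors = new ArrayList<>();
        inspectors.add(() -> "string");
        return describe(names, inspectors);
    }

    public static void main(String[] args) {
        System.out.println(callWithRawLists());   // prints ClassCastException
        System.out.println(callWithTypedLists()); // prints [productCategory:string]
    }
}
```

If this is indeed the cause, declaring the test's lists with type parameters (a list of String field names and a list of ObjectInspector instances) would turn the mistake into a compile-time error rather than a runtime one.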
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: A GenericUDF Function to Extract a Field From an Array of Structs
Date: Thu, 28 Mar 2013 14:16:33 -0700
I am trying to write a GenericUDF that collects all values of a specific struct field from an array of structs in each record and returns them as an array.
I wrote the UDF (below), and it seems to work, but:
1) It does not work on an external table, although it works fine on a managed table. Any idea why?
2) I am having a tough time writing a test for it. I have attached the test I have so far, and it does not work: I always get 'java.util.ArrayList cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector' or 'cannot cast String to LazyString'. My question is: how do I supply a list of structs to the evaluate method?
Any help will be greatly appreciated.
Thanks,
Peter
The table:
CREATE EXTERNAL TABLE FOO (
    TS string,
    customerId string,
    products array< struct<productCategory:string> >
  )
  PARTITIONED BY (ds string)
  ROW FORMAT SERDE 'some.serde'
  WITH SERDEPROPERTIES ('error.ignore'='true')
  LOCATION 'some_locations'
  ;
A sample row holds:
1340321132000, 'some_company', [{"productCategory":"footwear"},{"productCategory":"eyewear"}]
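To make the intended result concrete, here is a small sketch of the extraction applied to the products column of the sample row, written in plain Java without any Hive classes. Modelling each struct as a Map from field name to value is an assumption made for illustration, and ExtractFieldSketch, extractField, and sampleProducts are hypothetical names; a real GenericUDF would read the data through ObjectInspectors instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ExtractFieldSketch {

    // Collects one field from every struct in the array, preserving order.
    // A null array yields null; a null struct contributes a null entry.
    static List<String> extractField(List<Map<String, String>> structs, String fieldName) {
        if (structs == null) {
            return null;
        }
        List<String> values = new ArrayList<>();
        for (Map<String, String> struct : structs) {
            values.add(struct == null ? null : struct.get(fieldName));
        }
        return values;
    }

    // The products column of the sample row, with structs modelled as maps.
    static List<Map<String, String>> sampleProducts() {
        List<Map<String, String>> products = new ArrayList<>();
        for (String category : new String[] {"footwear", "eyewear"}) {
            Map<String, String> struct = new LinkedHashMap<>();
            struct.put("productCategory", category);
            products.add(struct);
        }
        return products;
    }

    public static void main(String[] args) {
        // prints [footwear, eyewear]
        System.out.println(extractField(sampleProducts(), "productCategory"));
    }
}
```

For the sample row, the function described in this thread would therefore be expected to return an array<string> holding footwear and eyewear.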
This is my code:

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.lazy.LazyString;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
import org.apache.hadoop.io.Text;

import java.util.ArrayList;

@Description(name = "extract_product_category",
        value = "_FUNC_( array< struct<productCategory:string> > ) - Collect all product category field values inside an array of struct(s), and return the results in an array<string>",
        extended = "Example:\n SELECT _FUNC_(array_of_structs_with_product_catego