

Reading Avro files with Pig
Hi,

I'm trying to read an Avro file I stored on HDFS, but I seem to be
hitting a snag. I'm hoping some of you will be able to shed some light
on this so I can continue my adventure!

Here is the script (avro-test.pig):

REGISTER 'hdfs:///lib/avro-1.7.2.jar';
REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
REGISTER 'hdfs:///lib/piggybank.jar';

DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

avro = load '/import/2012-01-04-deflate.avro' USING AvroStorage();

groups = group avro by trace.terminalid;
sc = foreach groups generate group as terminalid, COUNT(avro) as cnt;

store sc into '/import/test-out.avro' USING AvroStorage();

The schema of the avro file:

{
     "type": "record",
     "name": "trace",
     "namespace": "asp",
     "fields": [
         {   "name": "id"   , "type": "long"   },
         {   "name": "timestamp"    , "type": "long"      },
         {   "name": "terminalid", "type": "int"   },
         {   "name": "creationtime", "type": "long"   },
         {   "name": "tracetype", "type": "int"   },
         {   "name": "traceproperties", "type": {
                 "type": "array",
                 "items": {
                     "name": "traceproperty",
                     "type": "record",
                     "fields": [
                         {    "name": "id", "type": "long"    },
                         {    "name": "value", "type": "string"    },
                         {    "name": "pkey", "type": "string"    },
                         {    "name": "traceid", "type": "long"    }
                     ]
                 }
             }
         }
     ]
}
The script above gives me:

<file avro-test.pig, line 9, column 28> Invalid field reference.
Referenced field [terminalid] does not exist in schema: .

So I guess I'm missing the point on how to interface with the schema
here?
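
A quick way to check what schema Pig actually derived from the file is to
DESCRIBE the relation right after loading it. The snippet below is only a
minimal sketch: it assumes the same REGISTER and DEFINE statements as the
script above, and the un-prefixed field reference in the GROUP statement is
an assumption (that AvroStorage exposes the fields of the top-level record
as top-level columns), not something confirmed in this message.

```pig
-- Assumes the same REGISTER and DEFINE statements as the script above.
avro = LOAD '/import/2012-01-04-deflate.avro' USING AvroStorage();

-- Print the schema Pig derived for the relation.
DESCRIBE avro;

-- Assumption: if the record's fields show up as top-level columns,
-- the key is referenced without the record-name prefix.
groups = GROUP avro BY terminalid;
```

If DESCRIBE reports no schema at all, the problem would lie in loading the
schema rather than in how the field is referenced.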

Thanks in advance!

Kind regards,

Bart
Cheolsoo Park 2012-11-19, 18:36