Pig >> mail # user >> Using AvroStorage()


Re: Using AvroStorage()
The following test script works for me:
============================================
A = load '$LOGS' using org.apache.pig.piggybank.storage.avro.AvroStorage();
describe A;

B = foreach A generate region as my_region, google_ip;

dump B;

store B into './output' using org.apache.pig.piggybank.storage.avro.AvroStorage(
'{"debug": 5,
  "schema": {"type": "record", "name": "test", "fields": [{"name":
"my_region", "type": ["null", "string"]}, {"name": "ip", "type":
["null", "string"]}]}
}');
============================================

Note that you don't need to pass the first parameter, i.e., 'schema'; you
can just pass a single string formatted as JSON.
If you're still getting MismatchException, please compile a small
repro and send it to the list.
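
For keeping the schema in its own multi-line file, one possible workaround
is to collapse it to a single line before it reaches the Pig script. The
sketch below is only an illustration under assumptions: the helper names
compact_schema and store_statement are hypothetical and not part of Pig or
piggybank, and the generated statement simply reuses the
AvroStorage('schema', '<json>') form quoted in the original question.

```python
import json


def compact_schema(schema_text: str) -> str:
    """Parse a (possibly multi-line) Avro schema and re-serialize it on one line."""
    schema = json.loads(schema_text)  # raises ValueError if the JSON is malformed
    # Compact separators drop all optional whitespace, including newlines.
    return json.dumps(schema, separators=(",", ":"))


def store_statement(alias: str, output: str, schema_text: str) -> str:
    """Build a Pig store statement with the schema inlined as single-line JSON.

    Hypothetical helper: it only formats a string and does not call Pig.
    """
    one_line = compact_schema(schema_text)
    if "'" in one_line:
        # The schema is embedded in a single-quoted Pig string literal.
        raise ValueError("schema contains a single quote; cannot inline it safely")
    return (
        f"store {alias} into '{output}' using "
        "org.apache.pig.piggybank.storage.avro.AvroStorage("
        f"'schema', '{one_line}');"
    )


if __name__ == "__main__":
    # In practice this text would be read from a separate schema file.
    multi_line = """
    {
      "name": "xyz",
      "type": "record",
      "fields": [
        {"name": "abc", "type": "string"}
      ]
    }
    """
    print(store_statement("B", "./output", multi_line))
```

The schema file stays readable and under separate version control, while
the script only ever sees the one-line form that is reported to work.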

stan

On Tue, Dec 13, 2011 at 5:49 AM, IGZ Nick <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I want to keep the pig script and storage schema separate. Is it possible
> to do this in a clean way? The only way that has worked so far is to do
> something like:
> AvroStorage('schema',
> '{"name":"xyz","type":"record","fields":[{"name":"abc","type":"string"}]}');
>
> Even then, the whole schema must be on one line. If I split it onto multiple lines, I
> get a MismatchException (93-3) or something like that. Is there no way to
> do AvroStorage('file', <hdfs path of schema file>) or something of that
> sort, or at least be able to specify the schema in multiple lines?
>
> Thanks,