Pig, mail # user - Using AvroStorage()


Re: Using AvroStorage()
Stan Rosenberg 2011-12-13, 15:35
The following test script works for me:
============================================
A = load '$LOGS' using org.apache.pig.piggybank.storage.avro.AvroStorage();
describe A;

B = foreach A generate region as my_region, google_ip;

dump B;

store B into './output' using org.apache.pig.piggybank.storage.avro.AvroStorage(
'{"debug": 5,
  "schema": {"type": "record", "name": "test", "fields": [{"name":
"my_region", "type": ["null", "string"]}, {"name": "ip", "type":
["null", "string"]}]}
}');
============================================
Note that you don't need to pass 'schema' as a separate first parameter; you
can pass a single JSON-formatted string instead.
If you're still getting MismatchException, please compile a small
repro and send it to the list.

stan
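
For the original question about keeping the schema out of the script, one possible workaround is to keep a nicely formatted schema file and collapse it to a single line before handing it to Pig. The sketch below is only an illustration: the function name `one_line_storage_arg` and the sample schema constant are hypothetical, and the wrapper layout (`"schema"` plus an optional `"debug"` key) simply mirrors the JSON string in the test script above. It does not call Pig or AvroStorage itself.

```python
import json


def one_line_storage_arg(schema_text, debug=None):
    """Wrap a (possibly multi-line) Avro schema in a single-line JSON string.

    Hypothetical helper: the output mirrors the '{"debug": ..., "schema": ...}'
    argument used in the test script above.
    """
    schema = json.loads(schema_text)  # raises ValueError on malformed JSON
    arg = {"schema": schema}
    if debug is not None:
        arg["debug"] = debug
    # Compact separators; json.dumps emits no newlines without indent.
    return json.dumps(arg, separators=(",", ":"))


# Example schema, formatted across several lines as it might be in a file.
MULTILINE_SCHEMA = """
{
  "type": "record",
  "name": "test",
  "fields": [
    {"name": "my_region", "type": ["null", "string"]},
    {"name": "ip", "type": ["null", "string"]}
  ]
}
"""

if __name__ == "__main__":
    print(one_line_storage_arg(MULTILINE_SCHEMA, debug=5))
```

The resulting string could then be supplied to the script through Pig parameter substitution, the same mechanism as '$LOGS' above. That step is an assumption and untested here, and the shell quoting of a JSON string needs care.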

On Tue, Dec 13, 2011 at 5:49 AM, IGZ Nick <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I want to keep the pig script and storage schema separate. Is it possible
> to do this in a clean way? The only way that has worked so far is something
> like:
> AvroStorage('schema',
> '{"name":"xyz","type":"record","fields":[{"name":"abc","type":"string"}]}');
>
> Even then, the whole schema has to be on one line. If I split it onto multiple lines, I
> get a MismatchException (93-3) or something like that. Is there no way to
> do AvroStorage('file', <hdfs path of schema file>) or something of that
> sort, or at least be able to specify the schema in multiple lines?
>
> Thanks,