Pig >> mail # user >> Re: AvroStorage schema_uri pointing to local file doesn't work


Re: AvroStorage schema_uri pointing to local file doesn't work
Thank you, Cheolsoo!

Ok, I'll have Pig 0.12 when my team upgrades to a newer CDH.
For now I am using this workaround:
%declare WORK_DIR `pwd`
%declare SCHEMA_LITERAL `cat $WORK_DIR/schema.avsc`
...
STORE inputs INTO 'output'
    USING com.magnetic.org.apache.pig.piggybank.storage.avro.AvroStorage('{
    "index" : 1,
    "schema": $SCHEMA_LITERAL}');

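Another option, raised in the original question quoted below, is to keep the schema on HDFS and point schema_uri at it, so that every back-end node can read it. A minimal sketch, assuming a hypothetical HDFS path /tmp/schema.avsc and assuming that Piggybank AvroStorage resolves an hdfs:// URI against the cluster's default namenode (not verified on this cluster):

```pig
-- Assumption: /tmp/schema.avsc is a hypothetical HDFS location writable by the job user.
-- Upload the local schema from the front-end before the STORE runs.
fs -put schema.avsc /tmp/schema.avsc

-- Same STORE as in the workaround, but the schema is read from HDFS
-- instead of being inlined as a literal.
STORE inputs INTO 'output'
    USING com.magnetic.org.apache.pig.piggybank.storage.avro.AvroStorage('{
    "index" : 1,
    "schema_uri": "hdfs:///tmp/schema.avsc"}');
```

Note that fs -put reports an error if the target file already exists, so on repeated runs the upload would have to happen outside the script or after removing the old copy.
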
Best Regards,
Ruslan Al-Fakikh
On Wed, Dec 25, 2013 at 11:48 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> avro to bcc:
>
> >> Why can't it use the schema file from front-end invocation?
>
> You're right. It should load the schema file in the front-end and pass it
> to the back-end via properties. Unfortunately, Piggybank AvroStorage
> doesn't do this.
>
> However, the new built-in AvroStorage in Pig 0.12 does exactly what you
> want. Can you use it instead?
>
> https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/AvroStorage.java#L120
>
>
> On Tue, Dec 24, 2013 at 10:15 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>
> > Hey guys,
> >
> > I am using AvroStorage like this:
> >
> > STORE alias INTO '$OUTPUT'
> >     USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
> >     "index" : 1,
> >     "schema_uri": "file://path/schema.avsc"}');
> >
> > so it explicitly takes schema.avsc from the local file system, not HDFS.
> > It works in a pseudo-distributed cluster, but fails on a normal cluster
> > with java.io.FileNotFoundException for the schema file.
> > It looks like this is happening in the backend.
> > I assume this is because the backend invocation of AvroStorage on a node,
> > different from the node I am running the pig script from, cannot find the
> > file in the local file system.
> > Why can't it use the schema file from front-end invocation?
> > Does it mean that I am limited to either HDFS locations for schema_uri
> > or embedding the schema string in the AvroStorage parameters?
> >
> > Thanks in advance
> >
> > Ruslan Al-Fakikh
> >
>