Pig >> mail # user >> Re: STORE USING AvroStorage - ignores Pig field names, only using their position


Re: STORE USING AvroStorage - ignores Pig field names, only using their position
Unfortunately, AvroStorage is not as flexible as it could be.
I keep my Avro schemas separately on HDFS and point AvroStorage at the
schema file when storing.
I also always do an explicit projection of the relation's fields before
storing the relation.

The same problem applies when reading data: pushing fields up by name does
not work either.
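
The explicit projection mentioned above can be sketched as follows. This is only an illustration, assuming the `inputs` relation and the two-field schema from the test case quoted below; the alias `ordered` is a hypothetical name, and the schema is given inline here rather than through a schema file.

```pig
-- Assumes the jars from the quoted test case are already REGISTERed.
inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);

-- Project the fields into the same order as the Avro schema fields,
-- because AvroStorage writes values by position, not by name.
ordered = FOREACH inputs GENERATE b, a AS nonsense_name;

STORE ordered INTO 'output'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
"schema":
{
  "type" : "record",
  "name" : "my_schema",
  "namespace" : "com.my_namespace",
  "fields" : [
  { "name" : "b", "type" : "string" },
  { "name" : "nonsense_name", "type" : "string" }
  ]
}
}');
```

With the projection in place, the first tuple position holds the value of `b` and the second holds the value of `a`, so the positional write lines up with the schema.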
2013/11/17 Russell Jurney <[EMAIL PROTECTED]>

> I think the expected behavior of AvroStorage is to use the tuple-ordered
> fields in the order they exist in the tuple. So to fix your problem, swap
> the order of b/nonsense_name.
>
> Otherwise I can't see a way to map from b to nonsense_name at all. Pig
> can't know how to do that without referencing tuple field order.
>
> On Sat, Nov 16, 2013 at 7:42 PM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>
> > including this last message to pig user list
> >
> >
> > On Sun, Nov 17, 2013 at 7:40 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
> >
> >> Russell,
> >>
> >> Actually this problem came from the situation when I had the same names
> >> in pig relation schema and avro schema. And it turned out that
> >> AvroStorage switches fields if the order is different.
> >> So, my impression is that it should work this way:
> >> 1) names correspond - then AvroStorage uses them
> >> 2) names do not correspond - then AvroStorage fails to store or does
> >> some schema resolution as shown here:
> >> http://avro.apache.org/docs/1.7.5/spec.html#Schema+Resolution
> >>
> >> Thanks
> >>
> >>
> >> On Sun, Nov 17, 2013 at 7:17 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> >>
> >>> How can pig map from a to nonsense_name?
> >>>
> >>>
> >>> On Saturday, November 16, 2013, Ruslan Al-Fakikh wrote:
> >>>
> >>>> Thanks, Russell!
> >>>>
> >>>> Do you mean that this is the expected behavior? Shouldn't AvroStorage
> >>>> map the pig fields by their names (not their field order) matching
> >>>> them to the names in the avro schema?
> >>>>
> >>>> Thanks,
> >>>> Ruslan Al-Fakikh
> >>>>
> >>>>
> >>>> On Sun, Nov 17, 2013 at 6:53 AM, Russell Jurney <
> >>>> [EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> Pig tuples have field order. Swap the order of the fields in your
> >>>>> avro schema and try again.
> >>>>>
> >>>>> On Nov 16, 2013, at 6:19 PM, Ruslan Al-Fakikh <[EMAIL PROTECTED]>
> >>>>> wrote:
> >>>>>
> >>>>>  Hey guys,
> >>>>>
> >>>>> When I store with AvroStorage, the names from Pig tuple fields are
> >>>>> completely ignored. The field values are put to the result file only
> >>>>> by their position.
> >>>>> Here is a simplified test case:
> >>>>>
> >>>>> %declare WORKDIR `pwd`
> >>>>> REGISTER ../../../../lib/external/avro-1.7.4.jar
> >>>>> REGISTER ../../../../lib/external/json-simple-1.1.jar
> >>>>> --this is built (manually with Maven) from the latest source:
> >>>>> -- http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/
> >>>>> REGISTER ../piggybankBuiltFromSource.jar
> >>>>> REGISTER ../../../../lib/external/jackson-core-asl-1.8.8.jar
> >>>>> REGISTER ../../../../lib/external/jackson-mapper-asl-1.8.8.jar
> >>>>>
> >>>>> --$ cat input.txt
> >>>>> --data_a data_b
> >>>>> --data_a data_b
> >>>>> inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);
> >>>>>
> >>>>> DESCRIBE inputs;
> >>>>> DUMP inputs;
> >>>>>
> >>>>> --output:
> >>>>> --inputs: {a: chararray,b: chararray}
> >>>>> --(data_a,data_b)
> >>>>> --(data_a,data_b)
> >>>>>
> >>>>> STORE inputs INTO 'output'
> >>>>>     USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
> >>>>> "schema":
> >>>>> {
> >>>>>   "type" : "record",
> >>>>>   "name" : "my_schema",
> >>>>>   "namespace" : "com.my_namespace",
> >>>>>   "fields" : [
> >>>>>   {
> >>>>>     "name" : "b",
> >>>>>     "type" : "string"
> >>>>>   },
> >>>>>   {
> >>>>>     "name" : "nonsense_name",
> >>>>>     "type" : "string"
> >>>>>   }
> >>>>>   ]
> >>>>> }
> >>>>> }');
> >>>>>
> >>>>> --output
> >>>>> --$ java -jar ../../../../lib/external/avro-tools-1.7.4.jar tojson