
Pig >> mail # user >> Efficient load for data with large number of columns

Re: Efficient load for data with large number of columns
Thank you Marc and Marcos,
this worked well.

I did a STORE to see how Pig writes the schema as JSON, and then used
that output as a template to create a schema file for LOAD.

From my experiments, for data with three columns (int, chararray, float),
this is the minimal schema I arrived at.
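For reference, a minimal `.pig_schema` for three such columns might look like the sketch below. The field names are invented for illustration, and the numeric type codes are taken from Pig's `DataType` constants (10 = int, 55 = chararray, 20 = float); verify them against your Pig version.

```json
{
  "fields": [
    {"name": "id",    "type": 10},
    {"name": "label", "type": 55},
    {"name": "score", "type": 20}
  ]
}
```

Placed in the data directory as `.pig_schema`, this lets `PigStorage` infer the column names and types without an `AS` clause in the LOAD statement.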

Is there any documentation on how to write proper JSON for schemas?


On Wed, Mar 27, 2013 at 10:16 AM, MARCOS MEDRADO RUBINELLI wrote:
> suppose my data has 100 columns or fields, and i want to impose a schema.
> is there a way i can create a separate file describing the schema of these
> fields, and let PIG read the schema from that file?
> Yes, if you put a JSON file named ".pig_schema" in the same directory as
> your data, Pig will use it to determine the schema:
> http://pig.apache.org/docs/r0.10.0/func.html#pigstorage
> Regards,
> Marcos
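
The workflow Marcos describes can be sketched in Pig Latin as follows (paths, aliases, and field names are hypothetical; the `-schema` option to PigStorage, which writes `.pig_schema` on store, is worth confirming against the docs for your Pig version):

```pig
-- Store once with an explicit schema; the '-schema' option makes
-- PigStorage write a .pig_schema file alongside the data.
A = LOAD 'input.tsv' AS (id:int, label:chararray, score:float);
STORE A INTO 'data_dir' USING PigStorage('\t', '-schema');

-- Later loads from data_dir can pick up .pig_schema automatically,
-- so no AS clause is needed.
B = LOAD 'data_dir' USING PigStorage();
DESCRIBE B;
```

This keeps the 100-column schema in one file next to the data rather than repeated in every script that loads it.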