
Pig user mailing list


Re: Efficient load for data with large number of columns
Thank you, Marc and Marcos; this worked well.

I did a STORE to figure out how to write the schema in JSON, and then used
that as a template to create a schema for LOAD.

From my experiments with data of three columns (int, chararray, float), I
figured this is the minimal schema:
{"fields":
  [
    {"name":"year","type":10},
    {"name":"name","type":55},
    {"name":"num","type":20}
  ]
}
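The type numbers are the constants Pig uses internally for each data type (INTEGER = 10, FLOAT = 20, CHARARRAY = 55). As a minimal sketch, a schema file like the one above could be generated programmatically; `make_schema` and the `PIG_TYPES` mapping below are hypothetical helpers, not part of Pig itself:

```python
import json

# Pig's internal type codes (from org.apache.pig.data.DataType):
# INTEGER = 10, FLOAT = 20, CHARARRAY = 55
PIG_TYPES = {"int": 10, "float": 20, "chararray": 55}

def make_schema(columns):
    """Build a minimal .pig_schema dict from (name, type-name) pairs."""
    return {"fields": [{"name": n, "type": PIG_TYPES[t]} for n, t in columns]}

schema = make_schema([("year", "int"), ("name", "chararray"), ("num", "float")])

# Write it next to the data so Pig can pick it up on LOAD.
with open(".pig_schema", "w") as f:
    json.dump(schema, f)
```

For a dataset with 100 columns, the same helper takes a list of 100 (name, type) pairs, which is easier to maintain than writing the JSON by hand.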

Is there any literature on how to write proper JSON for schemas?

thanks
vkh

On Wed, Mar 27, 2013 at 10:16 AM, MARCOS MEDRADO RUBINELLI <
[EMAIL PROTECTED]> wrote:

> Suppose my data has 100 columns or fields, and I want to impose a schema.
> Is there a way I can create a separate file describing the schema of these
> fields, and let Pig read the schema from that file?
>
>
> Yes, if you put a JSON file named ".pig_schema" in the same directory as
> your data, Pig will use it to determine the schema:
>
> http://pig.apache.org/docs/r0.10.0/func.html#pigstorage
>
> Regards,
> Marcos
>