Re: Efficient load for data with large number of columns
Thank you Marc and Marcos,
this worked well.

I did a STORE to figure out how to write the schema in JSON, and then used
that as a template to create a schema for LOAD.
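
For context, a sketch of that round trip in Pig 0.10; the relation name and
paths here are made up for illustration:

-- Declare the types once, then store with the '-schema' option so
-- PigStorage writes a hidden .pig_schema JSON file next to the output.
raw = LOAD '/tmp/in' USING PigStorage('\t')
      AS (year:int, name:chararray, num:float);
STORE raw INTO '/tmp/out' USING PigStorage('\t', '-schema');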

From my experiments, for data with three columns (int, chararray, float) I
figured this is the minimal schema:
{"fields":
  [
    {"name":"year","type":10},
    {"name":"name","type":55},
    {"name":"num","type":20}
  ]
}
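
The numeric type codes correspond to Pig's org.apache.pig.data.DataType
constants (10 = INTEGER, 20 = FLOAT, 55 = CHARARRAY). With that file saved as
.pig_schema in the data directory, the load needs no AS clause; a sketch with
a hypothetical path:

-- PigStorage picks up /data/records/.pig_schema automatically on load
records = LOAD '/data/records' USING PigStorage('\t');
DESCRIBE records;
-- records: {year: int, name: chararray, num: float}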

Is there any literature on how to write proper JSON for schemas?

thanks
vkh

On Wed, Mar 27, 2013 at 10:16 AM, MARCOS MEDRADO RUBINELLI <
[EMAIL PROTECTED]> wrote:

> Suppose my data has 100 columns or fields, and I want to impose a schema.
> Is there a way I can create a separate file describing the schema of these
> fields, and let Pig read the schema from that file?
>
>
> Yes, if you put a JSON file named ".pig_schema" in the same directory as
> your data, Pig will use it to determine the schema:
>
> http://pig.apache.org/docs/r0.10.0/func.html#pigstorage
>
> Regards,
> Marcos
>
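
To make the quoted advice concrete, the on-disk layout looks like this sketch
(directory name hypothetical; the part files are standard Hadoop output names):

/data/records/
    .pig_schema      <- the JSON schema file shown above
    part-m-00000     <- data files, read by PigStorage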