
Pig >> mail # user >> Efficient load for data with large number of columns

Re: Efficient load for data with large number of columns
I did a store to figure out how to write the schema in JSON, and then used
that as a template to create a schema for load.

From my experiments, for data with three columns (int, chararray, float), this is the minimal schema I arrived at.
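The schema from the original post was not preserved in the archive. The sketch below is a plausible reconstruction for one (int, chararray, float) row, assuming the JSON layout Pig's JsonLoader/JsonStorage use, where each field carries a name and a numeric code from org.apache.pig.data.DataType (10 = int, 55 = chararray, 20 = float); the field names f1..f3 are hypothetical:

```json
{"fields": [
    {"name": "f1", "type": 10},
    {"name": "f2", "type": 55},
    {"name": "f3", "type": 20}
]}
```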

Is there any literature on how to write proper JSON for schemas?


Sadly, there isn't. For a simple, flat schema it isn't hard: for each column you just add another field entry, with its name and the corresponding DataType code.
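As an illustration of where those numeric codes come from, the sketch below (assuming Pig's jar, which provides org.apache.pig.data.DataType, is on the classpath) prints the constants for the three types from the question; the expected values in the comments are assumptions from the Pig source, not stated in the thread:

```java
import org.apache.pig.data.DataType;

public class ShowTypeCodes {
    public static void main(String[] args) {
        // Each Pig type maps to a byte constant; these values are what
        // appears in the "type" field of a JSON schema.
        System.out.println("int       -> " + DataType.INTEGER);
        System.out.println("chararray -> " + DataType.CHARARRAY);
        System.out.println("float     -> " + DataType.FLOAT);
    }
}
```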

For a more complex schema, it's easier to actually construct a ResourceSchema object and serialize it with Jackson:
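The reply's code was not preserved in the archive. Below is a minimal sketch of that approach, assuming pig.jar (which bundles the org.codehaus.jackson classes) is on the classpath; the schema string and class name are hypothetical:

```java
import org.apache.pig.ResourceSchema;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.util.Utils;
import org.codehaus.jackson.map.ObjectMapper;

public class SchemaAsJson {
    public static void main(String[] args) throws Exception {
        // Parse a Pig schema string (hypothetical columns) into a Schema ...
        Schema schema = Utils.getSchemaFromString(
                "id: int, name: chararray, score: float");
        // ... wrap it in a ResourceSchema, the object Pig's load/store
        // functions exchange ...
        ResourceSchema rs = new ResourceSchema(schema);
        // ... and let Jackson serialize it to JSON.
        ObjectMapper mapper = new ObjectMapper();
        System.out.println(mapper.writeValueAsString(rs));
    }
}
```

Doing a store first and copying the resulting schema, as the original poster did, remains a handy way to cross-check the output of this approach.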