Re: Efficient load for data with large number of columns
Yes, as of Pig 0.10.0 you can specify a schema file along with PigStorage
when loading or storing data. See
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/PigStorage.html
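
A rough sketch of how this could look, going by the PigStorage javadoc linked above: storing with the '-schema' option writes a hidden .pig_schema file into the output directory, and a later load from that directory reads the field names and types from it. The paths 'mydata' and 'mydata_with_schema', the tab delimiter, and the two-column schema are placeholders for illustration only, not taken from the original data.

```pig
-- One-time step: load with an explicit schema (only two of the columns
-- are shown here), then store with -schema so that PigStorage writes a
-- hidden .pig_schema file next to the data.
A = LOAD 'mydata' USING PigStorage('\t') AS (c1:int, c2:chararray);
STORE A INTO 'mydata_with_schema' USING PigStorage('\t', '-schema');

-- Later scripts: no AS clause needed; PigStorage reads .pig_schema
-- from the input directory unless -noschema is passed.
B = LOAD 'mydata_with_schema' USING PigStorage('\t');
DESCRIBE B;
```

Note that the schema file is JSON written by Pig itself, not a plain "name: type" list like myschema.txt, so the long AS clause still has to be typed (or generated) once for the initial store.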
On Wed, Mar 27, 2013 at 9:30 AM, Vadi Hombal <[EMAIL PROTECTED]> wrote:

> suppose my data has 100 columns or fields, and i want to impose a schema.
> is there a way i can create a separate file describing the schema of these
> fields, and let PIG read the schema from that file?
>
> for example.
> instead of  verbose typing in the pigscript....
> A = load mydata as (c1:int, c2:chararray, ...... ,c100:chararray)
>
> can i do something like.
> A = load mydata as described in myschema.txt
>
> myschema.txt would be something like
> c1: int
> c2: chararray
> ....
> ....
> c100: chararray
>
> thanks
> vkh
>

--
Mike Sukmanowsky