Pig >> mail # user >> Efficient load for data with large number of columns


Thread:
Vadi Hombal 2013-03-27, 13:30
MARCOS MEDRADO RUBINELLI 2013-03-27, 14:16
Vadi Hombal 2013-03-28, 01:44
MARCOS MEDRADO RUBINELLI 2013-03-28, 10:50
Re: Efficient load for data with large number of columns
Yes, as of Pig 0.10.0 you can specify a schema file along with PigStorage
when loading or storing data; see
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/PigStorage.html.
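In practice that looks roughly like the sketch below (paths and field names are illustrative, and the schema is shortened to three columns): storing with PigStorage's '-schema' option writes a hidden .pig_schema file next to the data, and a later LOAD through PigStorage picks that schema up automatically, so the long AS clause only has to be written once.

```
-- One-time load with the full schema spelled out
A = LOAD 'mydata' USING PigStorage(',') AS (c1:int, c2:chararray, c3:chararray);

-- '-schema' makes PigStorage write a hidden .pig_schema file
-- alongside the data in 'mydata_with_schema'
STORE A INTO 'mydata_with_schema' USING PigStorage(',', '-schema');

-- Later loads need no AS clause: PigStorage finds .pig_schema and applies it
B = LOAD 'mydata_with_schema' USING PigStorage(',');
DESCRIBE B;
```

If you ever want to ignore a stored schema on load, PigStorage also accepts a '-noschema' option.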
On Wed, Mar 27, 2013 at 9:30 AM, Vadi Hombal <[EMAIL PROTECTED]> wrote:

> suppose my data has 100 columns or fields, and i want to impose a schema.
> is there a way i can create a separate file describing the schema of these
> fields, and let PIG read the schema from that file?
>
> for example.
> instead of  verbose typing in the pigscript....
> A = load mydata as (c1:int, c2:chararray, ...... ,c100:chararray)
>
> can i do something like.
> A = load mydata as described in myschema.txt
>
> myschema.txt would be something like
> c1: int
> c2: chararray
> ....
> ....
> c100: chararray
>
> thanks
> vkh
>
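For reference, the schema file Pig itself generates (the hidden .pig_schema written by PigStorage with '-schema') is JSON rather than the line-per-field text sketched in the question. A minimal hand-rolled example might look like the following; the exact set of keys is an assumption based on Pig's ResourceSchema serialization, and the numeric type codes come from Pig's DataType constants (10 = int, 55 = chararray):

```
{
  "fields": [
    {"name": "c1", "type": 10, "description": null, "schema": null},
    {"name": "c2", "type": 55, "description": null, "schema": null},
    {"name": "c100", "type": 55, "description": null, "schema": null}
  ],
  "sortKeys": [],
  "sortKeyOrders": []
}
```

The safest way to get a correct file is to let Pig write one once (STORE ... USING PigStorage(',', '-schema')) and then copy or edit that output rather than authoring it by hand.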

--
Mike Sukmanowsky

Product Lead, http://parse.ly
989 Avenue of the Americas, 3rd Floor
New York, NY  10018
p: +1 (416) 953-4248
e: [EMAIL PROTECTED]