-
Passing schema inside Load function (praveenesh kumar, 2012-02-03, 12:35)
Hey guys,
I am new to Pig. I was wondering whether it is possible to pass a schema in a Pig LOAD statement when loading the data for the first time.

Suppose I have a huge dataset containing around 100 columns. Is there a way to pass a schema defined in some other file (some kind of metadata file) into the LOAD statement, or do I have to define it every time inside the LOAD statement?

Thanks,
Praveenesh
-
Re: Passing schema inside Load function (Stan Rosenberg, 2012-02-03, 22:32)
My hunch is you'll have to write a custom loader, but I'll let the experts chime in. E.g., the AvroStorage loader can parse the schema from a JSON file passed to it via the constructor. I don't think PigStorage has the same option.

stan
-
Re: Passing schema inside Load function (praveenesh kumar, 2012-02-03, 22:35)
Thanks Stan,
If you were facing this kind of scenario, how would you proceed? Can you give me some pointers on how to write a custom loader, or some good tutorials on it? What is the current practice for solving this scenario in Pig?

Praveenesh
-
Re: Passing schema inside Load function (Stan Rosenberg, 2012-02-03, 22:42)
Hi Praveenesh,
Assuming you have already read these:

http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html
http://pig.apache.org/docs/r0.9.2/udf.html#load-store-functions

my next step would be to peruse the source code of some existing loaders, e.g., PigStorage.

Best,
stan
-
Re: Passing schema inside Load function (praveenesh kumar, 2012-02-03, 22:45)
Thanks Stan,
I was going through exactly these. I was wondering whether there is an easy way to do it, or whether I am reading something wrong. Now I will focus on what you have suggested, but I hope there is some easy solution to my problem.

Praveenesh
-
Re: Passing schema inside Load function (Stan Rosenberg, 2012-02-04, 02:40)
Hi Praveenesh,
Maybe this will get you started.

Suppose we have the desired schema parsed and stored in 'map' of type LinkedHashMap<String, String>. The key is your field name, and the value denotes the data type, e.g., 'string', 'int', etc.

Now, let's derive Pig's schema from this map:

    // Imports used by the snippets below
    import java.io.IOException;
    import java.util.Map;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.pig.ResourceSchema;
    import org.apache.pig.data.DataType;
    import org.apache.pig.impl.logicalLayer.schema.Schema;

    Schema schema = new Schema(); // Pig schema, fields added in map order
    for (Map.Entry<String, String> entry : map.entrySet()) {
        schema.add(new Schema.FieldSchema(entry.getKey(),
                getPigType(entry.getValue())));
    }

where getPigType returns the corresponding Pig data type:

    // Maps a type word from the metadata file to a Pig DataType constant;
    // unrecognized words fall back to CHARARRAY.
    byte getPigType(String fieldType) {
        if (fieldType.equalsIgnoreCase("string")) {
            return DataType.CHARARRAY;
        } else if (fieldType.equalsIgnoreCase("int")) {
            return DataType.INTEGER;
        } else if (fieldType.equalsIgnoreCase("long")) {
            return DataType.LONG;
        } else if (fieldType.equalsIgnoreCase("float")) {
            return DataType.FLOAT;
        } else if (fieldType.equalsIgnoreCase("double")) {
            return DataType.DOUBLE;
        } else if (fieldType.equalsIgnoreCase("boolean")) {
            return DataType.BOOLEAN;
        } else {
            return DataType.CHARARRAY;
        }
    }

Now, you'll want to implement 'getSchema' in your custom loader:

    @Override // from the LoadMetadata interface
    public ResourceSchema getSchema(String location, Job job) throws IOException {
        // I'd actually cache this result if the schema is fixed.
        return new ResourceSchema(schema);
    }

This should take care of the schema, except that you'd probably also need to serialize it to the back-end so that you can enforce the schema inside 'getNext'.

stan

P.S. The above is essentially pseudo-code; I haven't actually type-checked it.
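Stan's snippet starts from a schema that has already been parsed into a LinkedHashMap. A minimal sketch of that missing step follows, assuming a hypothetical plain-text metadata file with one 'name:type' pair per line; the file format and the class name 'SchemaSpec' are illustrative, not part of Pig. It uses only the JDK, keeps field order, and also flattens the map into a single string, the kind of value you might stash in the job configuration or in UDFContext properties to get the schema to the back-end (that hand-off is not shown).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper: parses "fieldName:type" lines into an
 * insertion-ordered map and round-trips that map through a single
 * "name:type,name:type" string. Field names must not contain ',' or ':'.
 */
final class SchemaSpec {

    private SchemaSpec() {}

    /** Parses one "name:type" pair per line; blank lines and '#' comments are skipped. */
    static LinkedHashMap<String, String> parse(String text) {
        LinkedHashMap<String, String> fields = new LinkedHashMap<>();
        for (String rawLine : text.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0 || colon == line.length() - 1) {
                throw new IllegalArgumentException("Expected name:type but got: " + line);
            }
            String name = line.substring(0, colon).trim();
            // Type words are normalized to lower case, e.g. "INT" -> "int".
            String type = line.substring(colon + 1).trim().toLowerCase(Locale.ROOT);
            if (fields.put(name, type) != null) {
                throw new IllegalArgumentException("Duplicate field: " + name);
            }
        }
        return fields;
    }

    /** Flattens the map to "name:type,name:type,..." in field order. */
    static String toSpecString(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    /** Inverse of toSpecString. */
    static LinkedHashMap<String, String> fromSpecString(String spec) {
        return parse(spec.replace(',', '\n'));
    }
}
```

The map returned by parse is exactly the 'map' the loop above iterates over, and the type words ('string', 'int', ...) are the ones getPigType expects.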
-
Re: Passing schema inside Load function (praveenesh kumar, 2012-02-04, 06:48)
Thanks Stan,
This would be a great help! I'll try to implement it. :-)

Praveenesh
-
Re: Passing schema inside Load function (Dmitriy Ryaboy, 2012-02-06, 06:25)
It's pretty straightforward; that's why the LoadMetadata interface exists. You just have to implement it and translate however you store the schema into a Pig Schema object.

PigStorageSchema will read a JSON file that describes the schema; you can look at how that's done there (actually, PigStorage itself will do that in trunk).

You can also check out what the Elephant-Bird library does for loading protocol buffers and Thrift objects, where the schema is derived from the object itself.

-Dmitriy
-
Re: Passing schema inside Load function (praveenesh kumar, 2012-02-06, 06:35)
Thanks,
I was also looking at the -schema option in PigStorage. But can anyone explain how to define that JSON schema file? A tutorial or small example would be very helpful.

Praveenesh
-
Re: Passing schema inside Load function (Dmitriy Ryaboy, 2012-02-06, 06:48)
It's a JSON serialization of the Pig schema object, and isn't really meant to be created by hand. Patches to make it more human-friendly would be quite welcome.

D
-
Re: Passing schema inside Load function (praveenesh kumar, 2012-02-06, 06:59)
Okay, so how can I make use of the -schema option with PigStorage?

Suppose my JSON schema is:

    {
      "name":"Student_Data",
      "properties":
      {
        "id":
        {
          "type":"INTEGER",
          "description":"Student id"
        },
        "name":
        {
          "type":"CHARARRAY",
          "description":"Name of the student"
        },
        "marks":
        {
          "type":"INTEGER",
          "description":"Marks of the student"
        }
      }
    }

I tried to write the above schema using Pig data types. Can I use it, or is there a different way to use the "-schema" option? The documentation only says:

    -schema    Reads/Stores the schema of the relation using a hidden JSON file.

Or is there some other way to directly pass a schema defined in a separate plain-text file and read it using PigStorage?

Thanks,
Praveenesh
-
Re: Passing schema inside Load function (Dmitriy Ryaboy, 2012-02-06, 09:11)
It reads the schema file *it creates*. So you process some data, store it, then read it back later, and the schema is back. Like I said, the JSON is not very human-readable: the types are integers rather than words like "chararray", etc. Try saving something and check out the .pig_schema file to see an example.

D
-
Re: Passing schema inside Load function (praveenesh kumar, 2012-02-06, 09:17)
Yeah, I tried that. Here's what I get for a small sample dataset:

    {
      "fields":
      [
        {"name":"name","type":55,"description":"autogenerated from Pig Field Schema","schema":null},
        {"name":"age","type":10,"description":"autogenerated from Pig Field Schema","schema":null},
        {"name":"gpa","type":20,"description":"autogenerated from Pig Field Schema","schema":null}
      ],
      "version":0,
      "sortKeys":[],
      "sortKeyOrders":[]
    }

I am looking to see if I can decode this format, define my own schema the same way, and use it with a Pig load function.

Thanks,
Praveenesh
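To decode a file like the sample above, a small JDK-only sketch can map the integer type codes back to Pig type names. The code values are assumed to match the constants in org.apache.pig.data.DataType for Pig 0.9/0.10; check them against the DataType source of the version you run. The class 'PigTypeCodes' and its regex-based 'describe' helper are hypothetical, and 'describe' only handles the flat, one-level layout shown in the sample, not nested tuple or bag schemas.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: turns the integer "type" codes in a .pig_schema
 * file back into Pig type names. Codes assumed to mirror
 * org.apache.pig.data.DataType (Pig 0.9/0.10).
 */
final class PigTypeCodes {

    private PigTypeCodes() {}

    // Matches "name":"<field>","type":<code> pairs in the flat layout only.
    private static final Pattern FIELD = Pattern.compile(
            "\"name\"\\s*:\\s*\"([^\"]*)\"\\s*,\\s*\"type\"\\s*:\\s*(\\d+)");

    static String nameOf(int code) {
        switch (code) {
            case 0:   return "unknown";
            case 1:   return "null";
            case 5:   return "boolean";
            case 6:   return "byte";
            case 10:  return "int";
            case 15:  return "long";
            case 20:  return "float";
            case 25:  return "double";
            case 50:  return "bytearray";
            case 55:  return "chararray";
            case 60:  return "bigchararray";
            case 100: return "map";
            case 110: return "tuple";
            case 120: return "bag";
            default:  throw new IllegalArgumentException("Unknown Pig type code: " + code);
        }
    }

    /** Renders the fields as "name:type, name:type, ..." in file order. */
    static String describe(String pigSchemaJson) {
        Matcher m = FIELD.matcher(pigSchemaJson);
        StringJoiner out = new StringJoiner(", ");
        while (m.find()) {
            out.add(m.group(1) + ":" + nameOf(Integer.parseInt(m.group(2))));
        }
        return out.toString();
    }
}
```

Applied to the sample above, describe returns "name:chararray, age:int, gpa:float", i.e. the same schema you would otherwise write in the LOAD statement's AS clause.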
-
Re: Passing schema inside Load function (Dmitriy Ryaboy, 2012-02-06, 21:20)
The integer values for types come from org.apache.pig.data.DataType
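Going the other way, a minimal JDK-only sketch (hypothetical class 'PigSchemaJson') can render a field-name to type-word map as JSON shaped like the .pig_schema sample in this thread, so the file need not be written by hand. It assumes that PigStorage's -schema loading accepts a file of exactly this shape (inferred from the file Pig itself wrote, not from a documented format), that the codes match DataType in your Pig version, and that field names need no JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper: renders name -> type-word pairs as JSON shaped
 * like the .pig_schema sample in this thread. Type codes are assumed to
 * mirror org.apache.pig.data.DataType; field names are not JSON-escaped.
 */
final class PigSchemaJson {

    private PigSchemaJson() {}

    private static final Map<String, Integer> TYPE_CODES = new LinkedHashMap<>();
    static {
        TYPE_CODES.put("boolean", 5);
        TYPE_CODES.put("int", 10);
        TYPE_CODES.put("long", 15);
        TYPE_CODES.put("float", 20);
        TYPE_CODES.put("double", 25);
        TYPE_CODES.put("bytearray", 50);
        TYPE_CODES.put("chararray", 55);
        TYPE_CODES.put("string", 55); // alias used by getPigType earlier in the thread
    }

    static int typeCode(String typeWord) {
        Integer code = TYPE_CODES.get(typeWord.toLowerCase(Locale.ROOT));
        if (code == null) {
            throw new IllegalArgumentException("Unsupported type: " + typeWord);
        }
        return code;
    }

    /** Builds the JSON document, one "fields" entry per map entry, in map order. */
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{\"fields\":[");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append("{\"name\":\"").append(e.getKey()).append("\",")
              .append("\"type\":").append(typeCode(e.getValue())).append(',')
              .append("\"description\":\"autogenerated from Pig Field Schema\",")
              .append("\"schema\":null}");
        }
        sb.append("],\"version\":0,\"sortKeys\":[],\"sortKeyOrders\":[]}");
        return sb.toString();
    }
}
```

Before relying on a generated file, compare it with a .pig_schema file that Pig writes on STORE for the same fields; Pig places that file in the output directory, which is also where it looks when loading with -schema.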
> > > > > > Or is there some other way to directly pass the schema defined in some > > > other file as plain text file and read it using PigStorage ? > > > > > > Thanks, > > > Praveenesh > > > > > > > > > On Mon, Feb 6, 2012 at 12:18 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> > > > wrote: > > > > > > > It's a json serialization of the Pig schema object, and isn't really > > > meant > > > > to be created by hand. > > > > Patches to make it more human-friendly would be quite welcome. > > > > > > > > D > > > > > > > > On Sun, Feb 5, 2012 at 10:35 PM, praveenesh kumar < > > [EMAIL PROTECTED] > > > > >wrote: > > > > > > > > > Thanks, > > > > > I was also looking for -schema option in PigStorage. > > > > > But Can anyone explain how can we define that json schema file. > > > > > Some tutorial/small example would be very helpful. > > > > > > > > > > Praveenesh > > > > > > > > > > On Mon, Feb 6, 2012 at 11:55 AM, Dmitriy Ryaboy < > [EMAIL PROTECTED]> > > > > > wrote: > > > > > > > > > > > It's pretty straightforward, that's why the LoadMetadata > interface > > > > > exists. > > > > > > You just have to implement it and translate however you store the > > > > schema > > > > > to > > > > > > a Pig Schema object. > > > > > > > > > > > > PigStorageSchema will read a json file that describes the schema, > > you > > > > can > > > > > > look at how that's done there (actually, PigStorage itself will > do > > > that > > > > > in > > > > > > trunk). > > > > > > > > > > > > You can also check out what the Elephant-Bird library does for > > > loading > > > > > > protocol buffers and thrift objects, where schema is derived from > > the +