-
Json and split into multiple files
Mohit Anchlia 2012-09-05, 03:37
I have a Json something like:
{
  user {
    id: 1
    name: user1
  }
  product {
    id: 1
    name: product1
  }
}
I want to be able to read this file and create 2 files as follows:
user file: key,1,user1
product file: key,1,product1
I know I need to call exec, but the method will return a bag for each of these dimensions. Since the result is unordered, how do I split it further to write each dimension to a separate file?
-
Re: Json and split into multiple files
Mohit Anchlia 2012-09-05, 19:04
Any pointers would be appreciated
-
Re: Json and split into multiple files
Alan Gates 2012-09-06, 15:21
Loading the JSON below should give you a Pig record like: (user: tuple(id: int, name: chararray), product: tuple(id: int, name:chararray))
In that case your Pig Latin would look like:
A = load 'inputfile' using JsonLoader() as (user: tuple(id: int, name: chararray), product: tuple(id: int, name: chararray));
B = foreach A generate user.id, user.name;
store B into 'userfile';
C = foreach A generate product.id, product.name;
store C into 'productfile';
I'm not sure what key is, so I'm not sure the above is what you're thinking or not.
Alan.
-
Re: Json and split into multiple files
Mohit Anchlia 2012-09-06, 16:32
My real-life JSON is much more complicated, and I will have to use the exec method. But I was wondering: how do I reference the bag related to user, and all its fields, when it gets returned from the exec call?
-
Re: Json and split into multiple files
Mohit Anchlia 2012-09-07, 17:21
It looks like I can use the outputSchema(Schema input) call to do this, but the examples I see only cover a single tuple. If I am reading it right, I need a tuple for each dimension (one user tuple, one product tuple, and so on) and hence a schema for each.
How can I do this with outputSchema so that the result looks like the following, where I can access each tuple and each field by name? Thanks for your help.
A = load 'inputfile' using JsonLoader() as (user: tuple(id: int, name: chararray), product: tuple(id: int, name:chararray))
-
Re: Json and split into multiple files
Alan Gates 2012-09-13, 02:51
I don't understand your use case or why you need to use exec or outputSchema. Would it be possible to send a more complete example that makes clear why you need these?
Alan.
A tuple can contain a tuple, so it's certainly possible with outputSchema() to generate a schema that declares both your tuples. But I don't think this answers your questions.
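One way to picture the nested-tuple point: the UDF declares a single output record whose fields are themselves named tuples, one per dimension. The sketch below is plain Java with no Pig dependency. It only assembles a schema string, in the same notation as the "as" clause earlier in the thread, from field lists, so the fields do not have to be typed by hand. The class and method names are made up for illustration. Turning the string into a Schema inside outputSchema() is left open; as far as I know Pig ships a parser for this notation (Utils.getSchemaFromString in org.apache.pig.impl.util), but treat that as an assumption and check it against your Pig version.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Pig: builds a schema string for a record
// made of one named tuple per dimension.
final class NestedSchemaSketch {

    // dimensions: dimension name -> (field name -> Pig type name).
    // Pass insertion-ordered maps (e.g. LinkedHashMap) so field positions are stable.
    static String schemaString(Map<String, Map<String, String>> dimensions) {
        StringJoiner outer = new StringJoiner(", ");
        for (Map.Entry<String, Map<String, String>> dim : dimensions.entrySet()) {
            // Each dimension becomes "name: tuple(field: type, ...)".
            StringJoiner inner = new StringJoiner(", ", dim.getKey() + ": tuple(", ")");
            for (Map.Entry<String, String> field : dim.getValue().entrySet()) {
                inner.add(field.getKey() + ": " + field.getValue());
            }
            outer.add(inner.toString());
        }
        return outer.toString();
    }
}
```

Called with a "user" and a "product" dimension that each hold id: int and name: chararray, this yields the same two-tuple layout as the load statement above, so user.id and product.name stay addressable by name downstream.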
-
Re: Json and split into multiple files
Mohit Anchlia 2012-09-13, 14:01
On Wed, Sep 12, 2012 at 7:51 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
> I don't understand your use case or why you need to use exec or outputSchema. Would it be possible to send a more complete example that makes clear why you need these?
My JSON has many fields and several parent elements. I already have a POJO that I can parse into and read fields from, instead of hand-typing all of them. I also have a mapper and formatter that map JSON fields to database fields, each at a fixed position in the file. Hand-typing all of that in Pig would be really painful. With exec I can easily parse my JSON and then use the mappers to write to tuples. It's faster to develop and easy to unit test.
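The "easy to unit test" part can be kept independent of Pig. The sketch below is plain Java: the POJO classes (User, Product, Event) and the DimensionMapper are hypothetical stand-ins for the real parsed classes and mappers, and "key" is treated as an opaque value supplied by the caller, since the thread does not say what it is. A real exec() would presumably copy each row into a Pig tuple and nest the tuples in the declared order, after which the user and product fields can be projected and stored separately as in Alan's script.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical POJOs standing in for the classes the JSON is already parsed into.
final class User {
    final int id;
    final String name;
    User(int id, String name) { this.id = id; this.name = name; }
}

final class Product {
    final int id;
    final String name;
    Product(int id, String name) { this.id = id; this.name = name; }
}

final class Event {
    final User user;
    final Product product;
    Event(User user, Product product) { this.user = user; this.product = product; }
}

// Hypothetical mapper: produces one fixed-position row per dimension.
final class DimensionMapper {

    // "key" is an opaque value supplied by the caller (assumption).
    static Map<String, List<Object>> toRows(String key, Event event) {
        Map<String, List<Object>> rows = new LinkedHashMap<>();
        rows.put("user", Arrays.<Object>asList(key, event.user.id, event.user.name));
        rows.put("product", Arrays.<Object>asList(key, event.product.id, event.product.name));
        return rows;
    }

    // Renders a row as a comma-separated line, the shape of the target files.
    static String toCsv(List<Object> row) {
        StringJoiner line = new StringJoiner(",");
        for (Object field : row) {
            line.add(String.valueOf(field));
        }
        return line.toString();
    }
}
```

Because each dimension is looked up by name rather than by position in a bag, the ordering concern from the first message does not arise in this layout.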