Pig, mail # user - Json and split into multiple files


Re: Json and split into multiple files
Mohit Anchlia 2012-09-06, 16:32
My real-life JSON is much more complicated, so I will have to use the exec
method. But I was wondering: how do I reference the bag for user, and all of
its fields, when it gets returned from the exec call?
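Just to pin down the target output being discussed, the split can be sketched outside Pig in plain Python (this assumes a valid-JSON rendering of the sample record from this thread, and uses the literal "key" as a placeholder, since what key should be is not defined anywhere in the thread):

```python
import json

# Valid-JSON rendering of the sample record from this thread
record = json.loads("""
{
  "user":    {"id": 1, "name": "user1"},
  "product": {"id": 1, "name": "product1"}
}
""")

def split_record(rec, key="key"):
    """Produce one CSV line per dimension, mirroring the
    userfile/productfile split described in the thread."""
    user_line = "{0},{1},{2}".format(key, rec["user"]["id"], rec["user"]["name"])
    product_line = "{0},{1},{2}".format(key, rec["product"]["id"], rec["product"]["name"])
    return user_line, product_line

user_line, product_line = split_record(record)
print(user_line)     # key,1,user1
print(product_line)  # key,1,product1
```

In Pig Latin the same effect comes from deriving two relations from one load and STOREing each, as in the script quoted below.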

On Thu, Sep 6, 2012 at 8:21 AM, Alan Gates <[EMAIL PROTECTED]> wrote:

> Loading the JSON below should give you a Pig record like:
> (user: tuple(id: int, name: chararray), product: tuple(id: int,
> name:chararray))
>
> In that case your Pig Latin would look like:
>
> A = load 'inputfile' using JsonLoader() as (user: tuple(id: int, name:
> chararray), product: tuple(id: int, name: chararray));
> B = foreach A generate user.id, user.name;
> store B into 'userfile';
> C = foreach A generate product.id, product.name;
> store C into 'productfile';
>
> I'm not sure what key is, so I'm not sure whether the above is what you're
> thinking of or not.
>
> Alan.
>
> On Sep 5, 2012, at 12:04 PM, Mohit Anchlia wrote:
>
> > Any pointers would be appreciated
> >
> > On Tue, Sep 4, 2012 at 8:37 PM, Mohit Anchlia <[EMAIL PROTECTED]
> >wrote:
> >
> >> I have JSON something like:
> >>
> >> {
> >>   "user": {
> >>     "id": 1,
> >>     "name": "user1"
> >>   },
> >>   "product": {
> >>     "id": 1,
> >>     "name": "product1"
> >>   }
> >> }
> >>
> >> I want to be able to read this file and create 2 files as follows:
> >>
> >> user file:
> >> key,1,user1
> >>
> >> product file:
> >> key,1,product1
> >>
> >> I know I need to call exec, but the method will return bags for each of
> >> these dimensions. Since it's all unordered, how do I split it further
> >> to write them to separate files?
> >>
>
>