Pig, mail # user - Json and split into multiple files

Re: Json and split into multiple files
Mohit Anchlia 2012-09-07, 17:21
It looks like I can use the outputSchema(Schema input) call to do this, but
the examples I see only cover a single tuple. If I am reading it right, I
need one tuple per dimension, and therefore a schema for each: for
instance, one user tuple and one product tuple.

How can I use outputSchema so that the result looks like the statement
below, where I can access each tuple and each of its fields by name?
Thanks for your help.

 A = load 'inputfile' using JsonLoader() as (user: tuple(id: int, name: chararray),
     product: tuple(id: int, name: chararray));
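With a nested schema like the one in the load statement above, each dimension can be projected by name and stored on its own. What follows is only a minimal sketch under several assumptions: that the built-in JsonLoader is given the schema as a constructor string, that the input holds one JSON object per line with fields in schema order, that 'key' is a literal placeholder, and that the output paths 'user_out' and 'product_out' are hypothetical names. It has not been run against the data described here.

```
-- Load each JSON record as two named tuples (schema passed to JsonLoader as a string).
A = LOAD 'inputfile' USING JsonLoader(
      'user:tuple(id:int,name:chararray), product:tuple(id:int,name:chararray)');

-- Project the fields of each tuple by name; 'key' is a placeholder literal.
users    = FOREACH A GENERATE 'key', user.id, user.name;
products = FOREACH A GENERATE 'key', product.id, product.name;

-- Write each dimension to its own comma-delimited output.
STORE users    INTO 'user_out'    USING PigStorage(',');
STORE products INTO 'product_out' USING PigStorage(',');
```

Note that each STORE writes a directory of part files rather than a single file, so 'user_out' and 'product_out' would be directories.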

On Tue, Sep 4, 2012 at 8:37 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> I have a Json something like:
> {
>   "user": {
>     "id": 1,
>     "name": "user1"
>   },
>   "product": {
>     "id": 1,
>     "name": "product1"
>   }
> }
> I want to be able to read this file and create 2 files as follows:
> user file:
> key,1,user1
> product file:
> key,1,product1
> I know I need to call exec but the method will return Bags for each of
> these dimensions.  But since it's all unordered how do I split it further
> to write them to separate files?