Pig >> mail # user >> Json and split into multiple files

Re: Json and split into multiple files
I don't understand your use case or why you need to use exec or outputSchema.  Would it be possible to send a more complete example that makes clear why you need these?


A tuple can contain a tuple, so it's certainly possible with outputSchema() to generate a schema that declares both your tuples.  But I don't think this answers your questions.
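
If the goal is only the two output files, a plain Pig script may be enough without a UDF. The following is a minimal sketch, not a tested solution. It assumes the built-in JsonLoader (Pig 0.10 or later) with the schema passed as a constructor string, and input that has one JSON object per line with fields in schema order. The literal 'key' and the output paths 'user_out' and 'product_out' are hypothetical placeholders, since the thread does not say what the key should be.

```pig
-- Load each record as two named tuples (schema string is an assumption
-- based on the load statement quoted below).
A = LOAD 'inputfile'
    USING JsonLoader('user:tuple(id:int, name:chararray), product:tuple(id:int, name:chararray)');

-- Project each dimension into its own relation; 'key' is a placeholder constant.
users    = FOREACH A GENERATE 'key' AS key, user.id AS id, user.name AS name;
products = FOREACH A GENERATE 'key' AS key, product.id AS id, product.name AS name;

-- Write each relation to a separate comma-delimited output directory.
STORE users    INTO 'user_out'    USING PigStorage(',');
STORE products INTO 'product_out' USING PigStorage(',');
```

Because each STORE writes its own relation, nothing has to be split after the fact. The nested fields are reached with the tuple dereference operator (user.id, product.name).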

On Sep 7, 2012, at 10:21 AM, Mohit Anchlia wrote:

> It looks like I can use the outputSchema(Schema input) call to do this, but
> the examples I see are only for one tuple. If I am reading it right, in my
> case I need a tuple for each dimension and hence a schema for each: for
> instance, one user tuple and one product tuple.
> How can I do this using outputSchema so that the result is like the one
> below, where I can access each tuple and each named field? Thanks for your help.
> A = load 'inputfile' using JsonLoader() as (user: tuple(id: int, name:
> chararray), product: tuple(id: int, name:chararray))
> On Tue, Sep 4, 2012 at 8:37 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>> I have JSON something like:
>> {
>>   "user": {
>>     "id": 1,
>>     "name": "user1"
>>   },
>>   "product": {
>>     "id": 1,
>>     "name": "product1"
>>   }
>> }
>> I want to be able to read this file and create 2 files as follows:
>> user file:
>> key,1,user1
>> product file:
>> key,1,product1
>> I know I need to call exec but the method will return Bags for each of
>> these dimensions.  But since it's all unordered how do I split it further
>> to write them to separate files?