Re: Json and split into multiple files
I don't understand your use case or why you need to use exec or outputSchema.  Would it be possible to send a more complete example that makes clear why you need these?

Alan.

A tuple can contain a tuple, so it's certainly possible with outputSchema() to generate a schema that declares both your tuples.  But I don't think this answers your questions.
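
As a minimal sketch (not code from this thread), an outputSchema() that declares two named tuples could look roughly like the Java EvalFunc below; the class name JsonDimensions is made up for illustration, and the exec() body is only a placeholder:

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;

// Hypothetical UDF whose output is a tuple containing two named tuples:
// (user:(id:int, name:chararray), product:(id:int, name:chararray))
public class JsonDimensions extends EvalFunc<Tuple> {

    @Override
    public Tuple exec(Tuple input) throws IOException {
        // JSON parsing is omitted; the point of this sketch is the
        // schema declaration below.
        return null;
    }

    @Override
    public Schema outputSchema(Schema input) {
        try {
            // Inner schema shared by both dimensions: (id:int, name:chararray)
            Schema userSchema = new Schema();
            userSchema.add(new Schema.FieldSchema("id", DataType.INTEGER));
            userSchema.add(new Schema.FieldSchema("name", DataType.CHARARRAY));

            Schema productSchema = new Schema();
            productSchema.add(new Schema.FieldSchema("id", DataType.INTEGER));
            productSchema.add(new Schema.FieldSchema("name", DataType.CHARARRAY));

            // Outer schema: two named tuple fields, user and product
            Schema result = new Schema();
            result.add(new Schema.FieldSchema("user", userSchema, DataType.TUPLE));
            result.add(new Schema.FieldSchema("product", productSchema, DataType.TUPLE));
            return result;
        } catch (FrontendException e) {
            throw new RuntimeException(e);
        }
    }
}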

On Sep 7, 2012, at 10:21 AM, Mohit Anchlia wrote:

> It looks like I can use the outputSchema(Schema input) call to do this, but
> the examples I see cover only a single tuple. In my case, if I am reading it
> right, I need a tuple for each dimension and hence a schema for each: there
> will be one user tuple and one product tuple.
>
> How can I do this using outputSchema so that the result looks like the
> statement below, where I can access each tuple and each of its named fields?
> Thanks for your help.
>
> A = load 'inputfile' using JsonLoader() as (user: tuple(id: int, name: chararray),
>     product: tuple(id: int, name: chararray));
>
On Tue, Sep 4, 2012 at 8:37 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
>> I have a JSON document something like:
>>
>> {
>>   "user": {
>>     "id": 1,
>>     "name": "user1"
>>   },
>>   "product": {
>>     "id": 1,
>>     "name": "product1"
>>   }
>> }
>>
>> I want to be able to read this file and create 2 files as follows:
>>
>> user file:
>> key,1,user1
>>
>> product file:
>> key,1,product1
>>
>> I know I need to call exec, but the method will return bags for each of
>> these dimensions. Since it's all unordered, how do I split it further so
>> that I can write them to separate files?
>>
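
For the splitting part, a rough Pig Latin sketch (not taken from this thread's resolution) is shown below: assuming the load statement quoted above yields the two named tuples, each dimension can be projected with FOREACH and written by its own STORE. The output paths 'userfile' and 'productfile' are illustrative, and the "key" column from the desired output is omitted because the thread does not say where it comes from.

-- assumes the load statement quoted above produces (user: tuple(...), product: tuple(...))
A = load 'inputfile' using JsonLoader() as (user: tuple(id: int, name: chararray),
    product: tuple(id: int, name: chararray));

-- project each nested tuple into its own relation
users = foreach A generate user.id as id, user.name as name;
products = foreach A generate product.id as id, product.name as name;

-- write each relation to a separate comma-separated file
store users into 'userfile' using PigStorage(',');
store products into 'productfile' using PigStorage(',');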