Pig >> mail # user >> Json and split into multiple files


Mohit Anchlia 2012-09-05, 03:37
Mohit Anchlia 2012-09-05, 19:04
Alan Gates 2012-09-06, 15:21
Mohit Anchlia 2012-09-06, 16:32
Mohit Anchlia 2012-09-07, 17:21
Re: Json and split into multiple files
I don't understand your use case or why you need to use exec or outputSchema.  Would it be possible to send a more complete example that makes clear why you need these?

Alan.

A tuple can contain a tuple, so it's certainly possible with outputSchema() to generate a schema that declares both your tuples.  But I don't think this answers your questions.
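Alan's point that a tuple can contain a tuple shows up directly in Pig Latin's schema syntax as well. A minimal sketch (field names taken from the thread; the relation names and input path are arbitrary):

```pig
-- a row tuple with two named tuples nested inside it
A = LOAD 'inputfile' AS (user: tuple(id: int, name: chararray),
                         product: tuple(id: int, name: chararray));

-- nested fields stay addressable by name via the dot operator
B = FOREACH A GENERATE user.id, user.name, product.id, product.name;
```

A Java UDF's outputSchema() would declare the same shape: a schema whose fields are themselves tuple schemas.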

On Sep 7, 2012, at 10:21 AM, Mohit Anchlia wrote:

> It looks like I can use the outputSchema(Schema input) call to do this, but
> the examples I see are only for one tuple. In my case, if I am reading it
> right, I need a tuple for each dimension and hence a schema for each. For
> instance, there'll be one user tuple and one product tuple, so I need a
> schema for each.
>
> How can I do this using outputSchema so that the result is like below, where
> I can access each tuple and each field by name? Thanks for your help
>
> A = load 'inputfile' using JsonLoader() as (user: tuple(id: int, name:
> chararray), product: tuple(id: int, name:chararray))
>
> On Tue, Sep 4, 2012 at 8:37 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
>> I have a JSON document something like:
>>
>> {
>>   "user": {
>>     "id": 1,
>>     "name": "user1"
>>   },
>>   "product": {
>>     "id": 1,
>>     "name": "product1"
>>   }
>> }
>>
>> I want to be able to read this file and create 2 files as follows:
>>
>> user file:
>> key,1,user1
>>
>> product file:
>> key,1,product1
>>
>> I know I need to call exec, but the method will return bags for each of
>> these dimensions. Since it's all unordered, how do I split it further
>> to write them to separate files?
>>
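The split Mohit describes can also be done without a custom UDF: project each nested tuple into its own relation and STORE each relation separately. A minimal sketch in Pig Latin, assuming the schema quoted above and hypothetical output paths:

```pig
-- load with the nested schema from the thread
A = LOAD 'inputfile' USING JsonLoader() AS
      (user: tuple(id: int, name: chararray),
       product: tuple(id: int, name: chararray));

-- project each dimension; FLATTEN unnests the tuple into top-level fields
users    = FOREACH A GENERATE FLATTEN(user)    AS (id: int, name: chararray);
products = FOREACH A GENERATE FLATTEN(product) AS (id: int, name: chararray);

-- one STORE per relation writes each dimension to its own output
STORE users    INTO 'userfile'    USING PigStorage(',');
STORE products INTO 'productfile' USING PigStorage(',');
```

Each STORE produces its own output location, yielding the two comma-separated files described in the original message (modulo the leading key column).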
Mohit Anchlia 2012-09-13, 14:01