Pig >> mail # user >> Json and split into multiple files


Mohit Anchlia 2012-09-05, 03:37
Mohit Anchlia 2012-09-05, 19:04
Alan Gates 2012-09-06, 15:21
Mohit Anchlia 2012-09-06, 16:32
Mohit Anchlia 2012-09-07, 17:21
Alan Gates 2012-09-13, 02:51
Re: Json and split into multiple files
On Wed, Sep 12, 2012 at 7:51 PM, Alan Gates <[EMAIL PROTECTED]> wrote:

> I don't understand your use case or why you need to use exec or
> outputSchema.  Would it be possible to send a more complete example that
> makes clear why you need these?
>
My JSON has many fields and several parent elements. I already have a POJO
that I can parse into and read fields from, instead of hand-typing all of
them. I also have a mapper and formatter that maps JSON to database fields,
each at a fixed position in the file. Hand-typing all of it in Pig would be
really painful. With exec I can easily parse my JSON and then use mappers
to write to tuples. It's faster to develop and easy to unit test.
> Alan.
>
> A tuple can contain a tuple, so it's certainly possible with
> outputSchema() to generate a schema that declares both your tuples.  But I
> don't think this answers your questions.
>
> On Sep 7, 2012, at 10:21 AM, Mohit Anchlia wrote:
>
> > It looks like I can use the outputSchema(Schema input) call to do this,
> > but the examples I see are only for one tuple. In my case, if I am
> > reading it right, I need a tuple for each dimension and hence a schema
> > for each. For instance, there'll be one user tuple and one product
> > tuple, so I need a schema for each.
> >
> > How can I do this using outputSchema such that the result is like below,
> > where I can access each tuple and each field by name? Thanks for your
> > help.
> >
> > A = load 'inputfile' using JsonLoader() as (user: tuple(id: int, name:
> > chararray), product: tuple(id: int, name:chararray))
> >
> > On Tue, Sep 4, 2012 at 8:37 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> >
> >> I have JSON something like:
> >>
> >> {
> >>   "user": {
> >>     "id": 1,
> >>     "name": "user1"
> >>   },
> >>   "product": {
> >>     "id": 1,
> >>     "name": "product1"
> >>   }
> >> }
> >>
> >> I want to be able to read this file and create 2 files as follows:
> >>
> >> user file:
> >> key,1,user1
> >>
> >> product file:
> >> key,1,product1
> >>
> >> I know I need to call exec, but the method will return bags for each of
> >> these dimensions. Since it's all unordered, how do I split it further
> >> to write them to separate files?
> >>
>
>
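
For the splitting step asked about in the original message, here is a minimal
Pig Latin sketch. It assumes the built-in JsonLoader (which takes the schema
as a string argument and expects one valid JSON object per line) rather than
a custom UDF. The output paths 'user_out' and 'product_out' and the literal
'key' are placeholders, since the thread does not say what the key is.

```pig
-- Load each JSON record as two named, nested tuples (schema taken from the
-- LOAD statement quoted above, passed to JsonLoader as a string).
A = LOAD 'inputfile' USING JsonLoader(
      'user:tuple(id:int,name:chararray), product:tuple(id:int,name:chararray)');

-- Project each dimension into its own relation; 'key' is a placeholder.
users    = FOREACH A GENERATE 'key', user.id, user.name;
products = FOREACH A GENERATE 'key', product.id, product.name;

-- Write each relation to its own comma-delimited output directory.
STORE users    INTO 'user_out'    USING PigStorage(',');
STORE products INTO 'product_out' USING PigStorage(',');
```

The same FOREACH and STORE statements should apply if the relation comes from
a UDF instead of JsonLoader, provided the UDF's outputSchema() declares the
same two nested tuples, so that user.id and product.name can be referenced by
name.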