Pig >> mail # user >> Reading json data


Re: Reading json data
It sounds like you have two problems: parsing JSON and joining the datasets.

For reading JSON you can use:
http://stackoverflow.com/questions/11035105/processing-json-through-pig-scripts/16501542#16501542

For matching the types, you could filter the base data for type1 and join
it against data_dict_1, then do the same for type2 with data_dict_2.

To get the final output, use JsonStorage().
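
The filter-then-join plan can be checked on the sample records before writing the Pig script. The following is only a sketch in plain Python, not Pig. The names base_lines, dict_for_type and translate are made up for illustration. It assumes that "type" is quoted so each line is valid JSON, and that type1 records use data_dict_1 and type2 records use data_dict_2, which is what the expected output in the question shows.

```python
import json

# Sample records from the question. Assumption: "type" is quoted here so
# that each line parses as valid JSON.
base_lines = [
    '{"id1": "foo", "id2": "bar", "type": "type1"}',
    '{"id1": "foo", "id2": "bar", "type": "type2"}',
]

# The two dictionaries from the question, keyed by name.
data_dict_1 = {"foo": 1, "bar": 2, "foobar": 3}
data_dict_2 = {"foo": -1, "bar": -2, "foobar": -3}

# Assumption: the record's type selects which dictionary is used for both ids.
dict_for_type = {"type1": data_dict_1, "type2": data_dict_2}


def translate(record):
    """Replace id1 and id2 with integer ids from the dictionary for the record's type."""
    lookup = dict_for_type[record["type"]]
    return {
        "id1": lookup[record["id1"]],
        "id2": lookup[record["id2"]],
        "type": record["type"],
    }


# Parse each line, translate it, and serialize it back to one JSON object per line.
records = [json.loads(line) for line in base_lines]
translated = [translate(r) for r in records]
output_lines = [json.dumps(r) for r in translated]

for line in output_lines:
    print(line)
```

In Pig, the dictionary selection corresponds to the FILTER by type, and each lookup corresponds to a JOIN against the matching dictionary.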
On Tue, Oct 22, 2013 at 1:31 PM, jamal sasha <[EMAIL PROTECTED]> wrote:
> Hi,
>
>  I have three data types...
>
> 1) Base data
> 2) data_dict_1
> 3) data_dict_2
>
> Base data is very well formatted json..
> For example:
> {"id1":"foo", "id2":"bar" ,type:"type1"}
> {"id1":"foo", "id2":"bar" ,type:"type2"}
>
> data_dict_1
> 1 foo
> 2 bar
> 3 foobar
> ....
>
> data_dict_2
> -1 foo
> -2 bar
> -3 foobar
> ... and so on
>
>
> Now, what I want is: if the data is of type1,
>
> then read id1 from data_dict_1 and id2 from data_dict_2, and assign that integer
> id.
> If the data is of type2, then read id1 from data_dict_2 and id2 from
> data_dict_1, and assign the corresponding ids.
> For example:
>
> {"id1":1, "id2":2 ,type:"type1"}
> {"id1":-1, "id2":-2 ,type:"type2"}
>
> And so on..
> How do I do this in Pig?