Pig, mail # user - Reading json data


Re: Reading json data
Ryan Compton 2013-10-23, 02:01
It sounds like you have two problems: parsing the JSON and joining the datasets.

To read the JSON records, you can use the approach described here:
http://stackoverflow.com/questions/11035105/processing-json-through-pig-scripts/16501542#16501542
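
As a minimal sketch of the loading step, assuming Pig 0.10 or later (which ships a built-in JsonLoader) and that every key in the base data is quoted. The path 'base_data.json' and the alias name are placeholders, not something taken from your setup:

```pig
-- Load one JSON object per line; the schema string names the fields to extract.
-- 'base_data.json' is a hypothetical path.
base = LOAD 'base_data.json'
       USING JsonLoader('id1:chararray, id2:chararray, type:chararray');

-- Print the schema Pig inferred, to confirm the fields are what you expect.
DESCRIBE base;
```

The built-in loader can be picky about how closely the records match the declared schema, so it is worth checking a few rows with DUMP before going further.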

To map the ids, you could filter the base data for type1, join id1 against
data_dict_1 and id2 against data_dict_2, and then repeat with the
dictionaries swapped for type2.

To write the final output, use JsonStorage().
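
Putting the steps above together, a rough sketch could look like the following. It assumes the dictionary files are space-delimited "id name" lines and that all paths and alias names are placeholders; adjust the delimiter and schema to your actual files:

```pig
-- Hypothetical paths; the dictionaries are assumed to be space-delimited "id name" lines.
base  = LOAD 'base_data.json'
        USING JsonLoader('id1:chararray, id2:chararray, type:chararray');
dict1 = LOAD 'data_dict_1' USING PigStorage(' ') AS (id:int, name:chararray);
dict2 = LOAD 'data_dict_2' USING PigStorage(' ') AS (id:int, name:chararray);

-- type1: id1 is looked up in data_dict_1, id2 in data_dict_2.
t1     = FILTER base BY type == 'type1';
t1_j1  = JOIN t1 BY id1, dict1 BY name;
t1_s1  = FOREACH t1_j1 GENERATE dict1::id AS id1, t1::id2 AS id2, t1::type AS type;
t1_j2  = JOIN t1_s1 BY id2, dict2 BY name;
t1_out = FOREACH t1_j2 GENERATE t1_s1::id1 AS id1, dict2::id AS id2, t1_s1::type AS type;

-- type2: same steps with the dictionaries swapped.
t2     = FILTER base BY type == 'type2';
t2_j1  = JOIN t2 BY id1, dict2 BY name;
t2_s1  = FOREACH t2_j1 GENERATE dict2::id AS id1, t2::id2 AS id2, t2::type AS type;
t2_j2  = JOIN t2_s1 BY id2, dict1 BY name;
t2_out = FOREACH t2_j2 GENERATE t2_s1::id1 AS id1, dict1::id AS id2, t2_s1::type AS type;

-- Combine both halves and write one JSON object per record.
result = UNION t1_out, t2_out;
STORE result INTO 'output_json' USING JsonStorage();
```

Note that these are inner joins, so any record whose id1 or id2 is missing from the relevant dictionary is dropped; a LEFT OUTER join would keep such records with a null id instead.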

On Tue, Oct 22, 2013 at 1:31 PM, jamal sasha <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have three kinds of data:
>
> 1) Base data
> 2) data_dict_1
> 3) data_dict_2
>
> Base data is well-formatted JSON.
> For example:
> {"id1":"foo", "id2":"bar", "type":"type1"}
> {"id1":"foo", "id2":"bar", "type":"type2"}
>
> data_dict_1
> 1 foo
> 2 bar
> 3 foobar
> ....
>
> data_dict_2
> -1 foo
> -2 bar
> -3 foobar
> ... and so on
>
>
> Now, what I want is: if the data is of type1,
>
> then read id1 from data_dict_1 and id2 from data_dict_2, and assign that
> integer id.
> If the data is of type2, then read id1 from data_dict_2 and id2 from
> data_dict_1, and assign the corresponding ids.
> For example:
>
> {"id1":1, "id2":2, "type":"type1"}
> {"id1":-1, "id2":-2, "type":"type2"}
>
> And so on.
> How do I do this in Pig?