HDFS >> mail # user >> Reading json format input

Re: Reading json format input
Hi Jamal,

I took your input, ran it through the sample wordcount program, and it works
fine, producing this output:

author 3
foo234 1
text 3
foo 1
foo123 1
hello 3
this 1
world 2
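This works because splitting with

String[] words = input.split("\\W+");

treats every run of non-word characters (anything other than letters, digits,
and underscores) as a delimiter, so the JSON braces, quotes, colons, commas,
and spaces are all discarded.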
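To reproduce the output above outside Hadoop, here is a minimal plain-Java sketch that applies the same split to the three sample lines and sums the tokens (standing in for the mapper's tokenizing and the reducer's summing). The class and method names are hypothetical. Note that the key names `author` and `text` are counted too, because the split knows nothing about JSON structure. The sketch also skips the empty first token that `String.split` returns when a line begins with a delimiter such as `{`.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class: reproduces the wordcount output locally, no Hadoop needed.
class SplitWordCount {
    // The three sample records from the original question.
    static final String[] SAMPLE = {
        "{\"author\":\"foo\", \"text\": \"hello\"}",
        "{\"author\":\"foo123\", \"text\": \"hello world\"}",
        "{\"author\":\"foo234\", \"text\": \"hello this world\"}"
    };

    // Splits each whole line on runs of non-word characters and counts every token.
    static Map<String, Integer> countWords(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split("\\W+")) {
                // A line starting with '{' yields an empty leading token; skip it.
                if (word.isEmpty()) {
                    continue;
                }
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one "word<TAB>count" line per distinct token, sorted by word.
        countWords(SAMPLE).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Running it prints the same eight counts as above (author 3, foo 1, foo123 1, foo234 1, hello 3, text 3, this 1, world 2), in alphabetical order because of the `TreeMap`.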

Thanks and Regards,

Rishi Yadav

On Wed, May 29, 2013 at 2:54 PM, jamal sasha <[EMAIL PROTECTED]> wrote:

> Hi,
>    I am stuck again. :(
> My input data is in HDFS. I am again trying to do wordcount, but there is
> a slight difference: the data is in JSON format.
> Each line of data looks like this:
>
> {"author":"foo", "text": "hello"}
> {"author":"foo123", "text": "hello world"}
> {"author":"foo234", "text": "hello this world"}
>
> I want to do wordcount on the "text" part only.
> I understand that in the mapper I just have to parse each line as JSON and
> extract "text", and the rest of the code stays the same, but I am
> switching from Python to Java Hadoop.
> How do I do this?
> Thanks
>
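For counting only the "text" field, here is a minimal sketch under stated assumptions: each input line is one flat JSON object, and its "text" value is a plain string with no escaped quotes. It pulls the value out with `java.util.regex` instead of a JSON library so that it runs on the JDK alone; the class, pattern, and method names are hypothetical. In a Hadoop job, the per-line work in `countTextWords` (extract, split, emit each word with a count of 1) is what would go in the mapper's `map()` method, and the summing belongs to the reducer.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class: counts words in the "text" field only.
class JsonTextWordCount {
    // Matches "text": "<value>" in a flat JSON object. Assumes the value has
    // no escaped quotes (\"); a real JSON parser is needed for anything messier.
    private static final Pattern TEXT_FIELD =
        Pattern.compile("\"text\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the "text" value of one JSON record, or null if the field is absent.
    static String extractText(String jsonLine) {
        Matcher m = TEXT_FIELD.matcher(jsonLine);
        return m.find() ? m.group(1) : null;
    }

    // Tokenizes only the "text" value of each record and sums each word's occurrences.
    static Map<String, Integer> countTextWords(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String text = extractText(line);
            if (text == null) {
                continue; // skip records without a "text" field
            }
            for (String word : text.split("\\W+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] sample = {
            "{\"author\":\"foo\", \"text\": \"hello\"}",
            "{\"author\":\"foo123\", \"text\": \"hello world\"}",
            "{\"author\":\"foo234\", \"text\": \"hello this world\"}"
        };
        // Prints one "word<TAB>count" line per distinct word, sorted by word.
        countTextWords(sample).forEach((w, n) -> System.out.println(w + "\t" + n));
    }
}
```

On the sample input this yields hello 3, this 1, world 2, with no author names or key names in the counts. For real data, a JSON library such as Jackson (`new ObjectMapper().readTree(line).get("text").asText()`) is the safer choice, since the regex breaks on escaped quotes or nested objects.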