HDFS, mail # user - Re: Reading json format input


Re: Reading json format input
jamal sasha 2013-05-29, 23:45
Hi Rishi,
   But I don't want the word count of all the words.
In the JSON, there is a field "text", and those are the only words I wish to count.
On Wed, May 29, 2013 at 4:43 PM, Rishi Yadav <[EMAIL PROTECTED]> wrote:

> Hi Jamal,
>
> I took your input and put it in a sample wordcount program, and it's working
> just fine and giving this output:
>
> author 3
> foo234 1
> text 3
> foo 1
> foo123 1
> hello 3
> this 1
> world 2
>
>
> When we split using
>
> String[] words = input.split("\\W+");
>
> it takes care of all non-alphanumeric characters.
>
> Thanks and Regards,
>
> Rishi Yadav
>
> On Wed, May 29, 2013 at 2:54 PM, jamal sasha <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>    I am stuck again. :(
>> My input data is in HDFS. I am again trying to do wordcount, but there is
>> a slight difference.
>> The data is in JSON format.
>> So each line of data is:
>>
>> {"author":"foo", "text": "hello"}
>> {"author":"foo123", "text": "hello world"}
>> {"author":"foo234", "text": "hello this world"}
>>
>> So I want to do wordcount for the text part.
>> I understand that in the mapper, I just have to parse this data as JSON and
>> extract "text", and the rest of the code is just the same, but I am trying to
>> switch from Python to Java Hadoop.
>> How do I do this?
>> Thanks
>>
>
>
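For reference, here is a minimal sketch in plain Java of counting only the words in the "text" field, using the three sample lines from the thread. It is not code posted by anyone in the thread. The class name TextFieldWordCount and the helpers extractText and countWords are hypothetical, and the regex assumes the "text" value contains no escaped quotes; in a real job a proper JSON parser would be the safer choice. In a Hadoop mapper, the same extract-then-split logic would presumably run on each input line, emitting a (word, 1) pair per word instead of updating a local map.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class TextFieldWordCount {
    // Matches "text": "<value>" and captures the value.
    // Assumption: the value contains no escaped quotes.
    private static final Pattern TEXT_FIELD =
            Pattern.compile("\"text\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the value of the "text" field, or "" if the line has none.
    static String extractText(String jsonLine) {
        Matcher m = TEXT_FIELD.matcher(jsonLine);
        return m.find() ? m.group(1) : "";
    }

    // Counts words in the "text" field only; keys and other fields are ignored.
    static Map<String, Integer> countWords(Iterable<String> jsonLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : jsonLines) {
            // Same split as in the thread: break on runs of non-word characters.
            for (String word : extractText(line).split("\\W+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "{\"author\":\"foo\", \"text\": \"hello\"}",
                "{\"author\":\"foo123\", \"text\": \"hello world\"}",
                "{\"author\":\"foo234\", \"text\": \"hello this world\"}");
        System.out.println(countWords(lines)); // prints {hello=3, this=1, world=2}
    }
}
```

Splitting the whole line with "\\W+" is what produced "author" and "text" in the earlier output, since the JSON keys are word characters too; extracting the field first avoids that.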