HDFS >> mail # user >> Reading json format input


Re: Reading json format input
Hi Rishi,
   But I don't want the word count of all the words.
In the JSON there is a field "text", and those are the only words I wish to count.
On Wed, May 29, 2013 at 4:43 PM, Rishi Yadav <[EMAIL PROTECTED]> wrote:

> Hi Jamal,
>
> I took your input and put it in a sample wordcount program, and it's working
> just fine and giving this output.
>
> author 3
> foo234 1
> text 3
> foo 1
> foo123 1
> hello 3
> this 1
> world 2
>
>
> When we split using
>
> String[] words = input.split("\\W+");
>
> it splits on every run of non-word characters (anything other than letters,
> digits, and underscore).
>
> Thanks and Regards,
>
> Rishi Yadav
>
> On Wed, May 29, 2013 at 2:54 PM, jamal sasha <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>    I am stuck again. :(
>> My input data is in HDFS. I am again trying to do a wordcount, but there is
>> a slight difference.
>> The data is in JSON format.
>> So each line of data is:
>>
>> {"author":"foo", "text": "hello"}
>> {"author":"foo123", "text": "hello world"}
>> {"author":"foo234", "text": "hello this world"}
>>
>> So I want to do a wordcount for the "text" part only.
>> I understand that in the mapper I just have to parse each line as JSON and
>> extract "text", and the rest of the code stays the same, but I am trying to
>> switch from Python to Java Hadoop.
>> How do I do this?
>> Thanks
>>
>
>
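
A minimal sketch of the approach discussed in this thread: pull out only the "text" field of each line, then split that value with the same "\\W+" pattern, so field names such as "author" and "text" are never counted. It is plain Java with no Hadoop or JSON dependency so that it runs as-is. The class name, method names, and the regular expression are assumptions made for illustration, and the regex only handles flat one-line records like the sample; a real job would more likely parse each line with a JSON library. In a Hadoop Mapper, wordsInText would be called on each input line inside map(), with each returned word emitted with a count of 1, and the usual wordcount reducer left unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextFieldWordCount {

    // Hypothetical pattern: matches "text": "<value>" in a flat, one-line record.
    // It steps over backslash escapes inside the value but does not decode them.
    private static final Pattern TEXT_FIELD =
            Pattern.compile("\"text\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Returns the words of the "text" field of one line, or an empty list if the field is absent. */
    static List<String> wordsInText(String jsonLine) {
        List<String> words = new ArrayList<>();
        Matcher m = TEXT_FIELD.matcher(jsonLine);
        if (m.find()) {
            for (String w : m.group(1).split("\\W+")) {
                if (!w.isEmpty()) { // split can yield a leading empty string
                    words.add(w);
                }
            }
        }
        return words;
    }

    /** Local stand-in for the map and reduce steps: total count per word. */
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String w : wordsInText(line)) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The three sample records from the original question.
        List<String> lines = Arrays.asList(
                "{\"author\":\"foo\", \"text\": \"hello\"}",
                "{\"author\":\"foo123\", \"text\": \"hello world\"}",
                "{\"author\":\"foo234\", \"text\": \"hello this world\"}");
        System.out.println(countWords(lines)); // prints {hello=3, this=1, world=2}
    }
}
```

Compared with splitting the whole line, the counts for "author", "text", "foo", "foo123", and "foo234" disappear, and only the words inside the "text" values remain.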