Re: Reading json format input
Hi, thanks guys.
I figured out the issue, so now I have another question.
I am using a third-party library, and I thought that once I had created the
jar file I didn't need to specify the dependencies, but apparently that's not
the case (error below).
A very naive question, probably stupid: how do I specify third-party
libraries (jars) in Hadoop?

Error:
Error: java.lang.ClassNotFoundException: org.json.JSONException
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:865)
at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
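
The trace above comes from the map task (`Child.main`), which suggests the org.json jar is missing from the task-side classpath, not just the client's. Two common ways to ship a dependency with a job are to bundle the jar under a `lib/` directory inside the job jar, or to pass it with `-libjars` when the driver goes through `ToolRunner`/`GenericOptionsParser`. As a rough aid for checking either approach, here is a minimal stdlib-only sketch that reports whether a class can be resolved and where it was loaded from. The names `ClasspathCheck` and `locate` are hypothetical, made up for this illustration; they are not part of Hadoop or org.json.

```java
import java.security.CodeSource;

public class ClasspathCheck {

    /**
     * Returns the location a class was loaded from, or null if the class
     * cannot be resolved by this class's class loader.
     * (Hypothetical helper for illustration only.)
     */
    public static String locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers
            Class<?> cls = Class.forName(className, false,
                    ClasspathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader have no code source
            return src == null ? "(bootstrap/JDK runtime)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace
        String name = args.length > 0 ? args[0] : "org.json.JSONException";
        String where = locate(name);
        System.out.println(where == null
                ? name + " is NOT on the classpath"
                : name + " loaded from " + where);
    }
}
```

Run on the client, this only tells you about the client classpath. To check what the task sees, the same `locate` call could be made from the mapper's `setup` method and its result logged.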

On Thu, May 30, 2013 at 2:02 AM, Pramod N <[EMAIL PROTECTED]> wrote:

> Whatever you are trying to do should work,
> Here is the modified WordCount Map
>
>
>     public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
>         String line = value.toString();
>
>         JSONObject line_as_json = new JSONObject(line);
>         String text = line_as_json.getString("text");
>         StringTokenizer tokenizer = new StringTokenizer(text);
>         while (tokenizer.hasMoreTokens()) {
>             word.set(tokenizer.nextToken());
>             context.write(word, one);
>         }
>     }
>
>
>
>
>
> Pramod N <http://atmachinelearner.blogspot.in>
> Bruce Wayne of web
> @machinelearner <https://twitter.com/machinelearner>
>
> --
>
>
> On Thu, May 30, 2013 at 8:42 AM, Rahul Bhattacharjee <
> [EMAIL PROTECTED]> wrote:
>
>> Whatever you have mentioned, Jamal, should work. You can debug this.
>>
>> Thanks,
>> Rahul
>>
>>
>> On Thu, May 30, 2013 at 5:14 AM, jamal sasha <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>   For some reason, this has to be in Java :(
>>> I am trying to use the org.json library, something like this (in the mapper):
>>> JSONObject jsn = new JSONObject(value.toString());
>>>
>>> String text = (String) jsn.get("text");
>>> StringTokenizer itr = new StringTokenizer(text);
>>>
>>> But it's not working :(
>>> It would be better to get this working properly, but I wouldn't mind using a
>>> hack as well :)
>>>
>>>
>>> On Wed, May 29, 2013 at 4:30 PM, Michael Segel <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> Yeah,
>>>> I have to agree with Russell. Pig is definitely the way to go on this.
>>>>
>>>> If you want to do it as a Java program you will have to do some work on
>>>> the input string but it too should be trivial.
>>>> How formal do you want to go?
>>>> Do you want to strip it down or just find the quote after the text
>>>> part?
>>>>
>>>>
>>>> On May 29, 2013, at 5:13 PM, Russell Jurney <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>> Seriously consider Pig (free answer, 4 LOC):
>>>>
>>>> my_data = LOAD 'my_data.json' USING
>>>> com.twitter.elephantbird.pig.load.JsonLoader() AS json:map[];
>>>> words = FOREACH my_data GENERATE $0#'author' as author,
>>>> FLATTEN(TOKENIZE($0#'text')) as word;
>>>> word_counts = FOREACH (GROUP words BY word) GENERATE group AS word,
>>>> COUNT_STAR(words) AS word_count;
>>>> STORE word_counts INTO '/tmp/word_counts.txt';
>>>>
>>>> It will be faster than the Java you'll likely write.