Hadoop user mailing list: string conversion problems


Nikolay Korovaiko 2010-07-16, 01:18
Re: string conversion problems
Is the tab the delimiter between records or between keys and values on the
input?

In other words, does the input file look like this:

a\tb
b\tc
c\ta

or does it look like this:

a   b\tb   c\tc   a\t

?

Jeff
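
A quick illustration of why the layout matters (a hedged sketch, not from the thread; it assumes the default TextInputFormat, which hands map() one line at a time as the value):

    // Layout 1: one arc per line; split("\t") separates the two endpoints.
    String onePerLine = "a\tb";
    String[] t1 = onePerLine.split("\t");    // ["a", "b"], so tokens[1] exists

    // Layout 2: tabs separate whole records on one line; each token is a full arc.
    String oneLongLine = "a b\tb c\tc a\t";
    String[] t2 = oneLongLine.split("\t");   // ["a b", "b c", "c a"]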

On Thu, Jul 15, 2010 at 6:18 PM, Nikolay Korovaiko <[EMAIL PROTECTED]> wrote:

> Hi everyone,
>
> I hope this is the right place for my question. If not, please feel free
> to ignore it ;) and sorry for any inconvenience :(
>
> I'm writing a simple program for enumerating triangles in directed graphs
> for my project. First, for each input arc (e.g. a b, b c, c a; note: a tab
> symbol serves as the delimiter), I want my map function to output the
> following pairs ([a, to_b], [b, from_a], [a_b, -1]):
>
> public void map(LongWritable key, Text value,
>                 OutputCollector<Text, Text> output,
>                 Reporter reporter) throws IOException {
>     String line = value.toString();
>     String[] tokens = line.split("\t");  // arcs are tab-delimited
>     output.collect(new Text(tokens[0]), new Text("to_" + tokens[1]));
>     output.collect(new Text(tokens[1]), new Text("from_" + tokens[0]));
>     output.collect(new Text(tokens[0] + "_" + tokens[1]), new Text("-1"));
> }
>
> Now my reduce function is supposed to cross-join all pairs that have both
> to_'s and from_'s, and to simply propagate any other pairs whose keys
> contain "_":
>
> public void reduce(Text key, Iterator<Text> values,
>                    OutputCollector<Text, Text> output,
>                    Reporter reporter) throws IOException {
>     String key_s = key.toString();
>     if (key_s.indexOf("_") > 0) {
>         output.collect(key, new Text("completed"));
>     } else {
>         HashMap<String, ArrayList<String>> lists =
>                 new HashMap<String, ArrayList<String>>();
>         while (values.hasNext()) {
>             String line = values.next().toString();
>             String[] tokens = line.split("_");  // e.g. "to_a" -> ["to", "a"]
>             if (!lists.containsKey(tokens[0])) {
>                 lists.put(tokens[0], new ArrayList<String>());
>             }
>             lists.get(tokens[0]).add(tokens[1]);
>         }
>         for (String t : lists.get("to"))
>             for (String f : lists.get("from"))
>                 output.collect(new Text(t + "_" + f), key);
>     }
> }
>
> And this is where the most exciting stuff happens: tokens[1] throws an
> ArrayIndexOutOfBoundsException. If you scroll up, you can see that by this
> point the iterator should be giving values like "to_a", "from_b", "to_b",
> etc. When I just output these values, everything looks OK and I see "to_a",
> "from_b". But split() doesn't work at all; moreover, line.length() is
> always 1, and indexOf("_") returns -1! The very same indexOf() WORKS
> PERFECTLY for the keys, where we have pairs whose keys contain "_" and
> look like "a_b", "b_c". (See the diagnostic sketch after this message.)
>
> I'm really puzzled by all this. MapReduce is supposed to save lives by
> making everything simple. Instead, I've spent several hours just spotting
> this...
>
> I'd really appreciate your help, guys!!! Thanks in advance!
>
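
For the sample arcs above (a b, b c, c a), the reducer for key b should receive the values {to_c, from_a}, both of which contain "_", and should emit (c_a, b); that makes the reported split()/indexOf() behavior genuinely odd. A minimal diagnostic sketch (hypothetical, not from the thread): log the length and the numeric code of every character each value carries, so you can see exactly what split("_") is actually being given. With the old API shown above, System.err output lands in the task's stderr log, typically viewable via the JobTracker web UI.

    // Debugging-only replacement for the while-loop in reduce() above.
    while (values.hasNext()) {
        String line = values.next().toString();
        StringBuilder codes = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            codes.append((int) line.charAt(i)).append(' ');  // e.g. '_' prints as 95
        }
        System.err.println("key=" + key + " value='" + line
                + "' len=" + line.length() + " codes=" + codes);
    }

If the log shows single-character values, the problem is upstream of the reducer (for example, in how the job is configured or how the input is being split), not in split() itself.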
Replies in this thread:

  Nikolay Korovaiko 2010-07-16, 17:16
  Jeff Bean 2010-07-16, 20:33
  cvkkumar 2010-07-17, 05:57
  Nikolay Korovaiko 2010-07-17, 06:26