Re: string conversion problems
Is the tab the delimiter between records or between keys and values on the
input?

In other words, does the input file look like this:

a\tb
b\tc
c\ta

or does it look like this:

a   b\tb   c\tc   a\t

?

Jeff
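
For illustration, here is a minimal standalone Java sketch (the class name
SplitCheck is made up) of how String.split("\t") treats the two layouts:

public class SplitCheck {
    public static void main(String[] args) {
        String perLine = "a\tb";             // one arc per line, tab between the two nodes
        String oneLine = "a b\tb c\tc a\t";  // all arcs on one line, tabs between records
        System.out.println(perLine.split("\t").length);  // prints 2
        System.out.println(oneLine.split("\t").length);  // prints 3 (trailing empty dropped)
    }
}

With Hadoop's default TextInputFormat, each call to map() receives a single
line, so the first layout yields tokens.length == 2 per call, while the
second would deliver every record to one call.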

On Thu, Jul 15, 2010 at 6:18 PM, Nikolay Korovaiko <[EMAIL PROTECTED]> wrote:

> Hi everyone,
>
> I hope this is the right place for my question. If not, please feel free
> to ignore it ;) and I'm sorry for any inconvenience caused :(
>
> I'm writing a simple program for enumerating triangles in directed graphs
> for my project. First, for each input arc (e.g. a b, b c, c a; note: a tab
> symbol serves as the delimiter) I want my map function to output the
> following pairs ([a, to_b], [b, from_a], [a_b, -1]):
>
> public void map(LongWritable key, Text value,
>                 OutputCollector<Text, Text> output,
>                 Reporter reporter) throws IOException {
>     String line = value.toString();
>     String[] tokens = line.split("\t");
>     output.collect(new Text(tokens[0]), new Text("to_" + tokens[1]));
>     output.collect(new Text(tokens[1]), new Text("from_" + tokens[0]));
>     output.collect(new Text(tokens[0] + "_" + tokens[1]), new Text("-1"));
> }
>
> Now my reduce function is supposed to cross-join all pairs that have both
> to_'s and from_'s, and to simply propagate any other pairs whose keys
> contain "_".
>
> public void reduce(Text key, Iterator<Text> values,
>                    OutputCollector<Text, Text> output,
>                    Reporter reporter) throws IOException {
>     String key_s = key.toString();
>     if (key_s.indexOf("_") > 0) {
>         output.collect(key, new Text("completed"));
>     } else {
>         HashMap<String, ArrayList<String>> lists =
>             new HashMap<String, ArrayList<String>>();
>         while (values.hasNext()) {
>             String line = values.next().toString();
>             String[] tokens = line.split("_");
>             if (!lists.containsKey(tokens[0])) {
>                 lists.put(tokens[0], new ArrayList<String>());
>             }
>             lists.get(tokens[0]).add(tokens[1]);
>         }
>         for (String t : lists.get("to"))
>             for (String f : lists.get("from"))
>                 output.collect(new Text(t + "_" + f), key);
>     }
> }
>
> And this is where the most exciting stuff happens: tokens[1] yields an
> ArrayIndexOutOfBoundsException. If you scroll up, you can see that by this
> point the iterator should give values like "to_a", "from_b", "to_b", etc.
> When I just output these values, everything looks OK and I see "to_a",
> "from_b". But split() doesn't work at all; moreover, line.length() is
> always 1 and indexOf("_") returns -1! The very same indexOf() WORKS
> PERFECTLY for the keys, where we have pairs whose keys contain "_" and
> look like "a_b", "b_c".
>
> I'm really puzzled by all this. MapReduce is supposed to save lives by
> making everything simple. Instead I've spent several hours just trying to
> spot this...
>
> I'd really appreciate your help, guys!!! Thanks in advance!
>
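
One way to narrow this down (a debugging sketch using the same old-style
org.apache.hadoop.mapred API as the code above; the logging and the guard
are additions, not part of the original job) is to log exactly what the
reducer receives and skip values that lack the expected "to_x"/"from_y"
shape:

while (values.hasNext()) {
    String line = values.next().toString();
    // Print the raw value and its length; this ends up in the task's stderr log.
    System.err.println("key=" + key + " value=[" + line + "] len=" + line.length());
    String[] tokens = line.split("_");
    if (tokens.length < 2) {
        // No "_" in the value, so tokens[1] would throw
        // ArrayIndexOutOfBoundsException; skip it instead of crashing.
        continue;
    }
    if (!lists.containsKey(tokens[0])) {
        lists.put(tokens[0], new ArrayList<String>());
    }
    lists.get(tokens[0]).add(tokens[1]);
}

Two other things worth checking, given the symptoms: lists.get("to") or
lists.get("from") returns null when a key saw only one kind of value, so the
nested loops need a null check; and if the job happens to set this reducer
class as its combiner, reduce output such as the bare node keys (single
characters with no "_") would be fed back into reduce(), which would match
values whose length() is 1 and whose indexOf("_") is -1.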