Hadoop, mail # user - string conversion problems


Re: string conversion problems
Jeff Bean 2010-07-16, 16:16
Is the tab the delimiter between records or between keys and values on the
input?

In other words, does the input file look like this:

a\tb
b\tc
c\ta

or does it look like this:

a   b\tb   c\tc   a\t

?

Jeff
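Jeff's question matters because the map code indexes tokens[0] and tokens[1] after splitting each line. The following is a small sketch that assumes the delimiter is a literal tab character in the file. It shows what String.split("\t") returns for each of the two layouts. DelimiterCheck and splitOnTab are hypothetical names, and the strings are samples only.

```java
import java.util.Arrays;

// Sketch: how String.split("\t") treats the two layouts in Jeff's question.
// DelimiterCheck and splitOnTab are hypothetical names; the strings are samples.
final class DelimiterCheck {
    // Split one input line on a tab character.
    static String[] splitOnTab(String line) {
        return line.split("\t");
    }

    public static void main(String[] args) {
        // Layout 1: one arc per line, tab between source and target.
        for (String line : new String[] { "a\tb", "b\tc", "c\ta" }) {
            System.out.println(Arrays.toString(splitOnTab(line))); // e.g. [a, b]
        }
        // Layout 2: all arcs on one line, tab between arcs, spaces inside an arc.
        // Each token is now a whole arc ("a   b"), so tokens[1] is the next arc.
        System.out.println(Arrays.toString(splitOnTab("a   b\tb   c\tc   a\t")));
    }
}
```

The LongWritable key in the map signature suggests line-oriented input. If that is the case, each map() call sees one line. Under the second layout, that single line would hold every arc.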

On Thu, Jul 15, 2010 at 6:18 PM, Nikolay Korovaiko <[EMAIL PROTECTED]> wrote:

> Hi everyone,
>
> I hope this is the right place for my question. If not, please feel free to
> ignore it ;) and I'm sorry for any inconvenience :(
>
> I'm writing a simple program for enumerating triangles in directed graphs
> for my project. First, for each input arc (e.g. a b, b c, c a; note: a tab
> character serves as the delimiter), I want my map function to output the
> following pairs ([a, to_b], [b, from_a], [a_b, -1]):
>
>  public void map(LongWritable key, Text value,
>                  OutputCollector<Text, Text> output,
>                  Reporter reporter) throws IOException {
>      // each input line is one arc: "src<TAB>dst"
>      String line = value.toString();
>      String[] tokens = line.split("\t");
>      // emit (src, to_dst), (dst, from_src) and the arc marker (src_dst, -1)
>      output.collect(new Text(tokens[0]), new Text("to_" + tokens[1]));
>      output.collect(new Text(tokens[1]), new Text("from_" + tokens[0]));
>      output.collect(new Text(tokens[0] + "_" + tokens[1]), new Text("-1"));
>  }
>
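Here is a minimal sketch of the map step in plain Java, without Hadoop types. It assumes each input line holds exactly one arc as "src<TAB>dst". ArcMapSketch and mapArc are hypothetical names. In the real job, the three pairs would go to output.collect(...). The last line in main also shows what happens when the pattern passed to split() does not occur in the line, for example four spaces against tab-separated input: the whole line comes back as a single token.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the map step without Hadoop: for one tab-separated arc "a\tb" it
// returns the pairs (a, to_b), (b, from_a), (a_b, -1). The class and method
// names are hypothetical; a real Mapper would call output.collect(...) instead.
final class ArcMapSketch {
    static List<Map.Entry<String, String>> mapArc(String line) {
        String[] tokens = line.split("\t");
        if (tokens.length < 2) {
            // A line without a tab cannot be an arc; report it instead of
            // failing later with ArrayIndexOutOfBoundsException on tokens[1].
            throw new IllegalArgumentException("expected \"src\\tdst\", got: " + line);
        }
        String src = tokens[0];
        String dst = tokens[1];
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        pairs.add(new SimpleImmutableEntry<>(src, "to_" + dst));
        pairs.add(new SimpleImmutableEntry<>(dst, "from_" + src));
        pairs.add(new SimpleImmutableEntry<>(src + "_" + dst, "-1"));
        return pairs;
    }

    public static void main(String[] args) {
        for (String arc : new String[] { "a\tb", "b\tc", "c\ta" }) {
            for (Map.Entry<String, String> p : mapArc(arc)) {
                System.out.println(p.getKey() + " -> " + p.getValue());
            }
        }
        // Splitting tab-separated input on four spaces finds no delimiter,
        // so the whole line comes back as a single token.
        System.out.println("a\tb".split("    ").length); // prints 1
    }
}
```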
> Now my reduce function is supposed to cross-join all pairs that have both
> to_'s and from_'s, and to simply propagate any other pairs whose keys
> contain "_".
>
>  public void reduce(Text key, Iterator<Text> values,
>                     OutputCollector<Text, Text> output,
>                     Reporter reporter) throws IOException {
>      String key_s = key.toString();
>      if (key_s.indexOf("_") > 0) {
>          // arc keys such as "a_b" are propagated with the marker "completed"
>          output.collect(key, new Text("completed"));
>      } else {
>          // group this vertex's values by prefix: "to" -> [...], "from" -> [...]
>          HashMap<String, ArrayList<String>> lists =
>                  new HashMap<String, ArrayList<String>>();
>          while (values.hasNext()) {
>              String line = values.next().toString();
>              String[] tokens = line.split("_");
>              if (!lists.containsKey(tokens[0])) {
>                  lists.put(tokens[0], new ArrayList<String>());
>              }
>              lists.get(tokens[0]).add(tokens[1]);
>          }
>          // cross-join every to_ value with every from_ value
>          for (String t : lists.get("to"))
>              for (String f : lists.get("from"))
>                  output.collect(new Text(t + "_" + f), key);
>      }
>  }
>
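Here is a minimal sketch of the reduce logic in plain Java. It assumes the values for a vertex key are strings like "to_b" and "from_c". TriangleReduceSketch and reduce are hypothetical names. It differs from the quoted code in three ways:

- It splits with a limit of 2.
- It fails with a message naming the offending value instead of a bare ArrayIndexOutOfBoundsException.
- It treats a missing "to" or "from" list as empty. In the quoted code, lists.get("to") returns null for a vertex with no outgoing arcs, and the for loop over null would throw a NullPointerException.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the reducer (no Hadoop types); names are hypothetical.
final class TriangleReduceSketch {
    static List<Map.Entry<String, String>> reduce(String key, List<String> values) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        if (key.indexOf('_') > 0) {
            // Arc keys such as "a_b" are propagated with the marker "completed".
            out.add(new SimpleImmutableEntry<>(key, "completed"));
            return out;
        }
        // Group the vertex's values by prefix: "to" -> [b, ...], "from" -> [c, ...].
        Map<String, List<String>> lists = new HashMap<>();
        for (String value : values) {
            String[] tokens = value.split("_", 2);
            if (tokens.length < 2) {
                // Fail with the offending value rather than a bare
                // ArrayIndexOutOfBoundsException on tokens[1].
                throw new IllegalArgumentException(
                        "value without '_' for key " + key + ": \"" + value + "\"");
            }
            lists.computeIfAbsent(tokens[0], k -> new ArrayList<>()).add(tokens[1]);
        }
        // A vertex with no outgoing (or no incoming) arcs simply produces nothing.
        List<String> tos = lists.getOrDefault("to", Collections.emptyList());
        List<String> froms = lists.getOrDefault("from", Collections.emptyList());
        // For each from_f and to_t, f -> key -> t is a two-arc path; the output key
        // "t_f" matches the map's key for the closing arc t -> f.
        for (String t : tos) {
            for (String f : froms) {
                out.add(new SimpleImmutableEntry<>(t + "_" + f, key));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Values vertex "a" receives for the arcs a -> b and c -> a.
        System.out.println(reduce("a", Arrays.asList("to_b", "from_c")));
        System.out.println(reduce("a_b", Arrays.asList("-1")));
    }
}
```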
> And this is where the most exciting stuff happens: tokens[1] throws an
> ArrayIndexOutOfBoundsException. If you scroll up, you can see that by this
> point the iterator should give values like "to_a", "from_b", "to_b", etc.
> When I just output these values, everything looks OK and I get "to_a",
> "from_b". But split() doesn't work at all; moreover, line.length() is
> always 1 and indexOf("_") returns -1! The very same indexOf WORKS PERFECTLY
> for keys, where we have pairs whose keys contain "_" and look like "a_b",
> "b_c".
>
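One way to read these symptoms: a value whose length is 1 cannot contain "_". For such a value, split("_") returns a one-element array, so tokens[1] is out of bounds. If line.length() really is 1, the values reaching this branch are not the "to_x"/"from_x" strings the map emits. The following sketch applies the same checks to an expected value and to a one-character value. SplitSymptom and describe are hypothetical names. Logging something like describe(line), together with the key, inside reduce() before touching tokens[1] would show what the reducer is actually given.

```java
// Sketch: the checks from the question applied to expected values ("to_a",
// "from_b") and to a one-character value ("a"). SplitSymptom and describe are
// hypothetical names.
final class SplitSymptom {
    static String describe(String value) {
        String[] tokens = value.split("_");
        return "\"" + value + "\": length=" + value.length()
                + ", indexOf(\"_\")=" + value.indexOf("_")
                + ", tokens=" + tokens.length;
    }

    public static void main(String[] args) {
        System.out.println(describe("to_a"));   // "to_a": length=4, indexOf("_")=2, tokens=2
        System.out.println(describe("from_b")); // "from_b": length=6, indexOf("_")=4, tokens=2
        System.out.println(describe("a"));      // "a": length=1, indexOf("_")=-1, tokens=1
        // With only one token, tokens[1] is out of bounds for "a".
    }
}
```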
> I'm really puzzled by all this. MapReduce is supposed to save lives by
> making everything simple. Instead, I spent several hours just to spot
> this...
>
> I'd really appreciate your help, guys!!! Thanks in advance!
>