Re: string conversion problems
Whitespace characters are funny. You showed me this code in the mapper:

String [] tokens = line.split("    ");

That splits on four literal spaces, not on a tab; splitting on a tab would be line.split("\t");

The job would still run, and the keys and values going into the reducer
might look right at a glance, so you might not notice that part of the
value is still appended to the key because the split never happened.
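
A quick standalone check (my sketch, not code from your job) shows the
difference:

    public class SplitCheck {
        public static void main(String[] args) {
            String line = "a\tb";                    // tab-delimited record
            String[] bySpaces = line.split("    ");  // four literal spaces
            String[] byTab = line.split("\t");
            System.out.println(bySpaces.length);     // 1 -- no split happened
            System.out.println(byTab.length);        // 2 -- "a" and "b"
        }
    }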

This is just from eyeballing the code. Let me know if I'm on the right
track.

Jeff
On Fri, Jul 16, 2010 at 10:16 AM, Nikolay Korovaiko <[EMAIL PROTECTED]> wrote:

> First, thank you very much for the reply!
>
> so, this is my input:
>
> a\tb
> b\tc
> c\ta
>
> In other words, a map function initially receives the whole string a\tb as
> its value.
> And it processes my input data correctly: I changed my reduce function
> to simply emit the merged pairs from the map's input to check this.
> However, when I tried to cross join the cases where I have both to_'s
> and from_'s (for example, a reducer gets the pair <a, to_b ; from_c>)
> by splitting each value provided by the reducer's iterator with
> split("_"), it just didn't work. And yet, without this additional
> logic, the reducer DOES output these values <a, to_b ; from_c>, so it
> GETS them. The same split works just fine for keys in the reduce
> function, i.e. it distinguishes a composite key like "a_b" from a
> simple key like "a". My guess is that Hadoop sorts the values for a
> reducer behind the scenes and this somehow messes up the initial
> character encoding. I'm using the Text class as a serializable wrapper
> for my strings. I guess there is no other option for it?)))
>
> I want to try to get rid of the composite keys first (the last
> output.collect in the map function) to make things a bit simpler and
> then test it again.
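
A side note on the split described above: in plain Java, split("_") on
values of this form behaves as expected, so the raw bytes reaching the
reducer are worth inspecting. One known pitfall is that Hadoop's old-API
values iterator reuses the underlying Text object, so each value must be
copied out with toString() before being stored. A minimal sketch (mine,
not from the thread), assuming values of the form shown above:

    public class ValueSplitCheck {
        public static void main(String[] args) {
            String[] samples = {"to_b", "from_c", "-1"};
            for (String v : samples) {
                // A limit of 2 splits only on the first underscore.
                String[] parts = v.split("_", 2);
                if (parts.length == 2)
                    System.out.println("tag=" + parts[0] + " vertex=" + parts[1]);
                else
                    System.out.println("no underscore in: " + v);
            }
        }
    }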
>
>
> On Fri, Jul 16, 2010 at 9:16 AM, Jeff Bean <[EMAIL PROTECTED]> wrote:
>
> > Is the tab the delimiter between records, or between keys and values
> > on the input?
> >
> > in other words does the input file look like this:
> >
> > a\tb
> > b\tc
> > c\ta
> >
> > or does it look like this:
> >
> > a   b\tb   c\tc   a\t
> >
> > ?
> >
> > Jeff
> >
> > On Thu, Jul 15, 2010 at 6:18 PM, Nikolay Korovaiko <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > I hope this is the right place for my question. If not, please feel
> > > free to ignore it ;) and I'm sorry for any inconvenience :(
> > >
> > > I'm writing a simple program for enumerating triangles in directed
> > > graphs for my project. First, for each input arc (e.g. a b, b c,
> > > c a; note: a tab symbol serves as the delimiter) I want my map
> > > function to output the following pairs ([a, to_b], [b, from_a],
> > > [a_b, -1]):
> > >
> > >  public void map(LongWritable key, Text value,
> > >                  OutputCollector<Text, Text> output,
> > >                  Reporter reporter) throws IOException {
> > >    String line = value.toString();
> > >    String [] tokens = line.split("    ");
> > >    output.collect(new Text(tokens[0]), new Text("to_"+tokens[1]));
> > >    output.collect(new Text(tokens[1]), new Text("from_"+tokens[0]));
> > >    output.collect(new Text(tokens[0]+"_"+tokens[1]), new Text("-1"));
> > >  }
> > >
> > > Now my reduce function is supposed to cross join all pairs that
> > > have both to_'s and from_'s, and to simply propagate any other
> > > pairs whose keys contain "_".
> > >
> > >  public void reduce(Text key, Iterator<Text> values,
> > >                  OutputCollector<Text, Text> output,
> > >                  Reporter reporter) throws IOException {
> > >    String key_s = key.toString();
> > >    if (key_s.indexOf("_")>0)
> > >      output.collect(key, new Text("completed"));
> > >    else {
> > >      HashMap<String, ArrayList<String>> lists =
> > >          new HashMap<String, ArrayList<String>>();
> > >      while (values.hasNext()) {
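
The quoted message is cut off at this point. Purely for illustration, a
hypothetical completion of the loop and the join that might follow (my
sketch continuing the quoted method, not Nikolay's actual code), based on
the cross-join logic he describes above. Note that values.next() reuses
the same Text instance, so each value is copied out with toString():

      while (values.hasNext()) {
        // Copy the value out: Hadoop reuses the Text object between calls.
        String v = values.next().toString();
        String[] parts = v.split("_", 2);    // e.g. "to_b" -> ["to", "b"]
        if (parts.length < 2)
          continue;                          // skip malformed values
        if (!lists.containsKey(parts[0]))
          lists.put(parts[0], new ArrayList<String>());
        lists.get(parts[0]).add(parts[1]);
      }
      // Cross join every incoming edge with every outgoing edge.
      if (lists.containsKey("from") && lists.containsKey("to"))
        for (String from : lists.get("from"))
          for (String to : lists.get("to"))
            output.collect(new Text(from + "_" + to), new Text(key_s));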