Re: map reduce and sync
Lucas Bernardi 2013-02-25, 15:38
I didn't notice, thanks for the heads up.
On Mon, Feb 25, 2013 at 4:31 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> Just an aside (I've not tried to look at the original issue yet), but
> Scribe has not been maintained (nor has seen a release) in over a year
> now -- looking at the commit history. Same case with both Facebook and
> Twitter's fork.
>
> On Mon, Feb 25, 2013 at 7:16 AM, Lucas Bernardi <[EMAIL PROTECTED]> wrote:
> > Yeah I looked at scribe, looks good but sounds like too much for my
> > problem. I'd rather make it work the simple way. Could you please post
> > your code, maybe I'm doing something wrong on the sync side. Maybe a
> > buffer size, block size or some other parameter is different...
> >
> > Thanks!
> > Lucas
> >
> > On Sun, Feb 24, 2013 at 10:31 PM, Hemanth Yamijala
> > <[EMAIL PROTECTED]> wrote:
> >>
> >> I am using the same version of Hadoop as you.
> >>
> >> Can you look at something like Scribe, which AFAIK fits the use case you
> >> describe.
> >>
> >> Thanks
> >> Hemanth
> >>
> >> On Sun, Feb 24, 2013 at 3:33 AM, Lucas Bernardi <[EMAIL PROTECTED]> wrote:
> >>>
> >>> That is exactly what I did, but in my case, it is as if the file were
> >>> empty, the job counters say no bytes read.
> >>> I'm using hadoop 1.0.3, which version did you try?
> >>>
> >>> What I'm trying to do is just some basic analytics on a product search
> >>> system. There is a search service, every time a user performs a search,
> >>> the search string and the results are stored in this file, and the file
> >>> is sync'ed. I'm actually using pig to do some basic counts, it doesn't
> >>> work, like I described, because the file looks empty for the map reduce
> >>> components. I thought it was about pig, but I wasn't sure, so I tried a
> >>> simple mr job, and used the word count to test whether the map reduce
> >>> components actually see the sync'ed bytes.
> >>>
> >>> Of course if I close the file, everything works perfectly, but I don't
> >>> want to close the file every so often, since that means I should create
> >>> another one (since no append support), and that would end up with too
> >>> many tiny files, something we know is bad for mr performance, and I
> >>> don't want to add more parts to this (like a file merging tool). I think
> >>> using sync is a clean solution, since we don't care about writing
> >>> performance, so I'd rather keep it like this if I can make it work.
> >>>
> >>> Any idea besides hadoop version?
> >>>
> >>> Thanks!
> >>>
> >>> Lucas
> >>>
> >>> On Sat, Feb 23, 2013 at 11:54 AM, Hemanth Yamijala
> >>> <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>> Hi Lucas,
> >>>>
> >>>> I tried something like this but got different results.
> >>>>
> >>>> I wrote code that opened a file on HDFS, wrote a line and called sync.
> >>>> Without closing the file, I ran a wordcount with that file as input.
> >>>> It did work fine and was able to count the words that were sync'ed
> >>>> (even though the file length seems to come as 0 like you noted in
> >>>> fs -ls)
> >>>>
> >>>> So, not sure what's happening in your case. In the MR job, do the job
> >>>> counters indicate no bytes were read?
> >>>>
> >>>> On a different note though, if you can describe a little more what you
> >>>> are trying to accomplish, we could probably work out a better solution.
> >>>>
> >>>> Thanks
> >>>> hemanth
> >>>>
> >>>> On Sat, Feb 23, 2013 at 7:15 PM, Lucas Bernardi <[EMAIL PROTECTED]>
> >>>> wrote:
> >>>>>
> >>>>> Hello Hemanth, thanks for answering.
> >>>>> The file is open by a separate process not map reduce related at all.
> >>>>> You can think of it as a servlet, receiving requests, and writing
> >>>>> them to this file, every time a request is received it is written and
> >>>>> org.apache.hadoop.fs.FSDataOutputStream.sync() is invoked.
> >>>>>
> >>>>> At the same time, I want to run a map reduce job over this file.
> >>>>> Simply running the word count example doesn't seem to work, it is as if