Hadoop, mail # user - Re: map reduce and sync


Re: map reduce and sync
Lucas Bernardi 2013-02-25, 15:38
I didn't notice, thanks for the heads up.

On Mon, Feb 25, 2013 at 4:31 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> Just an aside (I've not tried to look at the original issue yet), but
> Scribe has not been maintained (nor seen a release) in over a year
> now, looking at the commit history. The same is true of both Facebook's
> and Twitter's forks.
>
> On Mon, Feb 25, 2013 at 7:16 AM, Lucas Bernardi <[EMAIL PROTECTED]> wrote:
> > Yeah I looked at Scribe, looks good but sounds like too much for my
> > problem. I'd rather make it work the simple way. Could you please post
> > your code? Maybe I'm doing something wrong on the sync side. Maybe a
> > buffer size, block size or some other parameter is different...
> >
> > Thanks!
> > Lucas
> >
> >
> > On Sun, Feb 24, 2013 at 10:31 PM, Hemanth Yamijala
> > <[EMAIL PROTECTED]> wrote:
> >>
> >> I am using the same version of Hadoop as you.
> >>
> >> Can you look at something like Scribe, which AFAIK fits the use case
> >> you describe?
> >>
> >> Thanks
> >> Hemanth
> >>
> >>
> >>> On Sat, Feb 23, 2013 at 3:33 AM, Lucas Bernardi <[EMAIL PROTECTED]>
> >>> wrote:
> >>>
> >>> That is exactly what I did, but in my case it is as if the file were
> >>> empty; the job counters say no bytes were read.
> >>> I'm using Hadoop 1.0.3. Which version did you try?
> >>>
> >>> What I'm trying to do is just some basic analytics on a product search
> >>> system. There is a search service; every time a user performs a search,
> >>> the search string and the results are stored in this file, and the file
> >>> is sync'ed. I'm actually using Pig to do some basic counts. It doesn't
> >>> work, as I described, because the file looks empty to the map reduce
> >>> components. I thought it was a Pig issue, but I wasn't sure, so I tried
> >>> a simple MR job and used word count to test whether the map reduce
> >>> components actually see the sync'ed bytes.
> >>>
> >>> Of course if I close the file, everything works perfectly, but I don't
> >>> want to close the file every so often, since that means I should create
> >>> another one (since there is no append support), and that would end up
> >>> with too many tiny files, something we know is bad for MR performance,
> >>> and I don't want to add more parts to this (like a file merging tool).
> >>> I think using sync is a clean solution, since we don't care about write
> >>> performance, so I'd rather keep it like this if I can make it work.
> >>>
> >>> Any idea besides hadoop version?
> >>>
> >>> Thanks!
> >>>
> >>> Lucas
> >>>
> >>>
> >>>
> >>> On Sat, Feb 23, 2013 at 11:54 AM, Hemanth Yamijala
> >>> <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>> Hi Lucas,
> >>>>
> >>>> I tried something like this but got different results.
> >>>>
> >>>> I wrote code that opened a file on HDFS, wrote a line and called sync.
> >>>> Without closing the file, I ran a wordcount with that file as input.
> >>>> It did work fine and was able to count the words that were sync'ed
> >>>> (even though the file length seems to come as 0, like you noted in
> >>>> fs -ls).
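[Editor's note: a rough sketch of that experiment, assuming a Hadoop 1.x classpath and a reachable HDFS; the path and the written line below are made up.]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SyncWithoutClose {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/tmp/sync-test.txt"); // illustrative path
    FSDataOutputStream out = fs.create(path);
    out.writeBytes("some search query and its results\n");
    // Push the buffered bytes to the datanodes without closing the stream.
    // (In Hadoop 1.x this is FSDataOutputStream.sync(); later versions
    // replace it with hflush()/hsync().)
    out.sync();
    // The stream is deliberately left open; running wordcount now is the
    // test. Note the namenode-reported length may still show 0 here,
    // which is what fs -ls displays.
    System.out.println("reported length: " + fs.getFileStatus(path).getLen());
  }
}
```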
> >>>>
> >>>> So, not sure what's happening in your case. In the MR job, do the job
> >>>> counters indicate that no bytes were read?
> >>>>
> >>>> On a different note, though, if you can describe a little more what
> >>>> you are trying to accomplish, we could probably work out a better
> >>>> solution.
> >>>>
> >>>> Thanks
> >>>> hemanth
> >>>>
> >>>>
> >>>> On Sat, Feb 23, 2013 at 7:15 PM, Lucas Bernardi <[EMAIL PROTECTED]>
> >>>> wrote:
> >>>>>
> >>>>> Hello Hemanth, thanks for answering.
> >>>>> The file is opened by a separate process, not map reduce related at
> >>>>> all. You can think of it as a servlet receiving requests and writing
> >>>>> them to this file; every time a request is received it is written and
> >>>>> org.apache.hadoop.fs.FSDataOutputStream.sync() is invoked.
> >>>>>
> >>>>> At the same time, I want to run a map reduce job over this file.
> >>>>> Simply running the word count example doesn't seem to work; it is as
> >>>>> if