Re: map reduce and sync
I am using the same version of Hadoop as you.

Can you look at something like Scribe, which AFAIK fits the use case you

On Sun, Feb 24, 2013 at 3:33 AM, Lucas Bernardi <[EMAIL PROTECTED]> wrote:

> That is exactly what I did, but in my case it is as if the file were
> empty; the job counters say no bytes were read.
> I'm using Hadoop 1.0.3. Which version did you try?
> What I'm trying to do is just some basic analytics on a product search
> system. There is a search service; every time a user performs a search,
> the search string and the results are stored in this file, and the file
> is sync'ed. I'm actually using Pig to do some basic counts, but it doesn't
> work, as I described, because the file looks empty to the MapReduce
> components. I thought the problem was Pig, but I wasn't sure, so I tried
> a simple MR job, using the word count example to test whether the
> MapReduce components actually see the sync'ed bytes.
> Of course, if I close the file, everything works perfectly, but I don't
> want to close the file every so often, since that means I would have to
> create another one (there is no append support), and that would leave me
> with too many tiny files, something we know is bad for MR performance,
> and I don't want to add more parts to this (like a file-merging tool).
> I think using sync is a clean solution, since we don't care about write
> performance, so I'd rather keep it like this if I can make it work.
> Any ideas besides the Hadoop version?
> Thanks!
> Lucas
> On Sat, Feb 23, 2013 at 11:54 AM, Hemanth Yamijala <
>> Hi Lucas,
>> I tried something like this but got different results.
>> I wrote code that opened a file on HDFS, wrote a line and called sync.
>> Without closing the file, I ran a wordcount with that file as input. It
>> worked fine and was able to count the words that were sync'ed (even though
>> the file length shows up as 0 in fs -ls, as you noted).
>> So I'm not sure what's happening in your case. In the MR job, do the job
>> counters indicate that no bytes were read?
>> On a different note, if you can describe a little more of what you
>> are trying to accomplish, we could probably work out a better solution.
>> Thanks
>> hemanth
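
The write-then-sync test described above can be approximated without a
cluster. The sketch below is a local-filesystem analogue only, not the code
used in this thread: it uses plain java.io and java.nio rather than the
Hadoop API, and the class name OpenFileSyncSketch, the method
writeSyncAndRead, and the sample log line are hypothetical. On Hadoop 1.x
the call named in the thread is FSDataOutputStream.sync(); here
FileDescriptor.sync() stands in for it.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class OpenFileSyncSketch {

    /**
     * Writes one line, forces it to storage, and reads the file back
     * through a separate handle while the writer is still open.
     * Returns the text the second handle saw.
     */
    public static String writeSyncAndRead(File file, String line) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            // Local stand-in (assumption) for FSDataOutputStream.sync() on HDFS 1.x.
            out.getFD().sync();
            // The writer has not been closed yet; open the same path independently.
            byte[] seen = Files.readAllBytes(file.toPath());
            return new String(seen, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("search-log", ".txt");
        f.deleteOnExit();
        String seen = writeSyncAndRead(f, "query=red shoes results=42");
        System.out.println("reader saw: " + seen.trim());
        System.out.println("length reported by the filesystem: " + f.length());
    }
}
```

On a local disk the second handle sees the line and the filesystem reports
the file's real length. The thread reports a different picture on HDFS:
direct reads return the sync'ed data, while fs -ls shows a length of 0 for
the still-open file.
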
>> On Sat, Feb 23, 2013 at 7:15 PM, Lucas Bernardi <[EMAIL PROTECTED]> wrote:
>>> Hello Hemanth, thanks for answering.
>>> The file is opened by a separate process, not related to MapReduce at
>>> all. You can think of it as a servlet receiving requests and writing
>>> them to this file; every time a request is received, it is written and
>>> org.apache.hadoop.fs.FSDataOutputStream.sync() is invoked.
>>> At the same time, I want to run a MapReduce job over this file. Simply
>>> running the word count example doesn't seem to work; it is as if the
>>> file were empty.
>>> hadoop fs -tail works just fine, and reading the file using
>>> org.apache.hadoop.fs.FSDataInputStream also works OK.
>>> One last thing: the web interface doesn't show the contents, and the
>>> command hadoop fs -ls says the file is empty.
>>> What am I doing wrong?
>>> Thanks!
>>> Lucas
>>> On Sat, Feb 23, 2013 at 4:37 AM, Hemanth Yamijala <
>>> [EMAIL PROTECTED]> wrote:
>>>> Could you please clarify: are you opening the file in your mapper code
>>>> and reading from there?
>>>> Thanks
>>>> Hemanth
>>>> On Friday, February 22, 2013, Lucas Bernardi wrote:
>>>>> Hello there, I'm trying to use Hadoop MapReduce to process an open
>>>>> file. The writing process writes a line to the file and syncs the
>>>>> file to readers
>>>>> (org.apache.hadoop.fs.FSDataOutputStream.sync()).
>>>>> If I try to read the file from another process, it works fine, at
>>>>> least using
>>>>> org.apache.hadoop.fs.FSDataInputStream.
>>>>> hadoop fs -tail also works just fine.
>>>>> But it looks like MapReduce doesn't read any data. I tried using the
>>>>> word count example; same thing, it is as if the file were empty for the