MapReduce, mail # user - A way to monitor HDFS for a file to come live, and then kick off a job?


Jonathan Coveney 2011-03-24, 17:09
David Rosenstrauch 2011-03-24, 17:16
Bai, Gang 2011-03-25, 14:25
Mapred Learn 2011-03-25, 16:28
Allen Wittenauer 2011-03-25, 16:46
Re: A way to monitor HDFS for a file to come live, and then kick off a job?
Lance Norskog 2011-05-15, 22:09
Hamake does exactly this:

http://code.google.com/p/hamake/

On Fri, Mar 25, 2011 at 9:46 AM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:
>
> On Mar 24, 2011, at 10:09 AM, Jonathan Coveney wrote:
>
>> I am not sure if this is the right listserv, forgive me if it is not.
>
>        A better choice would likely be hdfs-user@, since this is really about watching files in HDFS.
>
>
>> My goal is this: monitor HDFS until a file is created, and then kick off a job.
>> Ideally I'd want to do this continuously, but the file would be created
>> hourly (with some sort of variance). I guess I could make a script that
>> would ping the server every 5 minutes or something, but I was wondering if
>> there might be a more elegant way?
>
>        Two ways off the top of my head:
>
>        1) Read/watch the edits stream
>
>        2) Read/watch the HDFS audit log
>
>        Given the latter is text built by log4j, that should be relatively simple to implement.
>
> There was a JIRA asking for this functionality to be built in recently, btw.
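
For anyone trying the audit-log route suggested above, here is a minimal sketch in Python. It assumes the audit lines carry tab-separated `cmd=` and `src=` fields, which is how they commonly appear but may differ by Hadoop version and log4j layout. The function names and the sample path are hypothetical. Note also that a `create` entry is typically written when the writer opens the file, so the file may not be complete yet when the line shows up.

```python
import re

# Assumed layout: "... cmd=create<TAB>src=/some/path<TAB>dst=null ..."
CMD_RE = re.compile(r"\bcmd=(\S+)")
SRC_RE = re.compile(r"\bsrc=([^\t\n]+)")


def created_path(line):
    """Return the src path if this audit line records a create, else None."""
    cmd = CMD_RE.search(line)
    src = SRC_RE.search(line)
    if cmd and src and cmd.group(1) == "create":
        return src.group(1)
    return None


def matching_creates(lines, prefix):
    """Yield created paths that start with `prefix` from an iterable of lines."""
    for line in lines:
        path = created_path(line)
        if path is not None and path.startswith(prefix):
            yield path
```

Feeding `matching_creates` the lines of a followed log file (for example, the output of a `tail -F` on the NameNode's audit log) and launching the job for each yielded path would give the "watch" behaviour without polling HDFS itself.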

--
Lance Norskog
[EMAIL PROTECTED]
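
For comparison, the polling script mentioned in the original question can be sketched as below. This is only an illustration: it assumes the `hadoop` command is on the PATH and relies on `hadoop fs -test -e <path>` exiting with status 0 when the path exists. The function names and the 300-second default (the "every 5 minutes" from the question) are hypothetical choices.

```python
import subprocess
import time


def hdfs_path_exists(path):
    """True if `hadoop fs -test -e` exits with status 0 for this path."""
    return subprocess.call(["hadoop", "fs", "-test", "-e", path]) == 0


def wait_for(path, exists=hdfs_path_exists, interval=300, sleep=time.sleep):
    """Block until exists(path) is true, re-checking every `interval` seconds."""
    while not exists(path):
        sleep(interval)
    return path
```

Once `wait_for` returns, the caller can submit the job. The `exists` and `sleep` parameters are there only so the loop can be exercised without a cluster.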