A way to monitor HDFS for a file to come live, and then kick off a job?
Jonathan Coveney 2011-03-24, 17:09
I am not sure if this is the right listserv; forgive me if it is not. My goal is this: monitor HDFS until a file is created, and then kick off a job. Ideally I'd want to do this continuously, but the file would be created hourly (with some sort of variance). I guess I could make a script that would ping the server every 5 minutes or something, but I was wondering if there might be a more elegant way?
Thanks,
Jon
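The polling script mentioned above could be sketched roughly as follows. This is only a minimal sketch, assuming the `hadoop` command is on the PATH and that `hadoop fs -test -e PATH` exits with status 0 when the path exists. The names `hdfs_path_exists` and `wait_for_path`, and all of their parameters, are hypothetical helpers made up for illustration; they are not part of Hadoop.

```python
import subprocess
import time


def hdfs_path_exists(path):
    """Return True if `hadoop fs -test -e PATH` exits with status 0.

    Assumes the `hadoop` CLI is installed and configured for the cluster.
    """
    return subprocess.call(["hadoop", "fs", "-test", "-e", path]) == 0


def wait_for_path(path, exists=hdfs_path_exists, interval_s=300,
                  max_checks=None, sleep=time.sleep):
    """Poll until `exists(path)` is true.

    Returns the number of checks it took, or None if `max_checks`
    checks were made without the path appearing. `exists` and `sleep`
    are injectable so the loop can be exercised without a cluster.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if exists(path):
            return checks
        sleep(interval_s)  # default: wait 5 minutes between checks
    return None
```

For example, `wait_for_path("/data/hourly/2011032417")` (a made-up path) would block until the path shows up, after which the job could be launched with `subprocess.call`. A file can become visible before its writer has finished, so having the producer write elsewhere and rename the file into place when done is a common safeguard.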
Re: A way to monitor HDFS for a file to come live, and then kick off a job?
David Rosenstrauch 2011-03-24, 17:16
On 03/24/2011 01:09 PM, Jonathan Coveney wrote:
> I am not sure if this is the right listserv; forgive me if it is not. My
> goal is this: monitor HDFS until a file is created, and then kick off a job.
> Ideally I'd want to do this continuously, but the file would be created
> hourly (with some sort of variance). I guess I could make a script that
> would ping the server every 5 minutes or something, but I was wondering if
> there might be a more elegant way?
>
> Thanks
> Jon
I suppose you could do this using HDFS, but it sounds to me like ZooKeeper is much better suited to this type of application. You could just add a watcher on a particular ZooKeeper node, and you'd get notified about updates to it and its children.
HTH,
DR
Re: A way to monitor HDFS for a file to come live, and then kick off a job?
Bai, Gang 2011-03-25, 14:25
Hi Jon,
Oozie could handle this nicely. You could just specify an Oozie coordinator job. But if you don't have an Oozie server handy, cron jobs could also meet your needs.
Regards,
-BaiGang
On Fri, Mar 25, 2011 at 1:09 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> I am not sure if this is the right listserv; forgive me if it is not. My
> goal is this: monitor HDFS until a file is created, and then kick off a job.
> Ideally I'd want to do this continuously, but the file would be created
> hourly (with some sort of variance). I guess I could make a script that
> would ping the server every 5 minutes or something, but I was wondering if
> there might be a more elegant way?
>
> Thanks
> Jon
Re: A way to monitor HDFS for a file to come live, and then kick off a job?
Mapred Learn 2011-03-25, 16:28
Does the Oozie coordinator work? Last time I tried it, it had a lot of problems:
i) Jobs from the start to the end timestamp were all being submitted at once, not at the actual wall-clock time.
ii) The links to the jobs in a particular coordinator workflow were not working, i.e. you were not able to see the progress of the running jobs.
-JJ
On Fri, Mar 25, 2011 at 7:25 AM, Bai, Gang <[EMAIL PROTECTED]> wrote:
> Hi Jon,
>
> Oozie could handle this nicely. You could just specify an Oozie coordinator
> job. But if you don't have an Oozie server handy, cron jobs could also meet
> your needs.
>
> Regards,
> -BaiGang
>
> On Fri, Mar 25, 2011 at 1:09 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>
>> I am not sure if this is the right listserv; forgive me if it is not. My
>> goal is this: monitor HDFS until a file is created, and then kick off a job.
>> Ideally I'd want to do this continuously, but the file would be created
>> hourly (with some sort of variance). I guess I could make a script that
>> would ping the server every 5 minutes or something, but I was wondering if
>> there might be a more elegant way?
>>
>> Thanks
>> Jon
Re: A way to monitor HDFS for a file to come live, and then kick off a job?
Allen Wittenauer 2011-03-25, 16:46
On Mar 24, 2011, at 10:09 AM, Jonathan Coveney wrote:
> I am not sure if this is the right listserv; forgive me if it is not.
A better choice would likely be hdfs-user@, since this is really about watching files in HDFS.
> My
> goal is this: monitor HDFS until a file is created, and then kick off a job.
> Ideally I'd want to do this continuously, but the file would be created
> hourly (with some sort of variance). I guess I could make a script that
> would ping the server every 5 minutes or something, but I was wondering if
> there might be a more elegant way?
Two ways off the top of my head:
1) Read/watch the edits stream
2) Read/watch the HDFS audit log
Given the latter is text built by log4j, that should be relatively simple to implement.
There was a JIRA asking for this functionality to be built in recently, btw.
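The audit-log option could look roughly like the sketch below. It assumes each audit line carries whitespace-separated `key=value` fields including `cmd=` and `src=`; that layout is an assumption about the log format and should be checked against the actual log4j output on the cluster. `parse_audit_line` and `created_paths` are hypothetical helper names, and the sketch does not handle paths that contain spaces.

```python
import re

# Matches key=value tokens such as cmd=create or src=/data/in/part-0.
# Assumption: values contain no whitespace.
_FIELD = re.compile(r"(\w+)=(\S+)")


def parse_audit_line(line):
    """Return the key=value fields found in one audit log line as a dict."""
    return dict(_FIELD.findall(line))


def created_paths(lines, prefix):
    """Yield the src path of every cmd=create event under `prefix`.

    `lines` can be any iterable of strings, for example an open log file.
    """
    for line in lines:
        fields = parse_audit_line(line)
        src = fields.get("src", "")
        if fields.get("cmd") == "create" and src.startswith(prefix):
            yield src
```

Passing an open file object as `lines` would scan an existing log; following a log that is still growing would need an extra tail-style loop, which is left out here. A create event may also be logged before the writer has finished, so it does not by itself show that the file is complete.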
Re: A way to monitor HDFS for a file to come live, and then kick off a job?
Lance Norskog 2011-05-15, 22:09
Hamake does exactly this: http://code.google.com/p/hamake/
On Fri, Mar 25, 2011 at 9:46 AM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:
>
> On Mar 24, 2011, at 10:09 AM, Jonathan Coveney wrote:
>
>> I am not sure if this is the right listserv; forgive me if it is not.
>
> A better choice would likely be hdfs-user@, since this is really about watching files in HDFS.
>
>> My
>> goal is this: monitor HDFS until a file is created, and then kick off a job.
>> Ideally I'd want to do this continuously, but the file would be created
>> hourly (with some sort of variance). I guess I could make a script that
>> would ping the server every 5 minutes or something, but I was wondering if
>> there might be a more elegant way?
>
> Two ways off the top of my head:
>
> 1) Read/watch the edits stream
>
> 2) Read/watch the HDFS audit log
>
> Given the latter is text built by log4j, that should be relatively simple to implement.
>
> There was a JIRA asking for this functionality to be built in recently, btw.
--
Lance Norskog
[EMAIL PROTECTED]