Robert Nicholson 2012-08-19, 16:46
Michael Segel 2012-08-19, 21:49
Lance Norskog 2012-08-20, 00:43
Russell Jurney 2012-08-20, 01:31
Lance Norskog 2012-08-20, 01:57
Ted Dunning 2012-08-20, 02:13
Russell Jurney 2012-08-19, 17:27
To add to Russell's answer:
If real-time processing of events is required, you might want to use a
stream-processing system like Apache S4 or Twitter's Storm.
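The ordering requirement in the original question maps onto key-based routing. If every message carries a key that identifies its group of related messages, hashing that key to a fixed consumer keeps related messages in sequence while unrelated ones are processed in parallel. Storm's fields grouping (tuples with the same field values always go to the same task) and Kafka's keyed partitioning rely on the same idea. Below is a minimal in-process sketch using Python queues and threads, not either system's API. The `route`/`dispatch` helpers, the `(key, payload)` message shape and the worker count are hypothetical.

```python
import queue
import threading
import zlib
from collections import defaultdict

NUM_WORKERS = 4  # hypothetical number of consumers


def route(key: str, num_workers: int = NUM_WORKERS) -> int:
    """Map a message key to a worker index with a stable hash.

    zlib.crc32 is used instead of hash() because Python's str hash is
    randomized per process, which would change routing across restarts.
    """
    return zlib.crc32(key.encode("utf-8")) % num_workers


def dispatch(messages, num_workers: int = NUM_WORKERS) -> dict:
    """Send each hypothetical (key, payload) message to its worker's FIFO
    queue and record, per key, the order in which payloads were processed."""
    queues = [queue.Queue() for _ in range(num_workers)]
    processed = defaultdict(list)  # key -> payloads in processing order
    lock = threading.Lock()

    def worker(q: queue.Queue) -> None:
        # One thread per queue: a key's messages are handled sequentially.
        while True:
            item = q.get()
            if item is None:  # sentinel: no more messages
                break
            key, payload = item
            with lock:
                processed[key].append(payload)

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    for key, payload in messages:
        queues[route(key, num_workers)].put((key, payload))
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return dict(processed)


if __name__ == "__main__":
    msgs = [("order-1", "create"), ("order-2", "create"),
            ("order-1", "amend"), ("order-2", "cancel"), ("order-1", "fill")]
    print(dispatch(msgs))
```

Each key's payloads come out in the order they were sent, because a key always lands on the same single-threaded FIFO consumer. The trade-off is that one busy key can hold up every other key that hashes to the same worker.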
On Sun, Aug 19, 2012 at 10:27 AM, Russell Jurney wrote:
> The model with Hadoop would be to aggregate and write your events to
> the Hadoop Distributed File System (HDFS), and then process them with
> scheduled batch jobs via Hadoop MapReduce. If your requirements can
> tolerate some latency, then Hadoop can work for you. Depending on your
> processing, you can schedule jobs down to, say, every hour, half hour,
> or fifteen minutes. I'm not aware of anyone scheduling jobs more
> frequently than that, but there may be some. Chime in if you are.
> For getting events to HDFS, look at Flume, Kafka and Scribe. For
> processing events, look at Pig, Hive and Cascading. For scheduling
> jobs look at Oozie and Azkaban.
> Russell Jurney http://datasyndrome.com
> On Aug 19, 2012, at 9:47 AM, Robert Nicholson
> <[EMAIL PROTECTED]> wrote:
> > We have an application, or a series of applications, that listens to
> > incoming feeds and then distributes this data in XML form to a number of
> > queues. Another set of processes listens to these queues and processes the
> > messages. Order of processing is important insofar as related messages
> > need to be processed in sequence; hence, today all related messages go to
> > the same queue and are processed by the same queue consumer.
> > The idea would be to replace the use of MQ with some kind of reliable
> > distributed dispatch. Does Hadoop provide that?
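Under the batch model in Russell's reply, per-key ordering has to be re-established inside each job rather than by a dedicated queue consumer. One common approach is to bucket events by time window (for example one directory per hour), then group each bucket by message key and sort each group by a per-message sequence number, which is what a MapReduce secondary sort does at scale. Below is a minimal plain-Python sketch of that logic, not Hadoop code. The `/events/YYYY/MM/DD/HH` layout, the `key`/`seq` fields and the helper names are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, NamedTuple


class Event(NamedTuple):
    key: str       # hypothetical: identifies a group of related messages
    seq: int       # hypothetical: per-key sequence number set upstream
    ts: datetime   # timezone-aware event time, used to pick the hourly batch
    body: str      # e.g. the XML payload


def hourly_path(ts: datetime, root: str = "/events") -> str:
    """Hypothetical HDFS directory for the UTC hour an event falls in."""
    ts = ts.astimezone(timezone.utc)
    return f"{root}/{ts:%Y/%m/%d/%H}"


def bucket_by_hour(events: Iterable[Event]) -> Dict[str, List[Event]]:
    """Aggregate events into per-hour batches, as a collector would
    before a scheduled job picks each batch up."""
    buckets: Dict[str, List[Event]] = defaultdict(list)
    for e in events:
        buckets[hourly_path(e.ts)].append(e)
    return dict(buckets)


def process_batch(batch: List[Event]) -> Dict[str, List[str]]:
    """Group a batch by key and order each group by sequence number,
    mirroring a secondary sort (partition by key, sort by seq)."""
    groups: Dict[str, List[Event]] = defaultdict(list)
    for e in batch:
        groups[e.key].append(e)
    return {k: [e.body for e in sorted(v, key=lambda e: e.seq)]
            for k, v in groups.items()}


def utc(hour: int, minute: int) -> datetime:
    """Build an example timestamp on 2012-08-19 in UTC."""
    return datetime(2012, 8, 19, hour, minute, tzinfo=timezone.utc)


if __name__ == "__main__":
    events = [Event("A", 2, utc(16, 50), "<amend/>"),
              Event("B", 1, utc(16, 10), "<new/>"),
              Event("A", 1, utc(16, 5), "<new/>"),
              Event("A", 3, utc(17, 1), "<fill/>")]
    for path, batch in sorted(bucket_by_hour(events).items()):
        print(path, process_batch(batch))
```

This only preserves per-key order across batches if the hourly jobs run one after another, and it adds up to a full window of latency, which is the trade-off Russell describes.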