There are additional offshoots of Hadoop, such as Spark, S4 and HStreaming,
that specifically address real-time needs.
Most real-time applications, however, also come with an expectation of 100%
uptime. Put simply, a system that goes down and takes tens to hundreds of
minutes to come back is going to miss a lot of real-time windows.
As such, you may need to investigate derivatives of Hadoop that explicitly
support high availability.
On Sat, Sep 3, 2011 at 11:38 PM, Jacques <[EMAIL PROTECTED]> wrote:
> It is hard to reply to an article that you don't actually reference, but
> I'll do my best. Also, you don't define real-time, so I'll take it to mean
> something that comes back within 1-2 seconds (e.g. an end user on a
> site is waiting for the info).
> >>Can you please tell me why Hadoop is said not to be used for Real time
> processing of data?
> There are two different parts to the core Hadoop project. On their own,
> both are focused on batch processing rather than real time:
> 1. HDFS, a distributed file system that is good at safely managing a large
> quantity of very large files. Generally speaking, HDFS is a write-once
> file system: you can't modify the middle of a file after it is written,
> and you can't append to the end of a file without a special version of
> Hadoop. You also can't tail a file directly while it is being written. As
> such, it would be hard to use HDFS directly to build a real-time workflow.
> 2. MapReduce, a distributed computing framework used to process the large
> files held in HDFS. Because of the design of MapReduce (each job has to be
> scheduled and its tasks started across the cluster before any data is
> processed), jobs usually take at least 10 seconds and typically much
> longer. In practice, this means batch processing large quantities of data
> over some non-real-time period.
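To illustrate why MapReduce is a batch model, here is a minimal single-process sketch of its map, shuffle, and reduce phases in Python. It is a toy, not the Hadoop Java API; the function names (`map_phase`, `run_job`, etc.) are hypothetical. The point it shows: no reduce output exists until every input record has been mapped and grouped, so a job answers nothing until it has consumed its whole input. The real framework adds per-job scheduling and task startup on top of this, which adds further latency.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record, collecting (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs):
    """Group all values by key, sorted by key (the 'shuffle and sort' step)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups, reducer):
    """Apply the reducer once per key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    # In this toy, as in MapReduce, reducers only run after ALL records
    # have been mapped and grouped: results arrive only at the end.
    return reduce_phase(shuffle_phase(map_phase(records, mapper)), reducer)


def word_mapper(line):
    # Emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.split()]


def sum_reducer(word, counts):
    # Total occurrences of one word.
    return sum(counts)


lines = ["hadoop is batch", "hbase is real time"]
print(run_job(lines, word_mapper, sum_reducer))
```

The classic word count is used only because it is the simplest job that exercises all three phases.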
> HBase is a separate project from Hadoop proper, built on top of HDFS. It
> is designed specifically to handle real-time loads: you can insert a row
> and get it back immediately.
> >I was thinking we can replace the DB with Hadoop...I do not see any
> HBase can replace many of the functions of existing databases, but it
> should be used primarily when you need the massive scale it can provide.
> Compared to traditional RDBMSs (MySQL, PostgreSQL, etc.), with HBase you
> have to give up things like multi-row transactions and SQL. The schema
> design is very different, and generally your application must be built
> with this in mind.
> You should probably spend some time with the HBase book
> (http://hbase.apache.org/book.html) and looking at your current
> application to determine what kinds of things you would need to do. Many
> people actually use HBase in parallel with a traditional RDBMS, leveraging
> the strengths of each.
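As a rough illustration of how different the schema design can be, here is a toy in-memory model of a wide-column table in Python. It is NOT the HBase client API; the class, its methods, and the composite row key format `<user>#<date>#<order id>` are all hypothetical. It shows one common idea: rows are kept sorted by row key, so a question that would be a `WHERE user = ... ORDER BY date` query in SQL is instead answered by designing the row key so that the answer is a contiguous prefix scan.

```python
from bisect import bisect_left, insort


class ToyWideTable:
    """Toy in-memory model of a wide-column table (NOT the HBase API).

    Rows are kept sorted by row key, and each row maps
    "family:qualifier" column names to values.
    """

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, column, value):
        # Insert a new row key in sorted position, then set the cell.
        if row_key not in self._rows:
            insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        # A written row is readable immediately; unknown keys give {}.
        return dict(self._rows.get(row_key, {}))

    def scan_prefix(self, prefix):
        """Yield (row_key, columns) for keys starting with prefix, in key order."""
        # Keys sharing a prefix are contiguous in sorted order,
        # starting at the first key >= prefix.
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, dict(self._rows[key])
            i += 1


table = ToyWideTable()
# Hypothetical composite row key: "<user>#<date>#<order id>"
table.put("bob#2011-09-02#o20", "order:total", "7.50")
table.put("alice#2011-09-03#o42", "order:total", "5.00")
table.put("alice#2011-09-01#o17", "order:total", "19.99")

# "All of alice's orders, oldest first" becomes a prefix scan, not a query.
for key, cols in table.scan_prefix("alice#"):
    print(key, cols["order:total"])
```

The trade-off is that the row key must be designed up front around the questions the application will ask, which is why an existing application usually needs rework before moving to HBase.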
> Good luck!