Hadoop >> mail # user >> Re: HADOOP in Production


Re: HADOOP in Production
Funny that the OP asks about 'real time'...

This comes up quite often, and it's always misunderstood.

First, when we say 'real time', many people take it to mean subjective real time. True 'real time' would require some sort of RTOS (real-time operating system) underneath.

Second, Hadoop is a parallelized framework made up of several components: a distributed scheduler, a distributed file system (HDFS), and tools to manipulate the data.

You can use Hadoop in subjective real-time scenarios.

One common pattern is to use M/R (MapReduce) to process the data and HBase to deliver ad-hoc access to the resulting records, returning results with sub-second response times.
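The batch-then-serve pattern above can be sketched in a few lines. This is a toy, single-process illustration under stated assumptions, not Hadoop or HBase code: the map, shuffle, and reduce functions only mimic what MapReduce does across a cluster, the click-log data is made up, and `ServingTable` is a hypothetical in-memory stand-in for an HBase table (the `stats:visits` column name loosely imitates HBase's `family:qualifier` naming). The point is the split: the expensive aggregation runs offline, and each request afterwards is a single lookup by row key.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input: raw click-log lines of the form "user_id,page".
RAW_LOG = [
    "u1,home",
    "u2,search",
    "u1,search",
    "u3,home",
    "u1,home",
]


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (user_id, 1) for every log line, like a MapReduce mapper."""
    for line in lines:
        user_id, _page = line.split(",", 1)
        yield user_id, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, standing in for the framework's shuffle/sort."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts per key, like a MapReduce reducer."""
    return {key: sum(values) for key, values in grouped.items()}


class ServingTable:
    """Hypothetical stand-in for an HBase table: rows addressed by row key."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, int]] = {}

    def bulk_load(self, results: Dict[str, int]) -> None:
        # Batch side: store precomputed results, keyed by user id.
        for row_key, count in results.items():
            self._rows[row_key] = {"stats:visits": count}

    def get(self, row_key: str) -> Dict[str, int]:
        # Serving side: one keyed lookup, no scan over the raw data.
        return self._rows.get(row_key, {})


# Offline (slow, batch) path: on a real cluster this is the M/R job.
table = ServingTable()
table.bulk_load(reduce_phase(shuffle(map_phase(RAW_LOG))))

# Online (fast) path: each request is a point lookup.
print(table.get("u1"))  # {'stats:visits': 3}
```

On a real cluster, the reduce output could be written or bulk-loaded into HBase and read back with the HBase client's `Get` by row key, so the slow work stays in the batch job and only the lookup sits on the request path.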

I think there's an upcoming talk at Strata in NY on using Hadoop (HBase and SOLR) to provide real-time access.

Outside of that, yeah, Tom White's book is a great start; however, some of the feedback I've heard is that it's a dry read.
But then again, most technical books are. :-)
On Oct 2, 2012, at 6:47 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:

> Hi,
>
> There are too many issues to discuss here, I guess. I would recommend
> reading Hadoop: The Definitive Guide by Tom White. Several of its
> chapters answer these questions.
> Also, what did you mean by 'real time'? Hadoop is not designed for
> returning real-time results to queries. It is meant for offline data
> analysis, because each query can take minutes or hours to finish.
> AFAIK, HBase provides some real-time functionality, though.
> For Hadoop automation, you can try Oozie. We are using opswise in our company.
>
> Best Regards
>
> On Mon, Oct 1, 2012 at 5:36 PM, yogesh dhari <[EMAIL PROTECTED]> wrote:
>> Hi all,
>>
>> I have learned Hadoop and the Hadoop ecosystem (Pig for ETL, Hive as a
>> data warehouse, Sqoop as an import tool). I have worked and learned on a
>> single-node cluster with demo data.
>>
>> As Hadoop is best suited to Unix platforms, please help me understand the
>> requirements, from start to finish, for using Hadoop in production.
>>
>> What is needed to use Hadoop on a real-time project, such as Hadoop
>> automation on Unix and alerts for failed processes?
>>
>> Please shed some light on using Hadoop in real time and on which objectives
>> are recommended.
>>
>>
>> Thanks & Regards
>> Yogesh Kumar
>>
>
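The original post also asks about automating jobs on Unix and getting alerted when a process fails. On a real cluster, Ruslan's Oozie suggestion is the usual route: an Oozie workflow action declares an `ok` and an `error` transition, so a failure can be routed to a notification step. The Python sketch below only illustrates the retry-then-alert idea in isolation; `run_pipeline`, the step names, and the `alert` callback are all hypothetical, and in practice each step would launch a real job while the alert would send mail or page someone.

```python
import time
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action raises an exception on failure.
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step], alert: Callable[[str], None],
                 max_retries: int = 2, backoff_seconds: float = 0.0) -> bool:
    """Run steps in order; retry a failing step, then alert and stop."""
    for name, action in steps:
        for attempt in range(1, max_retries + 2):  # first try + retries
            try:
                action()
                break  # step succeeded, move on to the next one
            except Exception as exc:  # any failure counts as a failed attempt
                if attempt > max_retries:
                    alert(f"step '{name}' failed after {attempt} attempts: {exc}")
                    return False
                time.sleep(backoff_seconds * attempt)  # linear backoff
    return True


# Usage with hypothetical steps: "ingest" succeeds, "transform" always fails.
alerts: List[str] = []


def ingest() -> None:
    pass  # e.g. a Sqoop import would run here


def transform() -> None:
    raise RuntimeError("job exited with status 1")


ok = run_pipeline([("ingest", ingest), ("transform", transform)], alerts.append)
print(ok)         # False
print(alerts[0])  # step 'transform' failed after 3 attempts: job exited with status 1
```

Stopping at the first step that exhausts its retries mirrors how a dependent pipeline usually behaves: later steps would consume output that was never produced.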