Re: Brisk vs Cloudera Distribution
Edward Capriolo 2012-02-09, 04:57
Hadoop can work on a number of filesystems: HDFS, S3, and local files. The Brisk
file system is known as CFS. CFS stores all block data and metadata in
Cassandra, so it does not use a NameNode. Brisk fires up a JobTracker
automatically as well. Brisk also has a Hive metastore backed by Cassandra,
which takes away that SPOF.
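
As a rough illustration of the pluggable-filesystem idea above, here is a minimal sketch of picking a storage backend from a path's URI scheme. It is not Hadoop or Brisk code: the registry, the function name backend_for, and the use of "cfs" as a URI scheme are assumptions made purely for illustration.

```python
from urllib.parse import urlparse

# Hypothetical registry, for illustration only: maps a URI scheme to a short
# description of the backend that would serve it. These are not Hadoop class names.
BACKENDS = {
    "hdfs": "HDFS (NameNode holds metadata, DataNodes hold blocks)",
    "s3": "Amazon S3",
    "file": "local filesystem",
    "cfs": "Brisk CFS (blocks and metadata stored in Cassandra)",  # scheme assumed
}


def backend_for(path: str) -> str:
    """Return the backend description for a path, based on its URI scheme."""
    # A path with no scheme (e.g. "/tmp/data") is treated as a local file.
    scheme = urlparse(path).scheme or "file"
    try:
        return BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"no filesystem registered for scheme {scheme!r}")


if __name__ == "__main__":
    for p in ("hdfs://namenode:8020/data", "cfs:///user/hive/warehouse", "/tmp/data"):
        print(p, "->", backend_for(p))
```

The point of the sketch is only that the job code addresses data by path, and the scheme decides which storage layer answers, which is why a job can run against CFS without a NameNode.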
Brisk Snappy-compresses all data, so you may not need to use compression or
sequence files. Performance-wise, I have gotten comparable numbers with
TeraSort and TeraGen. But the systems work vastly differently and likely it
The Hive integration is solid. Not sure what the biggest cluster is or
making other vague performance claims. Brisk is not active anymore; the
commercial product is DSE. There is a GitHub fork of Brisk, however.
On Wednesday, February 8, 2012, rk vishu <[EMAIL PROTECTED]> wrote:
> Hello All,
> Could anyone help me understand the pros and cons of Brisk vs Cloudera Hadoop
> (HDFS + HBase) in terms of functionality and performance?
> Wanted to keep aside the single point of failure (NN) issue while
> Are there any big clusters in petabytes using Brisk in production? How is
> the performance comparison of CFS vs HDFS? How is the Hive integration?
> Thanks and Regards