The NameNode is limited in the number of blocks it can track. Changing the
block size would not have much impact on the problem, since every file
still takes at least one block. I think that the limit is something like
150 million blocks. (Someone else can feel free to correct this.) It isn't
exactly that simple, because it also has to do with cluster recovery time,
hardware, etc. HDFS Federation (I believe in the 0.23 branch) would raise
this number, at the cost of added complexity.
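
To put rough numbers on the block-count point, here is a back-of-the-envelope
sketch in Python. The 150 million figure is only my guess from above, and the
helper names (blocks_for_file, max_files) are made up for illustration; it
also assumes non-empty files and ignores the per-file metadata the NameNode
keeps in addition to blocks.

```python
import math

MB = 1024 * 1024
# Rough guess quoted in this thread, not a hard constant of HDFS.
ASSUMED_BLOCK_LIMIT = 150_000_000


def blocks_for_file(file_size_bytes: int, block_size_bytes: int) -> int:
    """Blocks occupied by one non-empty file: at least one, however small it is."""
    return max(1, math.ceil(file_size_bytes / block_size_bytes))


def max_files(file_size_bytes: int, block_size_bytes: int,
              block_limit: int = ASSUMED_BLOCK_LIMIT) -> int:
    """How many files of one size fit under the assumed block limit."""
    return block_limit // blocks_for_file(file_size_bytes, block_size_bytes)


if __name__ == "__main__":
    # A 20 MB file with a 64 MB block size versus a 15 MB block size.
    print(blocks_for_file(20 * MB, 64 * MB), max_files(20 * MB, 64 * MB))
    print(blocks_for_file(20 * MB, 15 * MB), max_files(20 * MB, 15 * MB))
```

With these assumptions, shrinking the block size to 15 MB does not help: a
20 MB file goes from one block to two, so fewer files fit, not more.
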
If you're talking about a smaller scale (which most people are really
focused on to start), just go with HDFS and don't worry about the
If your product takes off, you'll have the engineering staff to start doing
a three-way solution, e.g.:
1) Less than 15 MB: store in HBase
2) More than 15 MB, new: store directly in HDFS with a pointer in HBase
3) Daily: combine the day's files into a single archive, update the HBase
pointers, and delete the original individual files for that day
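
The routing and the daily grouping in the three steps above could be
sketched roughly like this in Python. It only decides where things go and
does not talk to HBase or HDFS; the function names (choose_store,
plan_daily_archives) are hypothetical, and sending a file of exactly 15 MB
to HDFS is my assumption, since the steps don't say.

```python
from collections import defaultdict
from datetime import date

MB = 1024 * 1024
THRESHOLD_BYTES = 15 * MB  # the 15 MB cut-off from the steps above


def choose_store(file_size_bytes: int) -> str:
    """Steps 1 and 2: small files go to HBase, the rest to HDFS.

    Assumption: a file of exactly 15 MB is treated as large.
    """
    return "hbase" if file_size_bytes < THRESHOLD_BYTES else "hdfs"


def plan_daily_archives(hdfs_files):
    """Step 3: group (path, created_date) pairs so each day becomes one archive.

    Returns {day: sorted list of paths}. Building the archive, updating the
    HBase pointers and deleting the originals are left to the caller.
    """
    by_day = defaultdict(list)
    for path, created in hdfs_files:
        by_day[created].append(path)
    return {day: sorted(paths) for day, paths in by_day.items()}


if __name__ == "__main__":
    print(choose_store(2 * MB), choose_store(40 * MB))
    files = [("/data/b.bin", date(2012, 3, 4)),
             ("/data/a.bin", date(2012, 3, 4)),
             ("/data/c.bin", date(2012, 3, 5))]
    for day, paths in sorted(plan_daily_archives(files).items()):
        print(day, paths)
```

The pointer update has to happen before the originals are deleted, otherwise
readers can hit a pointer to a file that no longer exists.
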
On Sun, Mar 4, 2012 at 9:12 PM, Rohit Kelkar <[EMAIL PROTECTED]> wrote:
> Jacques, I agree that storing files (let's say greater than 15 MB) would
> make the namenode run out of space. But what if I set my
> blocksize=15mb?
> I am having the same issue that Konrad mentions, and I have
> used exactly the approach number 2 that he described.
> - Rohit Kelkar
> On Mon, Mar 5, 2012 at 6:59 AM, Jacques <[EMAIL PROTECTED]> wrote:
> >>>2) files bigger than 15MB are stored in HDFS and HBase keeps only some
> > information about where the file is placed
> > You're likely to run out of space due to the name node's file count limit
> > if you went with this solution as is. If you follow this path, you
> > would probably need to either regularly combine files into a Hadoop
> > Archive (HAR) file or something similar, or go with another backing
> > filesystem (e.g. MapR) that can support hundreds of millions of files
> > natively...
> > On Sun, Mar 4, 2012 at 4:12 PM, Konrad Tendera <[EMAIL PROTECTED]> wrote:
> >> So, what should I use instead of HBase? I'm considering the following
> >> solution:
> >> 1) let's say our limit is 15MB - files up to this limit are worth keeping
> >> in HBase
> >> Is this an appropriate way to solve the problem? Or maybe I should use
> >> http://www.lilyproject.org/ ?