Re: Looking for advice on Hbase setup
Amandeep Khurana 2010-04-25, 22:00
There is https://issues.apache.org/jira/browse/HBASE-2433. There is also some
other work that I did earlier on storing and navigating graphs in HBase. I'll
include those ideas in the RDF store.
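
To make the "millions of edges to and from single nodes" requirement quoted below concrete, here is a minimal sketch of one possible layout: a "tall" table with one row per edge, so that a heavily connected node becomes a range of rows rather than one very wide row. It relies only on the fact that HBase keeps rows sorted by row key. This is not the HBASE-2433 design. The key format, the separator, and every name here (`edge_key`, `SortedTable`, `add_edge`, `neighbors`) are hypothetical, and the table is simulated in plain Python instead of calling a real HBase client.

```python
import bisect

SEP = "\x00"  # hypothetical separator; assumes node ids never contain it


def edge_key(src, direction, dst):
    """Build the row key for one edge so all edges of `src` sort together."""
    return f"{src}{SEP}{direction}{SEP}{dst}"


class SortedTable:
    """Toy stand-in for an HBase table: rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan_prefix(self, prefix):
        """Yield (key, value) for every row whose key starts with `prefix`."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1


def add_edge(table, src, dst, props=None):
    """Store the edge twice so outgoing and incoming lookups are both scans."""
    table.put(edge_key(src, "out", dst), props or {})
    table.put(edge_key(dst, "in", src), props or {})


def neighbors(table, node, direction="out"):
    """Return neighbor ids of `node` using a single prefix scan."""
    prefix = f"{node}{SEP}{direction}{SEP}"
    return [key[len(prefix):] for key, _ in table.scan_prefix(prefix)]


if __name__ == "__main__":
    table = SortedTable()
    add_edge(table, "a", "b")
    add_edge(table, "a", "c")
    print(neighbors(table, "a"))        # outgoing edges of "a"
    print(neighbors(table, "c", "in"))  # incoming edges of "c"
```

Writing each edge twice doubles storage but keeps both directions a contiguous scan. Whether that trade-off suits billions of edges depends on the workload, which this sketch does not measure.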
On Sun, Apr 25, 2010 at 1:58 PM, Aaron McCurry <[EMAIL PROTECTED]> wrote:
> As far as the graph db goes... I'm just formulating a plan now, but for
> high-level features:
> - Needs to be able to house billions of nodes with billions of edges
> (possibly millions of edges to and from single nodes) that needs to
> in real time.
> - Needs to be able to load and unload large amounts of nodes and edges on a
> regular basis.
> - I already have a scalable search solution, so I don't really need to
> have a node or edge search system.
> That's about all I have at this point. Is there a current project in the
> works for a graph db on hbase? I would love to help out.
> On Sun, Apr 25, 2010 at 3:48 PM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
> > If you want to serve some application off hbase, you might be better
> > off with a separate cluster so you don't mix workloads with the MR
> > jobs...
> > What kind of graph db are you looking to build? There is work being
> > done on that front and we would like to know about your use case...
> > On 4/25/10, Aaron McCurry <[EMAIL PROTECTED]> wrote:
> > > I have been a fan of hbase for a while, but until now I haven't had any
> > > extra hardware to set up and run an instance. Now I'm trying to decide
> > > what would be the ideal setup.
> > >
> > > I have a 64 node hadoop/hive setup; each node has dual quad core
> > > processors with 32 GB of RAM and 4 TB of storage. Now my options are to
> > > run a 64 way hbase setup on those nodes, or possibly run hbase on a
> > > separate set of machines, up to 16 nodes of the same type, but they would
> > > only be used for hbase. I'm leaning toward running hbase on the 64 way
> > > cluster with hadoop, because I'm going to be using hbase in some map
> > > reduce jobs and for the size.
> > >
> > > What I'm planning on doing with the cluster:
> > >
> > > - Migrate some large berkeley dbs to hbase (15 - 20 billion records)
> > > - Mix some live data from hbase with some batch processing in hive
> > > (small amount of data)
> > > - Build a large graph db on top of hbase (size unknown, billions at
> > > least)
> > > - Probably a lot more things as time goes along
> > >
> > > Thoughts and opinions welcome. Thanks!
> > >
> > > Aaron
> > >
> > --
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz