HBase user mailing list
Re: Looking for advice on HBase setup
There is https://issues.apache.org/jira/browse/HBASE-2433

There is also some other work I did earlier on storing and navigating
graphs in HBase. I'll include those ideas in the RDF store.
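The thread does not describe how HBASE-2433 or the earlier graph work lay out data, so the following is only a hypothetical sketch of one common way to store edges in HBase for the requirements Aaron lists below. It assumes one row per edge, keyed `[nodeId][direction][otherId]` with non-negative 64-bit ids, rather than one wide row per node. HBase sorts row keys as unsigned bytes and never splits a single row across regions, so a row-per-edge layout lets a node with millions of edges spread over several regions while all of its neighbors remain one contiguous prefix scan. To stay self-contained, the sketch uses a `TreeMap` with an unsigned byte comparator in place of a real table and makes no HBase client calls; `EdgeKeys`, `OUT`, `IN` and the other names are illustrative.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical edge-table row-key layout (not the HBASE-2433 design):
// one row per edge, keyed [nodeId: 8 bytes][direction: 1 byte][otherId: 8 bytes].
public final class EdgeKeys {
    public static final byte OUT = 0; // row describes an edge leaving nodeId
    public static final byte IN = 1;  // row describes an edge arriving at nodeId
    private static final int PREFIX_LEN = Long.BYTES + 1;
    private static final int KEY_LEN = PREFIX_LEN + Long.BYTES;

    private EdgeKeys() {}

    // Big-endian longs sort correctly as unsigned bytes when ids are non-negative.
    public static byte[] edgeKey(long nodeId, byte direction, long otherId) {
        return ByteBuffer.allocate(KEY_LEN).putLong(nodeId).put(direction).putLong(otherId).array();
    }

    // Inclusive start row covering every edge of nodeId in one direction.
    public static byte[] startRow(long nodeId, byte direction) {
        return ByteBuffer.allocate(PREFIX_LEN).putLong(nodeId).put(direction).array();
    }

    // Exclusive stop row: the same prefix with the direction byte incremented.
    public static byte[] stopRow(long nodeId, byte direction) {
        return ByteBuffer.allocate(PREFIX_LEN).putLong(nodeId).put((byte) (direction + 1)).array();
    }

    // Reads the neighbor id from the last 8 bytes of a full edge key.
    public static long otherId(byte[] key) {
        return ByteBuffer.wrap(key).getLong(PREFIX_LEN);
    }

    // In-memory stand-in for a table sorted the way HBase sorts row keys.
    public static NavigableMap<byte[], byte[]> newTable() {
        Comparator<byte[]> unsignedLexicographic = Arrays::compareUnsigned;
        return new TreeMap<>(unsignedLexicographic);
    }

    // Stores both directions so in-edges and out-edges are each one prefix scan.
    public static void addEdge(NavigableMap<byte[], byte[]> table, long src, long dst) {
        table.put(edgeKey(src, OUT, dst), new byte[0]);
        table.put(edgeKey(dst, IN, src), new byte[0]);
    }

    // Equivalent of a scan from startRow (inclusive) to stopRow (exclusive).
    public static List<Long> neighbors(NavigableMap<byte[], byte[]> table, long nodeId, byte direction) {
        List<Long> result = new ArrayList<>();
        for (byte[] key : table.subMap(startRow(nodeId, direction), true,
                                       stopRow(nodeId, direction), false).keySet()) {
            result.add(otherId(key));
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<byte[], byte[]> table = newTable();
        addEdge(table, 1L, 7L);
        addEdge(table, 1L, 3L);
        addEdge(table, 2L, 1L);
        System.out.println(neighbors(table, 1L, OUT)); // [3, 7]
        System.out.println(neighbors(table, 1L, IN));  // [2]
    }
}
```

Writing each edge twice doubles storage, but it turns "who points at this node?" into the same cheap prefix scan as "where does this node point?", which matters when single nodes can have millions of edges in each direction.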

On Sun, Apr 25, 2010 at 1:58 PM, Aaron McCurry <[EMAIL PROTECTED]> wrote:

> As far as the graph db goes... I'm just formulating a plan now, but the
> high-level features are:
>
>
>   - Needs to be able to house billions of nodes with billions of edges
>     (possibly millions of edges to and from a single node) and needs to
>     operate in real time.
>   - Needs to be able to load and unload large numbers of nodes and edges
>     on a regular basis.
>   - I already have a scalable search solution, so I don't really need a
>     node or edge search system.
>
> That's about all I have at this point.  Is there a current project in the
> works for a graph db on hbase?  I would love to help out.
>
> Aaron
>
>
> On Sun, Apr 25, 2010 at 3:48 PM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
>
> > If you want to serve some application off hbase, you might be better
> > off with a separate cluster so you don't mix workloads with the MR
> > jobs...
> >
> > What kind of graph db are you looking to build? There is work being
> > done on that front and we would like to know about your use case...
> >
> > On 4/25/10, Aaron McCurry <[EMAIL PROTECTED]> wrote:
> > > I have been a fan of hbase for a while, but until now I haven't had any
> > > extra hardware to set up and run an instance. Now I'm trying to decide
> > > what the ideal setup would be.
> > >
> > > I have a 64-node hadoop/hive setup; each node has dual quad-core
> > > processors with 32 GB of RAM and 4 TB of storage. My options are to run
> > > a 64-way hbase setup on those nodes, or possibly to run hbase on a
> > > separate set of up to 16 machines of the same type that would be used
> > > only for hbase. I'm leaning toward running hbase on the 64-node cluster
> > > with hadoop, because I'm going to be using hbase in some MapReduce jobs,
> > > and because of its size.
> > >
> > > What I'm planning on doing with the cluster:
> > >
> > >    - Migrate some large Berkeley DB databases to hbase (15-20 billion
> > >      records)
> > >    - Mix some live data from hbase with some batch processing in hive
> > >      (small amount of data)
> > >    - Build a large graph db on top of hbase (size unknown, billions at
> > >      least)
> > >    - Probably a lot more things as time goes along
> > >
> > > Thoughts and opinions welcome.  Thanks!
> > >
> > > Aaron
> > >
> >
> >
> > --
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
>
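The two hardware options in the original question can be compared with simple arithmetic. This is a rough sketch, not a sizing method anyone in the thread proposed, and it assumes HDFS's default replication factor of 3 and decimal terabytes. Using the thread's figures (4 TB disk and 32 GB RAM per node), the shared 64-node cluster has 256 TB raw, about 85 TB usable, while a dedicated 16-node cluster has 64 TB raw, about 21 TB usable. Spreading the upper estimate of 20 billion migrated records over the smaller cluster leaves a ceiling of roughly 1 KB of stored bytes per record, before compaction headroom, HBase's per-cell key overhead (row key, column family, qualifier and timestamp are stored with every cell), and any other data are counted. The class and method names are hypothetical.

```java
// Back-of-the-envelope storage comparison for the two options in the thread.
public final class CapacityEstimate {
    // Assumption: HDFS default replication (dfs.replication = 3); adjust if changed.
    static final int REPLICATION = 3;
    static final double TB_PER_NODE = 4.0;  // from the thread
    static final int RAM_GB_PER_NODE = 32;  // from the thread

    // Total disk across all nodes, in decimal TB.
    static double rawTb(int nodes) {
        return nodes * TB_PER_NODE;
    }

    // Logical data the cluster can hold once every HDFS block is replicated.
    static double usableTb(int nodes, int replication) {
        return rawTb(nodes) / replication;
    }

    // Upper bound on average stored bytes per record (decimal TB). Ignores
    // compaction headroom, temp space, and other tenants such as Hive/MR data.
    static double bytesPerRecordCeiling(double usableTb, long records) {
        return usableTb * 1e12 / records;
    }

    public static void main(String[] args) {
        long records = 20_000_000_000L; // upper end of the 15-20 billion estimate
        for (int nodes : new int[] {64, 16}) {
            double usable = usableTb(nodes, REPLICATION);
            System.out.printf("%d nodes: raw %.0f TB, usable %.1f TB, RAM %d GB, <= %.0f bytes/record%n",
                    nodes, rawTb(nodes), usable, nodes * RAM_GB_PER_NODE,
                    bytesPerRecordCeiling(usable, records));
        }
    }
}
```

On the shared cluster, the 85 TB figure also has to hold the existing Hive and MapReduce data, so the practical gap between the two options is smaller than the raw numbers suggest.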