HBase, mail # user - Looking for advise on Hbase setup


Re: Looking for advise on Hbase setup
Amandeep Khurana 2010-04-25, 22:00
There is https://issues.apache.org/jira/browse/HBASE-2433

And there is some other work that I did earlier on storing and navigating
graphs in HBase. I'll include those ideas in the RDF store.

On Sun, Apr 25, 2010 at 1:58 PM, Aaron McCurry <[EMAIL PROTECTED]> wrote:

> As far as the graph db goes...  I'm just formulating a plan now, but for
> high-level features:
>
>
>   - Needs to be able to house billions of nodes with billions of edges
>   (possibly millions of edges to and from a single node) and needs to
>   operate in real time.
>   - Needs to be able to load and unload large numbers of nodes and edges on
>   a regular basis.
>   - I already have a scalable search solution, so I don't really need to
>   have a node or edge search system.
>
> That's about all I have at this point.  Is there a current project in the
> works for a graph db on hbase?  I would love to help out.
>
> Aaron
>
>
> On Sun, Apr 25, 2010 at 3:48 PM, Amandeep Khurana <[EMAIL PROTECTED]>
> wrote:
>
> > If you want to serve some application off hbase, you might be better
> > off with a separate cluster so you don't mix workloads with the MR
> > jobs...
> >
> > What kind of graph db are you looking to build? There is work being
> > done on that front and we would like to know about your use case...
> >
> > On 4/25/10, Aaron McCurry <[EMAIL PROTECTED]> wrote:
> > > I have been a fan of hbase for a while, but until now I haven't had any
> > > extra hardware to set up and run an instance.  Now I'm trying to decide
> > > what would be the ideal setup.
> > >
> > > I have a 64 node hadoop/hive setup; each node has dual quad core
> > > processors with 32 GB of RAM and 4 TB of storage.  My options are to
> > > run a 64 way hbase setup on those nodes, or possibly run hbase on a
> > > separate set of machines, up to 16 nodes of the same type, that would
> > > only be used for hbase.  I'm leaning toward running hbase on the 64 way
> > > cluster with hadoop, because I'm going to be using hbase in some map
> > > reduce jobs and for the size.
> > >
> > > What I'm planning on doing with the cluster:
> > >
> > >    - Migrate some large berkeley dbs to hbase (15 - 20 billion records)
> > >    - Mix some live data from hbase with some batch processing in hive
> > >    (small amount of data)
> > >    - Build a large graph db on top of hbase (size unknown, billions at
> > >    least)
> > >    - Probably a lot more things as time goes along
> > >
> > > Thoughts and opinions welcome.  Thanks!
> > >
> > > Aaron
> > >
> >
> >
> > --
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
>