Re: HBase: project ideas
Hello Stack,
Thanks for the reply. Please see inline.

Cheers,
Himanshu

On Thu, Aug 19, 2010 at 11:22 AM, Stack <[EMAIL PROTECTED]> wrote:

> On Thu, Aug 19, 2010 at 2:47 AM, Himanshu Vashishtha
> <[EMAIL PROTECTED]> wrote:
> > Dear All:
> > I have been looking around HBase (running/debugging it, etc.) for a couple
> > of weeks now, and it is fascinating. I am in search of a good project for my
> > grad studies, focusing on HBase, but have not been able to settle on one. I
> > am looking for a project idea that I can use. It can be a user or a dev
> > project; I am open to all :)
> >
> > One idea (user-specific) is to migrate an XQuery-like tool that uses a
> > relational DB schema (there are a bunch of papers suggesting it) to HBase,
> > but I am not sure whether that is really a judicious use of HBase. Please
> > suggest.
> >
> >
>
> Hello Himanshu.
>
> It's hard to make a suggestion when I've no clue as to your interests.
>
Hadoop fascinates me. I wrote a tool for my lab which indexes a given
document collection (of plain text files) and then lets the user query it
through four predefined operations... I store those indexes on HDFS using
MapFiles (to reduce the request-response latency).
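
To make the indexing idea concrete, here is a minimal in-memory sketch of an
inverted index. It is not the lab tool itself (its four operations are not
listed in this thread); the names `build_index` and `query_and` are
hypothetical, and a plain dict stands in for the MapFiles on HDFS.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased token to the sorted list of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}

def query_and(index, *terms):
    """Return the doc ids that contain every given term (conjunctive lookup)."""
    postings = [set(index.get(term.lower(), ())) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

# Hypothetical two-document collection, for illustration only.
docs = {"a.txt": "HBase stores sorted rows", "b.txt": "HDFS stores large files"}
index = build_index(docs)
print(query_and(index, "stores"))
print(query_and(index, "stores", "hbase"))
```

A lookup touches only the posting lists of the queried terms, which is why
this layout suits point queries but not aggregations.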

> Can you cite some of the papers you mention?
>
So, I want to carry it forward for XML, and I came across two approaches:
indexing the documents, or storing them RDBMS-style while also taking
schema info into account.

Paper (for the index-based approach): An efficient inverted index technique
for XML documents using RDBMS, Chiyoung Seo and others, 2003.

And for the RDBMS approach: A Comprehensive XQuery to SQL Translation using
Dynamic Interval Encoding, David DeHaan, David Toman, Mariano P. Consens,
and others, 2003, and its references.

I developed a prototype of the index-based one in HBase, but it is limited
in usage (because of the indexing approach itself, you can't run richer
operations like summing, grouping, etc.). It's quite raw.

> + Have you looked at HIVE?  It might be more pertinent making this run
> better atop hbase rather than making a new XQuery-like tool for hbase.
>

Not yet. I read that it runs an MR job for every query, which slows its
response time, so I skipped past it. But yes, I see it does provide a lot of
relational schema features.

> + Build an app that allows various kind of location queries using
> geohashing+hbase combo.  There's a few fellas floating on the list who
> might be able to help you out on this project.
>
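
For reference, a rough sketch of the geohashing idea: a geohash interleaves
longitude and latitude bits into a base-32 string, so nearby points tend to
share a prefix. The `row_key` helper and its `geohash:item_id` layout are
assumptions made for illustration, not an existing HBase schema, and no
HBase client calls are shown.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=8):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # upper half of the interval: emit 1
            rng[0] = mid
        else:
            bits <<= 1              # lower half of the interval: emit 0
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:          # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

def row_key(lat, lon, item_id):
    """Hypothetical row key: geohash prefix, then an item id."""
    return f"{geohash_encode(lat, lon)}:{item_id}"

print(row_key(57.64911, 10.40744, "cafe-1"))
```

Because HBase keeps rows sorted by row key, keys built this way place most
nearby points next to each other, so a scan over a geohash prefix can serve
a simple location query. Points close to a cell boundary can still fall
under different prefixes, so a real query would also check neighboring cells.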
> For extra points, whatever you do, build it using hbase-2000 coprocessors.
>
I am sorry, I couldn't get this.
>
> Thanks for writing the list Himanshu.
> St.Ack
>