Drill >> mail # user >> Drill Masters Project


Re: Drill Masters Project
It's an interesting question.  I think people will value your research more
the more closely your setup resembles a traditional production cluster.
That said, my advice would be to just get started with what you have; you
can move to a different kind of hardware as you make progress.

J
On Thu, Aug 29, 2013 at 2:25 AM, Tom Seddon <[EMAIL PROTECTED]> wrote:

> Thanks Jacques.  I'm very happy to get involved and share my experiences.
>
> I'm looking for the best way to set up a cluster now.  In terms of
> evaluating Drill's performance, do you think it's especially important to
> have a system that would be close in performance to a production cluster,
> or would it be worthwhile exploring it on a small scale?  The problem is
> that, as a student, my budget is limited, so I'm exploring things like
> Raspberry Pi clusters, which I think don't have linear performance
> improvements as you scale out.  I'm also enquiring about EC2 or GCE
> student licensing.
>
>
> On 29 August 2013 05:08, Jacques Nadeau <[EMAIL PROTECTED]> wrote:
>
> > A Hadoop cluster would be a good start.  We're in the process right now
> > of putting together distributable files which will help get you up to
> > speed quickly.  Contribution isn't just code; there are many types, and
> > I'm sure you can help in any number of ways.  Just documenting your
> > early experiences and advice would be a great way to start helping out.
> >
> > Jacques
> >
> >
> > On Sun, Aug 25, 2013 at 1:25 PM, Tom Seddon <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > I'm looking to do a dissertation on Drill, as part of a master's
> > > degree in Data Science.  I'm hoping to set up a cluster to run it and
> > > then analyse its efficiency with different datasets, as well as make
> > > recommendations for its usage.  I know Drill is in a fairly early
> > > stage of development, but I have around 18 months until the project is
> > > due, so I'm hoping the timing will work as Drill is developed further.
> > >
> > > I'd be grateful for any advice on how I could get started on this.
> > > Would a Hadoop cluster be a good back-end to base my project on or
> > > would something more suited to nested data like MongoDB be more
> > > appropriate?  Also, I haven't found much documentation on configuring
> > > Drill in a distributed environment, so any help on this would be
> > > appreciated.
> > >
> > > I'd also be willing to contribute, but I'm not sure I have enough
> > > Java experience.  My background is mainly in BI and database
> > > technologies.
> > >
> > > Thanks,
> > >
> > > Tom
> > >
> >
>