Thanks, Jacques. I'm very happy to get involved and share my experiences.
I'm now looking for the best way to set up a cluster. For evaluating
Drill's performance, do you think it's especially important to have a
system close in performance to a production cluster, or would it be
worthwhile exploring it on a small scale? The problem is that, as a
student, I have a limited budget, so I'm exploring options like Raspberry
Pi clusters, which I suspect won't scale linearly in performance as you
add nodes. I'm also enquiring about student licensing for EC2 or GCE.
On 29 August 2013 05:08, Jacques Nadeau <[EMAIL PROTECTED]> wrote:
> A Hadoop cluster would be a good start. We're in the process right now of
> putting together distributable files which will help get you up to speed
> quickly. Contribution isn't just code; there are many types and I'm sure
> you can help in any number of ways. Just documenting your early
> experiences and advice would be a great way to start helping out.
> On Sun, Aug 25, 2013 at 1:25 PM, Tom Seddon <[EMAIL PROTECTED]> wrote:
> > Hi,
> > I'm looking to do a dissertation on Drill as part of a master's degree in
> > Data Science. I'm hoping to set up a cluster to run it and then analyse
> > its efficiency with different datasets, as well as make recommendations
> > on its usage. I know Drill is at a fairly early stage of development, but I
> > have around 18 months until the project is due, so I'm hoping the timing
> > will work as Drill is developed further.
> > I'd be grateful for any advice on how I could get started on this.
> > Would a Hadoop cluster be a good back-end to base my project on, or would
> > something more suited to nested data, like MongoDB, be more appropriate?
> > Also, I haven't found much documentation on configuring Drill in a
> > distributed environment, so any help on this would be appreciated.
> > I'd also be willing to contribute, but I'm not sure I have enough Java
> > experience. My background is mainly in BI and database technologies.
> > Thanks,
> > Tom