I'm looking to do a dissertation on Drill as part of a master's degree in
Data Science. I'm hoping to set up a cluster to run it and then analyse
its efficiency with different datasets, as well as make recommendations for
its usage. I know Drill is at a fairly early stage of development, but I
have around 18 months until the project is due, so I'm hoping the timing
will work out as Drill is developed further.
I'd be grateful for any advice on how I could get started on this. Would a
Hadoop cluster be a good back-end to base my project on or would something
more suited to nested data like MongoDB be more appropriate? Also, I
haven't found much documentation on configuring Drill in a distributed
environment, so any help on this would be appreciated.
I'd also be willing to contribute, but I'm not sure I have enough Java
experience. My background is mainly in BI and database technologies.