Drill >> mail # dev >> Introduction



Stefan,

> glad that I can help. May I suggest that I continue by creating use cases and the respective types of query profiles:
> * Wikipedia Edit History: After an initial glance, the history appears to be made up of 40 or so tables. I would design some user stories using join-like queries across multiple tables, or whatever they are called in Drill.
> * Enron emails: I did not have an opportunity to check the Enron data yet, but here I would design user stories as if building an email client, which would lead to heavy use of full-text search.
>
> There are some additional datasets I would like to suggest: http://aws.amazon.com/datasets
>
> * Freebase.com: Simulate a visualization that jumps from topic to topic as user stories. This would lead to queries on random, very small row sets.
> * Wikipedia Page Traffic Statistics: Simulate a log analysis, with heavy aggregation and date functions over a large number of rows.
> * Global Weather Measurements: Design user stories based on geographic and chronological aggregation of climate data to visualize trends.

That sounds great! I reckon that as soon as we hear back from Ted re the Wiki, we can work there. For the time being, let's continue the discussion here.

Cheers,
Michael

--
Michael Hausenblas
Ireland, Europe
http://mhausenblas.info/

On 11 Jan 2013, at 00:18, "Siprell, Stefan" <[EMAIL PROTECTED]> wrote:

> Hi,
> glad that I can help. May I suggest that I continue by creating use cases and the respective types of query profiles:
> * Wikipedia Edit History: After an initial glance, the history appears to be made up of 40 or so tables. I would design some user stories using join-like queries across multiple tables, or whatever they are called in Drill.
> * Enron emails: I did not have an opportunity to check the Enron data yet, but here I would design user stories as if building an email client, which would lead to heavy use of full-text search.
>
> There are some additional datasets I would like to suggest: http://aws.amazon.com/datasets
>
> * Freebase.com: Simulate a visualization that jumps from topic to topic as user stories. This would lead to queries on random, very small row sets.
> * Wikipedia Page Traffic Statistics: Simulate a log analysis, with heavy aggregation and date functions over a large number of rows.
> * Global Weather Measurements: Design user stories based on geographic and chronological aggregation of climate data to visualize trends.
>
>
> Regards
> Stefan
>
> ________________________________________
> From: Michael Hausenblas [[EMAIL PROTECTED]]
> Sent: Thursday, 10 January 2013 19:54
> To: [EMAIL PROTECTED]
> Subject: Re: Introduction
>
>> Michael Hausenblas is beginning to collect data sets and query examples for
>> different plausible use cases ranging from small to large.  He should show
>> up on the mailing list shortly and you could coordinate with him.
>
>
> Welcome, Stefan - great to have you on board!
>
> So the idea would be to compile a list of datasets along with typical (interesting) queries formulated in natural language. One thing we need to get this off the ground is the Wiki, but I gather Ted is on that.
>
> Datasets that might be of interest include, but are not restricted to:
>
> * Wikipedia edit history from [1]
> * Census data (US, Eurostat, etc.)
> * AOL search logs
> * Enron emails [2]
>
> Feel free to come up with additional ones as well.
>
> I suppose we can continue the discussion (who looks into what) here on the list, and once the Wiki is available we can also coordinate through it.
>
> Cheers,
>                Michael
>
> [1] http://en.wikipedia.org/wiki/Wikipedia:Database_download
> [2] http://www.cs.cmu.edu/~enron/
>
> --
> Michael Hausenblas
> Ireland, Europe
> http://mhausenblas.info/
>
> On 10 Jan 2013, at 10:19, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
>> Stefan,
>>
>> One of the key things to do right now is to work on use cases.
>>
>> Michael Hausenblas is beginning to collect data sets and query examples for