Drill >> mail # dev >> Introduction




> Michael Hausenblas is beginning to collect data sets and query examples for
> different plausible use cases ranging from small to large.  He should show
> up on the mailing list shortly and you could coordinate with him.
Welcome, Stefan - great to have you on board!

So the idea would be to compile a list of datasets along with typical (interesting) queries formulated in natural language. One thing we need to get this off the ground is the Wiki, but I gather Ted is on that.

Datasets that might be of interest include, but are not restricted to:

 * Wikipedia edit history from [1]
 * Census data (US, Eurostat, etc.)
 * AOL search logs
 * Enron emails [2]

Feel free to come up with additional ones as well.

I suppose we can continue the discussion (who looks into what) here on the list, and once the Wiki is available we can also coordinate there.

Cheers,
Michael

[1] http://en.wikipedia.org/wiki/Wikipedia:Database_download
[2] http://www.cs.cmu.edu/~enron/

--
Michael Hausenblas
Ireland, Europe
http://mhausenblas.info/

On 10 Jan 2013, at 10:19, Ted Dunning <[EMAIL PROTECTED]> wrote:

> Stefan,
>
> One of the key things to do right now is to work on use cases.
>
> Michael Hausenblas is beginning to collect data sets and query examples for
> different plausible use cases ranging from small to large.  He should show
> up on the mailing list shortly and you could coordinate with him.
>
> On Thu, Jan 10, 2013 at 5:45 AM, Siprell, Stefan
> <[EMAIL PROTECTED]> wrote:
>
>> Hi all,
>> I am working for an IT consulting agency in Germany. One of the goals of
>> our team for 2013 is active (as in giving) participation in the open source
>> community and offering our customers cutting-edge analytical tools for
>> large to huge databases. You guys hit the spot!
>>
>> I would like to start offering my personal help (volunteer work for now,
>> later I could pitch in a day or two per week perhaps) in any role which
>> would help. I am a somewhat strong enterprise Java developer, can deal
>> sufficiently well with HTML5 frontends, know most things about build
>> environments and testing and should be able to do some design or
>> documentation.
>>
>> Is there anything I can do?
>>
>> Stefan
>>