MapReduce >> mail # user >> I need some raw big data


Yin Steve 2012-12-07, 15:01
Harsh J 2012-12-07, 15:48
Phillip Rhodes 2012-12-07, 15:57
Chris Nauroth 2012-12-07, 21:55
Re: I need some raw big data
Hello Yin,

       You may find this interesting:
https://github.com/unitedstates

Regards,
    Mohammad Tariq

On Sat, Dec 8, 2012 at 3:25 AM, Chris Nauroth <[EMAIL PROTECTED]> wrote:

> Another suggestion is Google Books Ngrams:
>
> http://storage.googleapis.com/books/ngrams/books/datasetsv2.html
>
>
> On Fri, Dec 7, 2012 at 7:57 AM, Phillip Rhodes <[EMAIL PROTECTED]> wrote:
>
>> On Fri, Dec 7, 2012 at 10:48 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>> >
>> > On Fri, Dec 7, 2012 at 8:31 PM, Yin Steve <[EMAIL PROTECTED]> wrote:
>> >> Hello, I'm Steve, and I need some raw big data for studying MapReduce
>> >> programming. Where can I find some? I am especially interested in
>> >> weblogs, traffic info, etc. My English is not so good; if you can give
>> >> me a URL that lets me download a big file directly, that would be great.
>> >> Waiting for your reply.
>>
>> Try some of the links off of this Quora thread:
>>
>>
>> http://www.quora.com/Data/Where-can-I-find-large-datasets-for-modeling-confidence-during-the-financial-crisis-which-is-open-to-the-public
>>
>> You might also try googling "Enron corpus".   Or check out
>> CommonCrawl.org.
>>
>>
>> Phil
>>
>
>
Sujit Dhamale 2012-12-08, 05:08
Bruce Durling 2012-12-07, 15:55