Large public datasets? [closed]

Posted 2019-01-29 15:24

I am looking for some large public datasets, in particular:

  1. Large sample web server logs that have been anonymized.

  2. Datasets used for database performance benchmarking.

Any other links to large public datasets would be appreciated. I already know about Amazon's public datasets at: http://aws.amazon.com/publicdatasets/

13 answers
We Are One
Post #2 · 2019-01-29 15:35

Google Fusion Tables has a few.

http://tables.googlelabs.com/

叼着烟拽天下
Post #3 · 2019-01-29 15:36

Quandl (http://Quandl.com) has over 10 million datasets gathered from all over the internet. Its main advantage is that it provides a single, uniform way to access all of the data: the site offers a free Excel plug-in, and there are client libraries for R, Python, Ruby, and other languages.

放我归山
Post #4 · 2019-01-29 15:36

I am surprised no one has mentioned Google N-Grams. There is more on the N-Gram data at http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html

三岁会撩人
Post #5 · 2019-01-29 15:37

Well, this one is new, and there is a challenge built around it:

Million Song Dataset Challenge

对你真心纯属浪费
Post #6 · 2019-01-29 15:40

Based on Quora answers and the collection I built up during my studies, I created the awesome-public-datasets repository on GitHub, which is actively maintained.

Below is a snapshot of the list. For the latest version, please visit GitHub.

This list of public data sources is collected and tidied from blogs, answers, and user responses. Most of the datasets listed below are free, but some are not. This list comes from https://github.com/caesar0301/awesome-public-datasets.

Climate

Economics

Finance

Biology

Physics

Healthcare

GeoSpace

Transportation

Government

Data Challenges

Machine Learning

Natural Language

Image Processing

Time Series

Social Sciences

Complex Networks

Computer Networks

Data SEs

Public Domains

Complementary Collections

我只想做你的唯一
Post #7 · 2019-01-29 15:44

Datasets are available here as well.
