Datasets for Running Statistical Analysis on [clos

#2 · 2019-01-29 16:00

The datasets package is included with base R. Run this command to see a full list:

library(help="datasets")
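Once you've spotted something in that list, it can be used by name straight away. Here is a minimal sketch; mtcars is just my illustrative pick (it isn't singled out above), and the regression is only there to show a first analysis, not a recommended model:

```r
# Built-in data sets are available by name; mtcars is only an example choice,
# any entry from the list works the same way.
cars <- datasets::mtcars

str(cars)            # variable names and types
summary(cars$mpg)    # quartiles and mean for one column

# A simple linear model as a first statistical analysis
fit <- lm(mpg ~ wt, data = cars)
coef(fit)
```

Swap in any other data set name to explore it the same way; ?mtcars (or ?name in general) shows the documentation for a given data set.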

Beyond that, there are many packages that can pull data, and many others that contain important data. Of these, you may want to start by looking at the HistData package, which "provides a collection of small data sets that are interesting and important in the history of statistics and data visualization".

For financial data, the quantmod package provides a common interface for pulling time series data from Google, Yahoo, FRED, and others:

library(quantmod)
getSymbols("YHOO", src="google")   # from Google Finance; may fail with current quantmod releases
getSymbols("GOOG", src="yahoo")    # from Yahoo Finance
getSymbols("DEXUSJP", src="FRED")  # FX rates from FRED
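
By default these calls create objects named after the symbols in your workspace. If you'd rather get the series back as a return value, something like the following should work. It is a sketch that assumes quantmod is installed and that the FRED source is reachable; the variable name fx is mine, and the guards are there so nothing breaks if either assumption fails:

```r
# Assumption: quantmod is installed and the network is available.
# fx stays NULL if either condition is not met.
fx <- NULL
if (requireNamespace("quantmod", quietly = TRUE)) {
  fx <- tryCatch(
    # auto.assign = FALSE returns the series instead of assigning it by name
    quantmod::getSymbols("DEXUSJP", src = "FRED", auto.assign = FALSE),
    error = function(e) NULL   # e.g. no network access
  )
}

if (!is.null(fx)) print(tail(fx))   # most recent observations
```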

FRED (Federal Reserve Economic Data, from the Federal Reserve Bank of St. Louis) is really a goldmine of free economic data.

Many R packages come bundled with data that is specific to their goal. So if you're interested in genetics, multilevel models, etc., the relevant packages will frequently have the canonical example for that analysis. Also, the book packages typically ship with the data needed to reproduce all the examples.

Here are some examples of relevant packages:

alr3: includes data to accompany Applied Linear Regression (http://www.stat.umn.edu/alr)
arm: includes some of the data from Gelman's "Data Analysis Using Regression and Multilevel/Hierarchical Models" (the rest of the data and code is on the book's website)
BaM: includes data from "Bayesian Methods: A Social and Behavioral Sciences Approach"
BayesDA: includes data from Gelman's "Bayesian Data Analysis"
cat: includes data for analysis of categorical-variable datasets
cimis: for retrieving data from CIMIS, the California Irrigation Management Information System
cshapes: includes GIS boundary data
Ecdat: data sets for econometrics
ElemStatLearn: includes data from "The Elements of Statistical Learning: Data Mining, Inference, and Prediction"
emdbook: data from "Ecological Models and Data"
Fahrmeir: data from the book "Multivariate Statistical Modelling Based on Generalized Linear Models"
fEcofin: "Economic and Financial Data Sets" for Rmetrics
fds: functional data sets
fma: data sets from "Forecasting: Methods and Applications"
gamair: data for "Generalized Additive Models: An Introduction with R"
geomapdata: data for topographic and Geologic Mapping
nutshell: contains all the data from the "R in a Nutshell" book
nytR: provides access to congressional vote data through the NY Times API
openintro: data from the OpenIntro statistics textbooks
primer: includes data for "A Primer of Ecology with R"
qtlbook: includes data for the R/qtl book
RGraphics: includes data from the "R Graphics" book
Read.isi: access to old World Fertility Survey data

干净又极端

#3 · 2019-01-29 16:00

Another good site is UN Data.

The United Nations Statistics Division (UNSD) of the Department of Economic and Social Affairs (DESA) launched a new internet-based data service for the global user community. It brings UN statistical databases within easy reach of users through a single entry point (http://data.un.org/). Users can now search and download a variety of statistical resources of the UN system.

不美不萌又怎样

#4 · 2019-01-29 16:02

There is a broad selection on the Web. For instance, here's a massive directory of sports databases (all providing the data free of charge, at least in my experience). In that directory is databaseBaseball.com, which contains, among other things, complete datasets for every player who has played professional baseball since about 1915.

StatLib is another excellent resource, and beautifully convenient. This single web page lists 4-5 line summaries of over a hundred databases, all of which are available in flat-file form just by clicking the 'Table' link at the beginning of each data set summary.

The base distribution of R comes pre-packaged with a large and varied collection of datasets (122 in R 2.10). To get a list of them (along with a one-line description of each):

data(package="datasets")

Likewise, most packages come with several data sets (sometimes a lot more). You can see those the same way:

data(package="latticeExtra")
data(package="vcd")

These data sets are the ones mentioned in the package manuals and vignettes for a given package, and used to illustrate the package features.

A few R packages with a lot of datasets (which again are easy to scan so you can choose what's interesting to you): AER, DAAG, and vcd.
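
If you want to scan those listings programmatically rather than in the pager, the object returned by data() can be turned into a data frame. This is a rough sketch: list_package_data is a hypothetical helper name of my own, not something from base R or any package.

```r
# Hypothetical helper: list the data sets shipped with an installed package,
# or return NULL (with a hint) if the package is not installed.
list_package_data <- function(pkg) {
  if (!nzchar(system.file(package = pkg))) {
    message("Package '", pkg, "' is not installed; try install.packages(\"",
            pkg, "\")")
    return(NULL)
  }
  # data(package = ...) returns an object whose $results component is a
  # character matrix with columns Package, LibPath, Item and Title
  res <- data(package = pkg)$results
  data.frame(dataset = res[, "Item"], title = res[, "Title"],
             stringsAsFactors = FALSE)
}

base_sets <- list_package_data("datasets")
head(base_sets)

# Search the titles for a topic of interest (the keyword is arbitrary)
base_sets[grepl("temperature", base_sets$title, ignore.case = TRUE), ]
```

The same call works for AER, DAAG, vcd, or any other package once it is installed, e.g. list_package_data("vcd").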

Another thing I find so impressive about R is its I/O. Suppose you want to get some very specific financial data via the Yahoo Finance API, say the opening and closing price of the S&P 500 for every month from 2001 to 2009. Just do this:

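# paste0() joins the two URL pieces without a separator
# Note: this Yahoo endpoint may no longer be available
tick_data = read.csv(paste0("http://ichart.finance.yahoo.com/table.csv?",
    "s=%5EGSPC&a=03&b=1&c=2001&d=03&e=1&f=2009&g=m&ignore=.csv"))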

In this one line of code, R has fetched the tick data, shaped it into a data frame, and bound it to 'tick_data'. (Here's a handy cheat sheet with the Yahoo Finance API symbols used to build URLs like the one above.)
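
The same pattern works for any service that returns CSV over HTTP. Here is a rough sketch of the two steps, building the query URL and parsing the response. build_url is a hypothetical helper of my own, and the inline sample is made-up data standing in for a server response, so the parsing step can be tried offline:

```r
# Hypothetical helper: assemble a query string from named parameters.
# The parameter names mirror the URL used above (s, a, b, c, ...).
build_url <- function(base, params) {
  query <- paste0(names(params), "=", unlist(params), collapse = "&")
  paste0(base, "?", query)
}

url <- build_url("http://ichart.finance.yahoo.com/table.csv",
                 list(s = "%5EGSPC", a = "03", b = 1, c = 2001,
                      d = "03", e = 1, f = 2009, g = "m", ignore = ".csv"))
print(url)

# read.csv() accepts a URL, a file path, or literal text via `text =`.
# The values below are invented purely for illustration.
sample_csv <- "Date,Open,Close\n2001-04-02,100.0,101.5\n2001-05-01,101.5,99.8"
prices <- read.csv(text = sample_csv, stringsAsFactors = FALSE)
prices$Date <- as.Date(prices$Date)   # convert the date column from character
str(prices)
```

With a live endpoint you would call read.csv(url) instead of passing text.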

Summer. ? 凉城

#5 · 2019-01-29 16:02

http://www.data.gov.uk/data

Recently set up by Tim Berners-Lee.

Obviously it's UK-based data, but that shouldn't matter. It covers everything from abandoned cars to school absenteeism to agricultural price indexes.

再贱就再见

#6 · 2019-01-29 16:04

See the data competition set up by Hadley Wickham for the Data Expo of the ASA Statistical Computing and Statistical Graphics sections. The competition is over, but the data is still there.

姐就是有狂的资本

#7 · 2019-01-29 16:04

Another collection of datasets.
