1、获取数据dom
> library(quantmod) > codedata<-list() > codepool<-c("MMM","AXP","AAPL","BA","CAT","CVX","CSCO","KO","DD","XOM","GE","GS","HD","IBM","INTC","JNJ","JPM","MCD","MRK","MSFT","NKE","PFE","PG","TRV","UTX","UNH","VZ","WMT","V","DIS") > number<-length(codepool) > for(i in 1:number){ + n<-codepool[i] + TESTdata<-get(getSymbols(n,src="yahoo",from=Sys.Date()-50,to=Sys.Date())) + codedata[[i]]<-as.data.frame(TESTdata[,4]) + next} > finaldata<-do.call(cbind,codedata) > names(finaldata)<-codepool > head(finaldata,3) MMM AXP AAPL BA CAT CVX CSCO KO DD XOM GE GS HD IBM INTC JNJ JPM MCD MRK MSFT NKE PFE PG TRV 2017-01-23 178.51 75.97 120.08 157.84 94.46 115.39 30.27 41.43 72.78 84.97 29.75 232.67 138.07 171.03 36.77 113.91 83.71 121.38 61.81 62.96 53.24 31.46 86.96 118.04 2017-01-24 175.97 77.43 119.97 160.55 96.24 116.37 30.60 41.90 76.05 85.09 30.00 233.68 138.06 175.90 37.62 111.76 84.72 121.05 61.21 63.52 53.45 31.15 87.86 116.84 2017-01-25 176.73 76.89 121.88 167.36 98.15 117.24 30.70 42.12 76.67 85.34 30.37 237.25 137.48 178.29 37.80 112.80 86.03 121.79 61.08 63.68 53.86 31.29 87.16 117.67 UTX UNH VZ WMT V DIS 2017-01-23 110.34 159.07 52.41 66.65 82.15 107.12 2017-01-24 111.61 160.43 50.12 67.40 83.23 107.90 2017-01-25 110.96 161.24 49.77 66.89 83.90 108.04 > tail(finaldata,3) MMM AXP AAPL BA CAT CVX CSCO KO DD XOM GE GS HD IBM INTC JNJ JPM MCD MRK MSFT NKE PFE PG TRV 2017-03-08 189.51 79.04 139.00 181.74 93.23 109.61 34.02 41.99 79.77 81.03 29.80 250.24 146.92 179.45 35.62 124.10 91.21 128.09 65.80 64.99 56.51 33.91 90.14 121.24 2017-03-09 189.90 79.30 138.68 180.57 91.39 110.04 34.07 42.03 80.48 81.67 29.66 250.18 146.62 177.18 35.82 125.95 91.57 128.14 65.89 64.73 56.36 34.05 90.34 121.90 2017-03-10 191.21 79.38 139.14 178.70 92.31 110.61 34.26 42.29 80.86 81.61 30.28 248.38 146.85 177.83 35.91 126.21 91.28 127.98 65.60 64.93 56.43 34.11 91.07 122.83 UTX UNH VZ WMT V DIS 2017-03-08 111.75 167.91 49.16 69.80 88.96 110.84 2017-03-09 111.93 168.01 49.28 69.86 89.11 111.03 2017-03-10 112.14 169.98 49.35 70.10 89.73 110.92
道琼斯指数共有30只成分股,取各个成分股最近50天的收盘价。code
2、主成分分析component
> p<-princomp(finaldata[,1:30]) > summary(p) Importance of components: Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10 Comp.11 Comp.12 Standard deviation 16.1041328 4.22688825 3.50222384 2.20689405 2.01370950 1.459120153 1.298793653 0.991546988 0.858445383 0.752024098 0.679270196 0.579182056 Proportion of Variance 0.8458697 0.05827339 0.04000521 0.01588517 0.01322581 0.006944019 0.005501855 0.003206678 0.002403555 0.001844558 0.001504922 0.001094105 Cumulative Proportion 0.8458697 0.90414310 0.94414831 0.96003348 0.97325929 0.980203314 0.985705168 0.988911847 0.991315402 0.993159960 0.994664882 0.995758987
主成分分析法生成了30个新变量,前面3个变量能够解释到原数据集94.41%(0.8458697+0.05827339+0.04000521)的方差,所以咱们能够用这3个新的变量代替原来30个初始变量。orm
> stockdata<-as.data.frame(as.matrix(finaldata[,1:30])%*%as.matrix(p$loadings[,1:3])) > head(stockdata) Comp.1 Comp.2 Comp.3 2017-01-23 -497.0058 -3.717498 174.8456 2017-01-24 -498.6973 -5.271716 178.5616 2017-01-25 -505.3037 -4.955875 181.6700 2017-01-26 -507.5417 -4.710687 182.2582 2017-01-27 -506.5475 -3.575603 180.2852 2017-01-30 -502.8693 -3.291562 176.4937
3、回归分析ip
一、获取道琼斯指数最近50天数据,合并数据框ci
> res<-get(getSymbols("DJIA",src="yahoo",from=Sys.Date()-50,to=Sys.Date())) > US30<-as.data.frame(res[,4]) > mydata<-cbind(stockdata,US30) > names(mydata)<-c("X1","X2","X3","US30") > head(mydata) X1 X2 X3 US30 2017-01-23 -497.0058 -3.717498 174.8456 19799.85 2017-01-24 -498.6973 -5.271716 178.5616 19912.71 2017-01-25 -505.3037 -4.955875 181.6700 20068.51 2017-01-26 -507.5417 -4.710687 182.2582 20100.91 2017-01-27 -506.5475 -3.575603 180.2852 20093.78 2017-01-30 -502.8693 -3.291562 176.4937 19971.13
二、构造多元线性回归模型get
> model<-lm(US30~X1+X2+X3,mydata) > summary(model) Call: lm(formula = US30 ~ X1 + X2 + X3, data = mydata) Residuals: Min 1Q Median 3Q Max -50.267 -7.866 -1.937 10.018 39.555 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 5395.8202 218.4604 24.699 < 2e-16 *** X1 -25.7840 0.2282 -113.006 < 2e-16 *** X2 0.4522 0.8693 0.520 0.607 X3 9.1254 1.0492 8.698 1.06e-09 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 21.43 on 30 degrees of freedom Multiple R-squared: 0.9977, Adjusted R-squared: 0.9974 F-statistic: 4282 on 3 and 30 DF, p-value: < 2.2e-16 > model Call: lm(formula = US30 ~ X1 + X2 + X3, data = mydata) Coefficients: (Intercept) X1 X2 X3 5395.8202 -25.7840 0.4522 9.1254
检验显示X2与US30不显著,剔除X2。所以线性回归方程为:io
> US30 = 5395.8202 + -25.7840*X1 + 9.1254*X3