PCA

M9a: PCA example using Hubway data

In [3]:

library(mvtnorm)
library(tilting) # col.norm
library(ggplot2)
library("np") # npreg: you may need to install it
hpdata <- read.csv("./data/dataset-hubway2.csv", header=TRUE, sep = ",", quote="\"",dec = ".")
hpdata$TOWN <- NULL
hpdata$year <- NULL # zero-variance
# Optionally save the cleaned data (the ./output directory must already exist)
#write.table(hpdata,"./output/dataset-hubway3.csv", sep = ",",dec = ".")
colnames(hpdata)

Nonparametric Kernel Methods for Mixed Datatypes (version 0.60-10)
[vignette("np_faq",package="np") provides answers to frequently asked questions]
[vignette("np",package="np") an overview]
[vignette("entropy_np",package="np") an overview of entropy-based methods]
Warning message in file(file, ifelse(append, "a", "w")):
“cannot open file './output/dataset-hubway3.csv': No such file or directory”

Error in file(file, ifelse(append, "a", "w")): cannot open the connection
Traceback:
1. write.table(hpdata, "./output/dataset-hubway3.csv", sep = ",", 
 .     dec = ".")
2. file(file, ifelse(append, "a", "w"))
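
The traceback above shows write.table failing to open its target file, most likely because the ./output directory did not exist when the cell was run. A minimal sketch of one way to avoid this is to create the directory before writing. The location under tempdir(), the file name example.csv and the use of the built-in USArrests data are assumptions made so the sketch runs anywhere; they are not part of the notebook.

```r
# Hypothetical output location, used only for this sketch
out_dir <- file.path(tempdir(), "output")

# Create the directory if it is missing (no warning if it already exists)
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Write a small stand-in table with the same separators as the notebook
out_file <- file.path(out_dir, "example.csv")
write.table(head(USArrests), out_file, sep = ",", dec = ".")

print(file.exists(out_file))
```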

In [45]:

# PCA without standardizing the data (covariance-matrix-based),
# followed by a cumulative explained-variance plot
hpca_cov <- prcomp(hpdata, scale.=FALSE)
# We can extract the variance information (and much more)
# from the components of the object hpca_cov
standard_deviation_of_each_component <- hpca_cov$sdev
var_per_dim <- standard_deviation_of_each_component^2
var_tot <- sum(var_per_dim)
var_tot
var_per_dim/var_tot
var_prop <- var_per_dim / var_tot
var_prop
cum_var <- cumsum(var_prop)
cum_var
plot(cum_var,xlab="Principal component", 
     ylab="Cumulative proportion of variance explained", ylim=c(0,1), type='b')

8298462.56254009

0.604277079010252
0.289490773781669
0.0547284503622438
0.0208837985727226
0.00807095342683796
0.00752787785881026
0.00515652076565351
0.00415842007503802
0.0027707562726213
0.000885404693833313
0.000529822683554946
0.000340909314077188
0.000292504731544651
0.000254839153032317
0.000180542047466206
0.000126918406239684
0.000110191072847556
6.73393420643508e-05
5.22717833553464e-05
3.76655747196708e-05
2.18217307948317e-05
9.48624010997684e-06
8.36822055040508e-06
7.04940971690849e-06
4.48035405488206e-06
1.96209342640623e-06
1.79682472877614e-06
8.8844562469251e-07
4.61726299002591e-07
2.92303519979804e-07
2.04167002152802e-07
1.18758877473866e-07
1.81891767363846e-08
9.24575963168155e-09
3.36177378469594e-09
1.98160423349925e-32
3.89285291618039e-33
3.89285291618039e-33
3.89285291618039e-33
2.00135064939812e-33

0.604277079010252
0.289490773781669
0.0547284503622438
0.0208837985727226
0.00807095342683796
0.00752787785881026
0.00515652076565351
0.00415842007503802
0.0027707562726213
0.000885404693833313
0.000529822683554946
0.000340909314077188
0.000292504731544651
0.000254839153032317
0.000180542047466206
0.000126918406239684
0.000110191072847556
6.73393420643508e-05
5.22717833553464e-05
3.76655747196708e-05
2.18217307948317e-05
9.48624010997684e-06
8.36822055040508e-06
7.04940971690849e-06
4.48035405488206e-06
1.96209342640623e-06
1.79682472877614e-06
8.8844562469251e-07
4.61726299002591e-07
2.92303519979804e-07
2.04167002152802e-07
1.18758877473866e-07
1.81891767363846e-08
9.24575963168155e-09
3.36177378469594e-09
1.98160423349925e-32
3.89285291618039e-33
3.89285291618039e-33
3.89285291618039e-33
2.00135064939812e-33

0.604277079010252
0.893767852791922
0.948496303154165
0.969380101726888
0.977451055153726
0.984978933012536
0.99013545377819
0.994293873853228
0.997064630125849
0.997950034819682
0.998479857503237
0.998820766817315
0.999113271548859
0.999368110701892
0.999548652749358
0.999675571155597
0.999785762228445
0.999853101570509
0.999905373353865
0.999943038928584
0.999964860659379
0.999974346899489
0.99998271512004
0.999989764529757
0.999994244883811
0.999996206977238
0.999998003801967
0.999998892247591
0.99999935397389
0.99999964627741
0.999999850444412
0.99999996920329
0.999999987392467
0.999999996638226
1
1
1
1
1
1

[Figure: cumulative proportion of variance explained versus principal component, unstandardized (covariance-matrix-based) PCA]
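
The quantities printed above can be checked against their definitions. The following is a minimal sketch on synthetic data (the matrix X and its column scales are made up for illustration and are not taken from the Hubway file), using base R only. It checks that the squared sdev values returned by prcomp are the eigenvalues of the sample covariance matrix, and that their sum is the total variance, that is, the sum of the column variances.

```r
set.seed(1)
# Hypothetical data: three columns on very different scales
X <- cbind(a = rnorm(200, sd = 1),
           b = rnorm(200, sd = 10),
           c = rnorm(200, sd = 100))

pca <- prcomp(X, scale. = FALSE)
var_per_dim <- pca$sdev^2

# Eigenvalues of the sample covariance matrix, largest first
eig <- eigen(cov(X), symmetric = TRUE)$values

# Total variance compared with the sum of the column variances
var_tot <- sum(var_per_dim)
col_var_sum <- sum(apply(X, 2, var))

print(all.equal(var_per_dim, eig))
print(all.equal(var_tot, col_var_sum))
print(round(cumsum(var_per_dim / var_tot), 4))
```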

In [5]:

apply(hpdata, 2, mean)
apply(hpdata, 2, var)

station_id: 108.403
month: 8.676
day: 17.814
hour: 16.461
num_of_pickups: 1.689
num_of_dropoffs: 1.582
day_of_week: 4.119
weekend: 0.305
num_bikes_available: 6.17400909090909
num_bikes_disabled: 0.550924242424242
num_docks_available: 11.0545666666667
num_docks_disabled: 0.0155
TAZ_id: 342.766
Tot_Pop: 1152.306
HH_Pop: 970.562
HH: 466.605
Income_low: 166.012
Income_mid_low: 119.076
Income_mid_high: 90.184
Income_high: 91.333
Worker0: 121.394
Worker1: 195.038
Worker2: 118.083
Worker3p: 32.09
HHSize1: 187.066
HHSize2: 156.064
HHSize3: 62.662
HHSize4: 36.875
HHSize5p: 23.938
Veh0: 164.424
Veh1: 206.3
Veh2: 51.81
Veh3p: 44.071
Tot_Vehs: 457.168
Age0to4_enrollment: 15.874
Age5to14_enrollment: 71.945
Age15to18_enrollment: 46.089
Age19plus_commuters: 399.697
Age19plus_dorms: 160.874
num_bikes: 6.72493333333333

station_id: 4190.07266366366
month: 0.219243243243243
day: 77.1125165165165
hour: 16.2527317317317
num_of_pickups: 7.65793693693694
num_of_dropoffs: 5.33260860860861
day_of_week: 3.97881781781782
weekend: 0.212187187187187
num_bikes_available: 27.0836950869235
num_bikes_disabled: 0.71278812765934
num_docks_available: 39.2460889673052
num_docks_disabled: 0.0323059448337226
TAZ_id: 57926.4817257257
Tot_Pop: 1099763.90627027
HH_Pop: 975955.245401401
HH: 193299.980955956
Income_low: 29846.2541101101
Income_mid_low: 15488.6668908909
Income_mid_high: 10066.3024464464
Income_high: 13211.2673783784
Worker0: 15498.6193833834
Worker1: 34179.2918478478
Worker2: 18497.9901011011
Worker3p: 2574.89279279279
HHSize1: 31522.5161601602
HHSize2: 23486.6265305305
HHSize3: 4979.6393953954
HHSize4: 2311.05042542543
HHSize5p: 1828.07222822823
Veh0: 28917.4977217217
Veh1: 44057.7977977978
Veh2: 7539.14504504504
Veh3p: 2831.51947847848
Tot_Vehs: 261352.878654655
Age0to4_enrollment: 1876.78490890891
Age5to14_enrollment: 33439.3353103103
Age15to18_enrollment: 56187.874953954
Age19plus_commuters: 4464921.89909009
Age19plus_dorms: 862505.447571572
num_bikes: 27.6643796736682

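The variances above span many orders of magnitude (for example, about 4.46e6 for Age19plus_commuters versus about 0.03 for num_docks_disabled), so the unstandardized PCA is dominated by the few high-variance columns. From now on, we therefore use only the correlation-matrix-based (standardized) PCA.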
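
To illustrate the difference between the two variants, here is a small sketch on made-up data (the columns small and large are hypothetical). Without scaling, the first component is almost entirely the high-variance column. With scale. = TRUE, prcomp reproduces the eigen-decomposition of the correlation matrix and both columns contribute equally in magnitude.

```r
set.seed(2)
z <- rnorm(300)
# Two correlated columns on very different scales (illustrative only)
X <- cbind(small = z + rnorm(300, sd = 0.5),
           large = 1000 * (z + rnorm(300, sd = 0.5)))

pca_cov <- prcomp(X, scale. = FALSE)  # covariance-matrix-based
pca_cor <- prcomp(X, scale. = TRUE)   # correlation-matrix-based

# Unscaled: absolute PC1 loadings, dominated by the large-variance column
print(round(abs(pca_cov$rotation[, "PC1"]), 4))

# Scaled: absolute PC1 loadings are equal for the two columns
print(round(abs(pca_cor$rotation[, "PC1"]), 4))

# Scaled PCA variances compared with the eigenvalues of the correlation matrix
eig_cor <- eigen(cor(X), symmetric = TRUE)$values
print(all.equal(pca_cor$sdev^2, eig_cor))
```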

First, we describe with plots, and comment on, how the first two principal components relate to the original columns in both the covariance-matrix-based PCA and the correlation-matrix-based (standardized) PCA. In each case, is it possible to give a name to each of the two principal components?

In [38]:

hpca_cor <- prcomp(hpdata, scale.=TRUE)
# Use the standardized PCA object defined on the previous line
standard_deviation_of_each_component <- hpca_cor$sdev
var_per_dim <- standard_deviation_of_each_component^2
var_tot <- sum(var_per_dim)
var_prop <- var_per_dim / var_tot
cum_var <- cumsum(var_prop)
plot(cum_var,xlab="Principal component", 
     ylab="Cumulative proportion of variance explained", ylim=c(0,1), type='b')

In [46]:

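# Each column of the rotation matrix is one eigenvector (the loadings of one component)
eigenvectors <- hpca_cor$rotation
# L2 norm of each column (col.norm comes from the tilting package); each should be 1
col.norm(eigenvectors)
eigenvectors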

PC1: 1
PC2: 1
PC3: 1
PC4: 0.999999999999999
PC5: 1
PC6: 0.999999999999999
PC7: 1
PC8: 1
PC9: 1
PC10: 1
PC11: 1
PC12: 1
PC13: 1
PC14: 1
PC15: 1
PC16: 1
PC17: 1
PC18: 1
PC19: 1
PC20: 1
PC21: 1
PC22: 0.999999999999999
PC23: 1
PC24: 1
PC25: 1
PC26: 1
PC27: 1
PC28: 1
PC29: 1
PC30: 1
PC31: 1
PC32: 1
PC33: 1
PC34: 1
PC35: 1
PC36: 1
PC37: 1
PC38: 1
PC39: 1
PC40: 1

A matrix: 40 × 40 of type dbl
	PC1	PC2	PC3	PC4	PC5	PC6	PC7	PC8	PC9	PC10	⋯	PC31	PC32	PC33	PC34	PC35	PC36	PC37	PC38	PC39	PC40
station_id	-0.0683051454	0.127406411	-2.438988e-01	0.076445993	-0.157224882	0.121778624	-0.226892635	0.069604450	0.190496570	-0.078251798	⋯	-0.014379847	-4.429207e-05	-0.0095448899	-0.0035091584	-1.191120e-03	-6.132147e-16	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
month	0.0029512863	0.111301970	-8.033464e-03	-0.218432057	-0.164679275	-0.057002090	0.199254049	-0.523747065	0.045327015	-0.315614826	⋯	-0.007409287	1.213712e-02	-0.0067413173	-0.0013294179	1.836260e-04	8.254209e-17	-3.573530e-16	4.510281e-17	-1.249001e-16	2.012279e-16
day	0.0028452380	-0.090474106	3.350605e-07	0.291088766	0.156710169	0.036663872	-0.275307574	0.471312239	-0.055768280	0.256352525	⋯	-0.004356224	6.807712e-03	0.0004614240	-0.0004311098	-3.214006e-07	9.346215e-17	-1.461597e-16	-1.534756e-16	-1.642630e-16	-1.563136e-16
hour	-0.0007248008	-0.017084756	-7.001248e-02	-0.003962978	-0.101396588	0.012757074	-0.103760273	-0.189212314	-0.064367334	0.098068276	⋯	-0.010753705	-4.618763e-03	-0.0045892651	0.0000896872	2.696024e-04	-1.898338e-17	7.806492e-17	-6.108621e-18	6.840302e-17	1.327063e-16
num_of_pickups	0.0531741222	-0.107295629	2.047552e-01	-0.047960419	0.376809034	0.004348261	0.212586237	-0.019380971	0.440608306	0.020719964	⋯	-0.010642303	2.615332e-03	-0.0013053961	0.0015891017	1.733498e-05	-1.072274e-16	3.606046e-16	-7.197788e-18	4.538010e-16	3.148424e-18
num_of_dropoffs	0.0376794869	-0.131814889	2.318261e-01	-0.062900441	0.394590245	0.003918883	0.195261822	0.002881967	0.395241405	0.070903567	⋯	-0.010414331	-9.283285e-03	0.0081606499	0.0015026496	-2.947763e-04	-8.325931e-17	-6.785218e-17	3.417347e-17	-2.891293e-16	-2.140206e-16
day_of_week	-0.0135188670	0.046658108	8.596311e-03	-0.543492173	-0.230107053	-0.095927238	0.040659077	0.314364357	0.088810320	0.159119343	⋯	-0.008653116	2.077725e-02	0.0024364923	0.0030659107	-1.967949e-04	1.202738e-16	-1.620728e-16	-1.714500e-16	-3.288462e-18	1.351226e-16
weekend	-0.0073963673	0.033542407	1.928546e-02	-0.541212285	-0.237536001	-0.116556237	0.054075063	0.310466759	0.098923334	0.144794144	⋯	0.006503581	-2.717625e-02	-0.0015456491	-0.0010429409	1.526270e-04	-2.147080e-16	-8.965646e-17	2.646940e-16	-1.509528e-16	-2.963336e-17
num_bikes_available	0.0009412844	-0.542574263	-1.302388e-01	-0.116988683	0.037457554	-0.046403258	-0.111216587	0.017236168	0.019823158	-0.195414936	⋯	0.004597087	-8.144133e-03	0.0010027426	-0.0009797073	2.289072e-04	6.284880e-02	-3.533734e-01	-5.951938e-01	7.207884e-02	-8.071053e-03
num_bikes_disabled	0.0044668818	-0.109313648	1.212438e-01	-0.067030610	-0.003656694	0.009647572	-0.064069672	-0.318120230	-0.006359476	0.562033752	⋯	0.006118815	-2.862799e-04	-0.0021089070	-0.0004179514	-3.929440e-04	1.019585e-02	-5.732712e-02	-9.655720e-02	1.169322e-02	-1.309352e-03
num_docks_available	0.0383360185	0.449894490	1.629557e-01	0.073118289	0.027006807	0.044779830	0.103549041	0.092645854	0.150953354	0.050457123	⋯	0.005843381	-1.392875e-02	0.0027705713	-0.0013704559	4.501877e-04	5.976655e-17	2.255141e-16	-5.898060e-17	-4.792174e-17	-3.488420e-17
num_docks_disabled	-0.0027034431	-0.029400792	2.272388e-02	-0.025748358	-0.022837384	-0.004689502	-0.126273369	-0.340165728	-0.038441413	0.553520303	⋯	0.019019427	-8.590904e-03	-0.0073505295	-0.0020105515	5.985262e-05	-3.424144e-19	2.233456e-16	-1.079865e-16	2.320735e-16	2.203641e-17
TAZ_id	-0.0503486651	0.130529693	-2.088880e-02	-0.075130769	0.207189843	0.204636624	-0.078859526	0.056266143	0.131213938	-0.171659581	⋯	0.013186063	-2.623521e-02	0.0068617772	-0.0015031765	7.354337e-04	-4.460302e-17	-3.122502e-17	7.806256e-17	4.367166e-16	-3.073171e-16
Tot_Pop	-0.2141202907	-0.012470979	6.148877e-05	0.041149060	-0.039877336	0.106101707	0.041056729	0.006072038	0.155787387	0.025832610	⋯	-0.007875694	5.854561e-02	-0.0260740531	0.0009665256	-1.702158e-04	1.304355e-16	1.654926e-15	-7.806256e-16	-3.512815e-16	-2.245383e-16
HH_Pop	-0.2425002072	-0.012541244	-3.715184e-02	0.030564654	-0.048919262	0.031163829	0.007639717	-0.012013419	0.083696668	0.015152559	⋯	0.037690362	-8.314332e-02	0.0156598084	0.0366012915	9.442190e-01	-6.095229e-15	-9.228729e-16	-4.146640e-15	-5.350782e-15	5.136753e-15
HH	-0.2426455232	-0.026145254	8.737217e-02	0.004750507	-0.008694686	0.018679308	0.040800322	0.009605842	-0.045698697	-0.005225929	⋯	0.010680066	8.188442e-03	-0.0099612868	0.0525106486	-6.258470e-02	3.394977e-01	5.088323e-01	-2.693024e-01	-1.010570e-01	-6.774514e-01
Income_low	-0.1918210523	-0.139338576	7.841062e-02	0.170118749	-0.258133944	0.126577807	0.216535780	0.064758727	0.074853435	0.026737414	⋯	0.331762795	2.579073e-01	-0.2527266688	0.0629900875	-7.385962e-02	3.992994e-01	-2.291196e-01	1.425459e-01	-2.775437e-01	1.502669e-01
Income_mid_low	-0.2343664628	0.006671187	7.928744e-03	-0.024228484	0.036386395	0.002639447	0.003789759	-0.017574375	0.036637973	0.017101926	⋯	-0.365265522	-5.510919e-02	0.2307700248	0.0431988549	-4.877681e-02	2.876476e-01	-1.650534e-01	1.026873e-01	-1.999371e-01	1.082494e-01
Income_mid_high	-0.2242844307	0.046194442	4.435769e-02	-0.100520794	0.143670934	-0.054073163	-0.082455083	-0.033661155	-0.095483444	-0.024127303	⋯	-0.169449264	-2.987772e-01	0.0294544131	0.0257085383	-3.808574e-02	2.318936e-01	-1.330615e-01	8.278370e-02	-1.611838e-01	8.726767e-02
Income_high	-0.1902878916	0.061877768	1.690484e-01	-0.123547307	0.189921621	-0.074459709	-0.101526585	-0.012180407	-0.243633900	-0.057634094	⋯	0.085606104	-3.585322e-02	0.0661764787	0.0369666358	-4.231993e-02	2.656599e-01	-1.524367e-01	9.483793e-02	-1.846540e-01	9.997482e-02
Worker0	-0.1872598851	-0.122519090	1.366456e-01	0.159415352	-0.238941405	0.129345445	0.193830996	0.077584351	0.006335760	0.013881110	⋯	-0.183887985	-3.013826e-01	0.1906619866	0.0456545605	-4.456736e-02	7.037208e-02	1.041497e-01	-1.352881e-02	3.556592e-01	1.619177e-01
Worker1	-0.2300863759	-0.034471854	1.430086e-01	-0.015470125	0.005116619	0.001808327	0.063698216	0.009369399	-0.093048758	-0.010126247	⋯	0.228161006	1.322137e-01	-0.3062176116	0.0657844084	-7.417836e-02	1.045046e-01	1.546653e-01	-2.009068e-02	5.281643e-01	2.404525e-01
Worker2	-0.2275837688	0.044078330	2.568536e-02	-0.094639352	0.153517804	-0.054714250	-0.089484049	-0.030107526	-0.101071041	-0.025940260	⋯	-0.196386578	8.453266e-02	0.2507030707	0.0364994320	-5.151716e-02	7.688050e-02	1.137820e-01	-1.478004e-02	3.885525e-01	1.768927e-01
Worker3p	-0.1946664854	0.081505913	-1.680975e-01	-0.039923593	0.080768277	-0.015428902	-0.114266606	-0.060555385	0.198416371	0.027086002	⋯	0.238786849	1.020827e-01	-0.1103744018	0.0054569900	-2.457538e-02	2.868359e-02	4.245129e-02	-5.514330e-03	1.449663e-01	6.599747e-02
HHSize1	-0.2102799023	-0.065239738	2.468177e-01	-0.002224178	-0.005621804	0.012875115	0.114199803	0.044002626	-0.195398205	-0.013980850	⋯	-0.193286555	3.984020e-01	-0.0561499102	0.0824860743	5.270543e-02	-3.988430e-01	1.858198e-01	-1.919218e-01	-3.022625e-01	2.122843e-01
HHSize2	-0.2312293560	-0.001035003	1.434222e-01	-0.042732866	0.077159649	-0.007700192	0.013565428	0.011393793	-0.133914261	-0.036271699	⋯	0.329168496	-5.429415e-01	-0.1046877047	0.0393052969	-1.040702e-01	-3.442721e-01	1.603954e-01	-1.656625e-01	-2.609061e-01	1.832389e-01
HHSize3	-0.2331383298	0.008810449	-9.670797e-02	0.019869214	-0.033486309	0.031956291	-0.027213317	-0.023253530	0.129204595	0.009514150	⋯	0.095523205	3.973916e-01	0.5280509865	0.0209435216	-1.005249e-01	-1.585223e-01	7.385510e-02	-7.628038e-02	-1.201359e-01	8.437357e-02
HHSize4	-0.2137943889	0.002357582	-1.890718e-01	0.060144162	-0.088308379	0.044002608	-0.034022541	-0.034699669	0.235641066	0.041102135	⋯	-0.556257267	-3.247389e-02	-0.5328506255	-0.0003744127	-1.203231e-01	-1.079932e-01	5.031370e-02	-5.196592e-02	-8.184244e-02	5.747939e-02
HHSize5p	-0.1679453463	-0.011422840	-2.683551e-01	0.110838828	-0.188073331	0.063997703	-0.020125629	-0.047391609	0.343286178	0.072412808	⋯	0.200370108	-2.434334e-01	0.2335711483	0.0224093324	-1.881934e-01	-9.604797e-02	4.474848e-02	-4.621794e-02	-7.278980e-02	5.112157e-02
Veh0	-0.1744066824	-0.164701125	1.957586e-01	0.123653769	-0.200008862	0.089082952	0.301789386	0.052849062	-0.026567201	0.043155735	⋯	-0.021355466	-1.052575e-01	0.1236997261	-0.2162304163	-5.403446e-02	-2.384657e-01	-2.915189e-01	1.661507e-01	1.159703e-01	-3.103800e-01
Veh1	-0.2325214087	0.034515488	7.737388e-02	-0.037262788	0.065064595	-0.004886836	-0.058825383	-0.001278598	-0.082683109	-0.028071222	⋯	-0.030055419	9.517606e-02	-0.0757414590	0.1339514332	-7.050099e-02	-2.943452e-01	-3.598303e-01	2.050846e-01	1.431455e-01	-3.831112e-01
Veh2	-0.1973906255	0.076071868	-9.698819e-02	-0.113551313	0.159742472	-0.046203645	-0.179246797	-0.043836044	0.060836363	-0.031980706	⋯	0.157352174	1.790351e-03	-0.1181492458	0.2241318497	-2.986673e-02	-1.217606e-01	-1.488495e-01	8.483650e-02	5.921441e-02	-1.584800e-01
Veh3p	-0.2081851609	0.050039459	-5.063720e-02	-0.023641249	0.050024729	-0.035680563	-0.102803729	-0.012951612	-0.065797561	-0.018179310	⋯	0.018287552	2.567953e-02	0.0139423175	0.2307703741	-1.758791e-02	-7.462003e-02	-9.122127e-02	5.199140e-02	3.628910e-02	-9.712325e-02
Tot_Vehs	-0.2351988981	0.056155998	-1.725425e-02	-0.062752635	0.098209756	-0.030148774	-0.118863803	-0.017471010	-0.033639272	-0.028286908	⋯	0.056319910	6.419570e-02	-0.0647270104	-0.8959487061	-1.660314e-03	4.787837e-16	8.187895e-16	8.465451e-16	5.551115e-16	6.245005e-16
Age0to4_enrollment	-0.0987422162	0.020295978	-4.010485e-01	-0.017326598	0.220932290	-0.147363936	0.276457276	0.021472125	-0.033460187	0.147913068	⋯	0.039793721	5.224463e-02	-0.0097905565	0.0041197647	9.879509e-04	6.938894e-18	-1.526557e-16	-2.914335e-16	4.163336e-17	5.204170e-17
Age5to14_enrollment	-0.0758840052	0.017598606	-3.772358e-01	0.028032878	0.182401147	-0.147382698	0.351989452	0.059449383	-0.191524677	0.127402964	⋯	-0.007881523	-3.387971e-02	-0.0083911312	0.0004183490	-2.194008e-03	-1.717376e-16	-3.851086e-16	3.573530e-16	1.942890e-16	2.428613e-17
Age15to18_enrollment	0.0143820800	0.015616318	-2.568035e-01	0.040986715	0.086887060	-0.101877229	0.406781789	0.070101322	-0.169587104	0.048200046	⋯	-0.004928017	-1.869505e-02	-0.0026906160	-0.0025748010	-2.043159e-04	1.396452e-16	3.417405e-16	-2.359224e-16	-3.053113e-16	-3.989864e-17
Age19plus_commuters	0.0463437152	-0.009338845	-1.632189e-01	-0.173118834	0.108517603	0.597686717	0.067143735	0.002721519	-0.204673345	0.041806548	⋯	-0.012936153	-6.322537e-03	0.0058606444	0.0002978920	3.084511e-04	-9.540979e-17	1.387779e-16	1.214306e-16	7.632783e-17	1.457168e-16
Age19plus_dorms	0.0375961929	-0.014943928	-5.973843e-02	-0.206450926	0.099064598	0.649486963	0.056690550	-0.017095479	-0.075604901	0.032072130	⋯	0.012405656	1.042972e-02	-0.0085698404	-0.0006717254	-2.579505e-04	1.500536e-16	-2.949030e-17	1.734723e-17	-1.457168e-16	-1.144917e-16
num_bikes	0.0016483614	-0.554396303	-1.094031e-01	-0.126513876	0.036475386	-0.044365071	-0.120327401	-0.034009243	0.018593206	-0.103137440	⋯	0.005530755	-8.104158e-03	0.0006536484	-0.0010364587	1.634180e-04	-6.351898e-02	3.571416e-01	6.015405e-01	-7.284744e-02	8.157117e-03
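
The unit column norms reported by col.norm can also be checked with base R alone, together with the orthogonality of the eigenvectors. The sketch below uses the built-in USArrests data as a stand-in (an assumption for illustration; it is not the Hubway data).

```r
pca <- prcomp(USArrests, scale. = TRUE)
V <- pca$rotation

# Euclidean norm of each eigenvector (one per column)
col_norms <- sqrt(colSums(V^2))
print(col_norms)

# Orthonormality check: t(V) %*% V should be close to the identity matrix
gram <- crossprod(V)
print(round(gram, 10))
```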

In [51]:

# Let us plot the contribution of the original dimensions to the 1st principal component
PC_contr <- eigenvectors[,c("PC1")]
# We order by the magnitude of the contribution
# We use the - sign because we want a descending order
ord <- order( -abs(PC_contr) )
PC_contr <- PC_contr[ord]
# We just select the 5 highest contributing dimensions (largest absolute loadings)
PC_contr <- PC_contr[1:5]
PC_contr
barplot(PC_contr, main="Contribution to the 1st component", xlab="Original Dimensions")

HH: -0.242645523213089
HH_Pop: -0.242500207248033
Tot_Vehs: -0.235198898109267
Income_mid_low: -0.234366462782011
HHSize3: -0.233138329774904

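The five largest loadings on the first component (HH, HH_Pop, Tot_Vehs, Income_mid_low and HHSize3) are all negative, so a high PC1 score corresponds to zones with few households, a small household population and few vehicles. We can therefore describe the first principal component as the "Small low-car-usage household" component.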
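
The order-and-truncate steps used to find the dominant loadings are the same for every component, so they can be wrapped in a small helper. The function top_loadings below is a hypothetical name introduced for this sketch, and USArrests is a stand-in dataset rather than the Hubway data.

```r
# Hypothetical helper: the n loadings with the largest absolute value
# for one principal component, in descending order of magnitude
top_loadings <- function(rotation, pc = "PC1", n = 5) {
  loadings <- rotation[, pc]
  ord <- order(abs(loadings), decreasing = TRUE)
  loadings[head(ord, n)]
}

pca <- prcomp(USArrests, scale. = TRUE)
top2 <- top_loadings(pca$rotation, pc = "PC1", n = 2)
print(top2)
```

In the notebook's setting, the same call with pc = "PC2" would replace the repeated ordering code for the second component.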

In [52]:

# Second principal component vector
PC_contr <- eigenvectors[,c("PC2")]
# We order by the magnitude of the contribution
# We use the - sign because we want a descending order
ord <- order( -abs(PC_contr) )
PC_contr <- PC_contr[ord]
# We just select the 5 highest contributing dimensions
PC_contr <- PC_contr[1:5]
options(repr.plot.width=12, repr.plot.height=8)
barplot(PC_contr, main="Contribution to the 2nd component",xlab="Original Dimensions")

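The largest loadings on the second component are num_bikes (about -0.55), num_bikes_available (about -0.54) and num_docks_available (about 0.45). A high PC2 score therefore corresponds to few bikes and many free docks at a station, so the second component can be described as "Bike usage" or "Bike unavailability".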
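
When naming a component from its loadings, it helps to remember that the sign of a principal component is arbitrary: flipping the sign of a loading vector together with its scores describes the same decomposition, so "unavailability" and "availability" are two readings of the same axis. A minimal sketch on the built-in USArrests data (a stand-in chosen for illustration, not the Hubway data):

```r
pca <- prcomp(USArrests, scale. = TRUE)
Z <- scale(USArrests)  # centered and standardized data

# Flip the sign of the second component's loadings
V_flipped <- pca$rotation
V_flipped[, "PC2"] <- -V_flipped[, "PC2"]
scores_flipped <- Z %*% V_flipped

# The variance carried by each component is unchanged
var_original <- apply(pca$x, 2, var)
var_flipped <- apply(scores_flipped, 2, var)
print(all.equal(var_original, var_flipped))

# The reconstruction of the standardized data is unchanged as well
recon_original <- pca$x %*% t(pca$rotation)
recon_flipped <- scores_flipped %*% t(V_flipped)
print(all.equal(recon_original, recon_flipped))
```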

In [43]:

# Coordinates of each observation on the principal components
scores <- hpca_cor$x
par(mfrow = c(1,2))
options(repr.plot.width=16, repr.plot.height=8)
plot(scores[,1], hpdata$num_of_pickups, xlab="Small low-car-usage household propensity", 
     ylab="num_of_pickups",cex=1.2,cex.lab=1.2)
plot(scores[,2], hpdata$num_of_pickups, xlab="Bike unavailability", 
     ylab="num_of_pickups",cex=1.2,cex.lab=1.2)

We show with plots and comments whether the number of pickups (num_of_pickups) can be explained by the first two principal components. We visualize the first two components in a single plot, with point size and color indicating the number of pickups.

In [64]:

options(repr.plot.width=15, repr.plot.height=10)
df <- data.frame(SmallHH=scores[,1], Unavailability=scores[,2],
                Num.pickups=hpdata$num_of_pickups)
# Create a category: no pickups (-Inf, 0] versus at least one pickup (0, Inf]
df$Pickups.yes <- cut(df$Num.pickups, c(-Inf, 0, Inf))
# Optional: keep only the observations with hour between 8 and 10
#df <- df[which(hpdata$hour>=8 & hpdata$hour<=10),]
ggplot(df, aes(x=SmallHH, y=Unavailability)) +
  geom_point(aes(size=Num.pickups, color=Pickups.yes), alpha=0.75) +
  theme_gray(base_size = 25)

Comment on your observations.
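
One possible way to support such observations with a number, offered as a sketch and not as part of the original analysis, is to regress the variable of interest on the first two score columns and read off the R-squared. The example uses the built-in USArrests data as a stand-in, with Murder playing the role of the response. In this sketch the response column is kept out of the PCA so that the comparison is not circular; that is a choice made here, not something the notebook does.

```r
# Stand-in data: PCA on three columns, response kept separate
predictors <- USArrests[, c("Assault", "UrbanPop", "Rape")]
pca <- prcomp(predictors, scale. = TRUE)

# Regress the response on the first two principal component scores
df <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], y = USArrests$Murder)
fit <- lm(y ~ PC1 + PC2, data = df)

# Share of the response variance explained by the two components
r2 <- summary(fit)$r.squared
print(r2)
```

An R-squared close to 0 would suggest that the two components carry little linear information about the response, which is one way to read a scatter plot with no visible trend.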