> df=read.csv("yacht_hydrodynamics.csv")
> summary(df)
B.D: min 2.73->2.810

> lin1=lm(RR~.,data=df)
> summary(lin1)
> AIC(lin1)
[1] 2233.705
sig: Fr
Residual standard error: 8.96 on 301 degrees of freedom
Adjusted R-squared: 0.6507
> sqrt(sum(lin1$residuals^2)/301)
[1] 8.959608

> log1=lm(log(RR)~.,data=df)
> summary(log1)
> AIC(log1)
[1] 172.3199
sig: Intercept & log(Fr)
Residual standard error: 0.3155 on 301 degrees of freedom
Adjusted R-squared: 0.9709
> sqrt(sum((exp(log1$fitted.values)-df$RR)^2)/301)
[1] 2.164252
much lower than lin1 (8.96) on the original RR scale
> sqrt(sum((log(lin1$fitted.values/df$RR)^2))/301)
[1] NaN
some lin1$fitted.values are < 0, so the log cannot be taken

> log2=lm(log(RR)~log(Fr),data=df)
> summary(log2)
> AIC(log2)
[1] 195.2754
sig: Intercept & log(Fr)
Residual standard error: 0.3301 on 306 degrees of freedom
Adjusted R-squared: 0.9681
> plot(log(df$Fr),log(df$RR))
> abline(a=7.25812,b=4.69046)
the line is an acceptable fit, but the data are slightly curved
> plot(df$Fr,df$RR)
> curve(exp(7.25812+4.69046*log(x)),.125,.45,add=T)
the misfit at large Fr looks larger in the linear-linear plot

> log3=lm(log(RR)~poly(log(Fr),2)+.-Fr,data=df)
> summary(log3)
> AIC(log3)
[1] 116.4574
sig: adds CoB
Residual standard error: 0.2877 on 300 degrees of freedom
Adjusted R-squared: 0.9758
everything improved over log1 & log2, particularly AIC

> log4=lm(log(RR)~poly(log(Fr),3)+.-Fr,data=df)
> summary(log4)
> AIC(log4)
[1] -6.010674
sig: nothing new
Residual standard error: 0.2354 on 299 degrees of freedom
Adjusted R-squared: 0.9838
everything improved over log3, particularly AIC, now negative

> log5=lm(log(RR)~poly(log(Fr),3)+CoB,data=df)
> summary(log5)
> AIC(log5)
[1] 8.637524
sig: added (Intercept)
Residual standard error: 0.2426 on 303 degrees of freedom
Adjusted R-squared: 0.9828
everything worse than log4, but not by much

> log6=lm(log(RR)~poly(log(Fr),3)+CoB+L.D,data=df)
> summary(log6)
> AIC(log6)
[1] 6.82053
sig: added L.D 0.0532 .
Residual standard error: 0.2415 on 302 degrees of freedom
Adjusted R-squared: 0.9829
marginally better than previous

> log7=lm(log(RR)~poly(log(Fr),3)+CoB+L.D+L.B,data=df)
> summary(log7)
> AIC(log7)
[1] -6.229965
sig: L.D & L.B both strong
Residual standard error: 0.2361 on 301 degrees of freedom
Adjusted R-squared: 0.9837
close to all-in case, much improved AIC

> log8=lm(log(RR)~poly(log(Fr),3)+CoB+L.D+L.B+P.C,data=df)
> summary(log8)
> AIC(log8)
[1] -7.795269
sig: added P.C.
Residual standard error: 0.2351 on 300 degrees of freedom
Adjusted R-squared: 0.9838
best AIC, otherwise much like all in

> log9=lm(log(RR)~poly(log(Fr),3)+CoB+L.D+L.B+B.D,data=df)
> summary(log9)
> AIC(log9)
[1] -7.715569
sig: added B.D (P.C not an option)
very similar to log8
Residual standard error: 0.2352 on 300 degrees of freedom
Multiple R-squared: 0.9842, Adjusted R-squared: 0.9838

> plot(df$Fr,df$RR-exp(log8$fitted.values))
> pdf("log8-residuals-lin.pdf")
> plot(df$Fr,df$RR-exp(log8$fitted.values))
> dev.off()
X11cairo
       2
> plot(df$Fr,log8$residuals)
> pdf("log8-residuals-log.pdf")
> plot(df$Fr,log8$residuals)
> dev.off()
X11cairo
       2
> sqrt(sum((exp(log8$fitted.values)-df$RR)^2)/299)
[1] 2.09336
only slight improvement over log1

> lin2=lm(RR ~ poly(Fr,4) + CoB*Fr + P.C*Fr, data=df)
> summary(lin2)
> AIC(lin2)
[1] 1047.861
sig: all
Residual standard error: 1.303 on 299 degrees of freedom
Adjusted R-squared: 0.9926
huge improvement over lin1, but is it better than log8?
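
The AIC values of lin2 and log8 are not directly comparable, because one model is fitted to RR and the other to log(RR). One standard adjustment adds the Jacobian term 2*sum(log(y)) to the AIC of the log-response model, which puts it on the scale of the raw response. The sketch below illustrates this on simulated data, not the yacht data; the helper name `aic_on_raw_scale` and the objects `sim`, `raw_fit`, and `log_fit` are hypothetical.

```r
# Hypothetical helper: AIC of a model fitted to log(y), expressed on the
# scale of y. The density of y gains a Jacobian factor 1/y, so the
# log-likelihood drops by sum(log(y)) and the AIC rises by 2 * sum(log(y)).
aic_on_raw_scale <- function(log_model, y) {
  AIC(log_model) + 2 * sum(log(y))
}

# Simulated stand-in data (NOT the yacht data): y follows a noisy power law in x
set.seed(1)
x <- runif(200, 0.125, 0.45)
y <- exp(7 + 4.5 * log(x) + rnorm(200, sd = 0.3))
sim <- data.frame(x = x, y = y)

raw_fit <- lm(y ~ poly(x, 4), data = sim)   # response on the raw scale
log_fit <- lm(log(y) ~ log(x), data = sim)  # response on the log scale

aic_raw <- AIC(raw_fit)
aic_log <- aic_on_raw_scale(log_fit, sim$y)

# Both numbers now refer to the likelihood of y itself
c(raw = aic_raw, log_adjusted = aic_log)
```

With the session's objects, the equivalent call would be aic_on_raw_scale(log8, df$RR), to be set against AIC(lin2).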
> plot(df$Fr,lin2$residuals)
> pdf("lin2-residuals-lin.pdf")
> plot(df$Fr,lin2$residuals)
> dev.off()
X11cairo
       2
lin2 has lower overall residuals; log8 does better at low Fr
comparing mean absolute residuals, the two models are much closer
> mean(abs(exp(log8$fitted.values)-df$RR))
[1] 0.9169838
> mean(abs(lin2$fitted.values-df$RR))
[1] 0.8250252

> lin3=lm(RR ~ poly(Fr,6)+P.C*Fr+L.D*Fr+L.B*Fr+I(P.C*Fr^6)+I(L.B*Fr^6)+I(L.D*Fr^6)+I(CoB*Fr^6), data=df)
> AIC(lin3)
[1] 837.1401
Residual standard error: 0.9195 on 291 degrees of freedom
Adjusted R-squared: 0.9963
> mean(abs(lin3$fitted.values-df$RR))
[1] 0.5439024
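
The session back-transforms the log fits by hand each time it compares them with the raw-scale models. A small helper can keep those comparisons consistent. The following is a minimal sketch on simulated data, not the yacht data; `raw_scale_errors` and the objects `sim`, `raw_fit`, and `log_fit` are hypothetical names. It divides by the model's residual degrees of freedom, as the hand-computed RMSE values above do.

```r
# Hypothetical helper: RMSE and mean absolute error on the raw response scale.
# Set log_response = TRUE when the model was fitted to log(y), so that the
# fitted values are exponentiated before being compared with y.
raw_scale_errors <- function(model, y, log_response = FALSE) {
  pred <- fitted(model)
  if (log_response) pred <- exp(pred)
  err <- pred - y
  c(rmse = sqrt(sum(err^2) / df.residual(model)),
    mae  = mean(abs(err)))
}

# Simulated stand-in data (NOT the yacht data)
set.seed(2)
x <- runif(200, 0.125, 0.45)
y <- exp(7 + 4.5 * log(x) + rnorm(200, sd = 0.3))
sim <- data.frame(x = x, y = y)

raw_fit <- lm(y ~ poly(x, 4), data = sim)
log_fit <- lm(log(y) ~ poly(log(x), 3), data = sim)

errs_raw <- raw_scale_errors(raw_fit, sim$y)
errs_log <- raw_scale_errors(log_fit, sim$y, log_response = TRUE)

# One row per model, both measured in units of y
rbind(raw = errs_raw, log = errs_log)
```

For a raw-scale model the rmse entry reproduces the residual standard error reported by summary(), which is the same check the session ran for lin1.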