Appendix C — Case Study: Fitbit

In this appendix, we analyze a publicly available Fitbit dataset from our research group at Oral Roberts University (ORU), without referring to the paper our group published in 2019.

C.1 Context of the Data

This dataset accompanies our group's most cited article, which had been cited 21 times as of June 19, 2024.1

Reference: Broaddus, Allie; Jaquis, Brandon; Jones, Colt; Jost, Scarlet; Lang, Andrew; Li, Ailin; et al. (2018). Dataset: Fitbits, field-tests, and grades. The effects of a healthy and physically active lifestyle on the academic performance of first-year college students. figshare. Dataset. https://doi.org/10.6084/m9.figshare.7218497.v1

C.1.1 Data Collection

The data were collected during the Fall semester of 2017 from 581 freshmen enrolled at Oral Roberts University in a class titled “Introduction to Whole Person Education.” This course includes a mandatory health and physical exercise component, consisting of:

  • Steps and Active Minutes goals
  • A 1-mile field test
  • A lifestyle assessment survey

Students utilized Fitbits to track their steps and active minutes, which were synced with the course gradebook. The students’ semester grade point averages (GPAs) were recorded at the end of the semester. To ensure confidentiality, the dataset was de-identified after the grades were retrieved and stored.

C.2 Loading the dataset

df <- read.csv("FitbitsAndGradesData_Fall2017.csv")
str(df) # quick overview of the data
'data.frame':   581 obs. of  11 variables:
 $ Key       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Steps     : int  11157 7986 11602 10609 14552 8770 7138 14676 10963 13510 ...
 $ Peak      : num  0.13 2.72 0.35 1 9.1 0.57 0.13 0.55 3.76 0.17 ...
 $ Cardio    : num  3.86 15.53 2.1 6.51 6.09 ...
 $ FatBurn   : num  112 249 195 122 110 ...
 $ Mode      : int  1 1 0 1 1 0 1 1 1 1 ...
 $ Minutes   : num  9.35 8.2 12.73 10 8.63 ...
 $ Gender    : int  1 1 0 1 0 1 1 1 1 1 ...
 $ Age       : int  23 18 18 17 18 18 18 19 18 18 ...
 $ GPA       : num  4 3.26 3.07 4 2.87 2.76 2.86 3.76 3.47 3.82 ...
 $ Life.Score: int  48 101 64 41 83 45 74 67 61 51 ...
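Because Gender is stored as 0/1, the coding is easy to read backwards later in the analysis. One option, sketched below, is to attach readable labels once, right after loading. The `toy` data frame is only a stand-in built from the first ten rows printed by `str(df)` above, and the column name `GenderLabel` is an assumption made for illustration.

```r
# Stand-in for df: the first ten rows shown in the str() output above.
toy <- data.frame(
  Steps  = c(11157, 7986, 11602, 10609, 14552, 8770, 7138, 14676, 10963, 13510),
  Gender = c(1, 1, 0, 1, 0, 1, 1, 1, 1, 1),
  GPA    = c(4, 3.26, 3.07, 4, 2.87, 2.76, 2.86, 3.76, 3.47, 3.82)
)

# Coding used in this dataset: 0 = male, 1 = female.
# GenderLabel is a hypothetical column name.
toy$GenderLabel <- factor(toy$Gender, levels = c(0, 1),
                          labels = c("Male", "Female"))

gender_counts <- table(toy$GenderLabel)
print(gender_counts)
```

If such a column is added to df itself, any formula of the form `GPA ~ .` will pick it up alongside Gender, so it would need to be excluded from those fits.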

C.3 Summary per gender

C.3.1 Males

df_m <- df[df$Gender==0,]
dim(df_m)
[1] 237  11

Coding:2 Gender = 0 indicates male.

summary(df_m)
      Key            Steps            Peak            Cardio       
 Min.   :  3.0   Min.   :    9   Min.   :  0.00   Min.   :  0.000  
 1st Qu.:138.0   1st Qu.: 8399   1st Qu.:  0.20   1st Qu.:  1.830  
 Median :282.0   Median :10455   Median :  0.58   Median :  3.340  
 Mean   :282.5   Mean   :10425   Mean   :  4.54   Mean   :  8.308  
 3rd Qu.:425.0   3rd Qu.:12497   3rd Qu.:  1.54   3rd Qu.:  6.570  
 Max.   :581.0   Max.   :20331   Max.   :267.10   Max.   :134.200  
    FatBurn            Mode           Minutes           Gender       Age       
 Min.   :  0.01   Min.   :0.0000   Min.   : 5.380   Min.   :0   Min.   :16.00  
 1st Qu.: 69.84   1st Qu.:1.0000   1st Qu.: 6.850   1st Qu.:0   1st Qu.:18.00  
 Median :104.09   Median :1.0000   Median : 7.630   Median :0   Median :18.00  
 Mean   :122.16   Mean   :0.9114   Mean   : 8.586   Mean   :0   Mean   :18.85  
 3rd Qu.:169.65   3rd Qu.:1.0000   3rd Qu.: 9.120   3rd Qu.:0   3rd Qu.:19.00  
 Max.   :439.58   Max.   :1.0000   Max.   :21.070   Max.   :0   Max.   :40.00  
      GPA          Life.Score    
 Min.   :0.600   Min.   : 35.00  
 1st Qu.:2.680   1st Qu.: 56.00  
 Median :3.210   Median : 68.00  
 Mean   :3.054   Mean   : 69.98  
 3rd Qu.:3.620   3rd Qu.: 83.00  
 Max.   :4.000   Max.   :124.00  

C.3.2 Females

Coding:3 Gender = 1 indicates female.

df_f <- df[df$Gender==1,]
dim(df_f)
[1] 344  11

summary(df_f)
      Key            Steps            Peak            Cardio       
 Min.   :  1.0   Min.   :    0   Min.   : 0.000   Min.   :  0.000  
 1st Qu.:156.8   1st Qu.: 8771   1st Qu.: 0.385   1st Qu.:  3.098  
 Median :305.5   Median :10187   Median : 0.945   Median :  5.780  
 Mean   :296.8   Mean   :10094   Mean   : 1.708   Mean   :  9.490  
 3rd Qu.:444.2   3rd Qu.:11790   3rd Qu.: 2.152   3rd Qu.:  9.852  
 Max.   :580.0   Max.   :18673   Max.   :36.180   Max.   :180.650  
    FatBurn            Mode           Minutes          Gender       Age       
 Min.   :  0.00   Min.   :0.0000   Min.   : 5.58   Min.   :1   Min.   :16.00  
 1st Qu.: 98.27   1st Qu.:1.0000   1st Qu.: 8.91   1st Qu.:1   1st Qu.:18.00  
 Median :136.62   Median :1.0000   Median :10.79   Median :1   Median :18.00  
 Mean   :167.03   Mean   :0.8169   Mean   :11.25   Mean   :1   Mean   :18.46  
 3rd Qu.:214.15   3rd Qu.:1.0000   3rd Qu.:13.21   3rd Qu.:1   3rd Qu.:19.00  
 Max.   :729.67   Max.   :1.0000   Max.   :19.10   Max.   :1   Max.   :25.00  
      GPA          Life.Score    
 Min.   :0.000   Min.   : 35.00  
 1st Qu.:2.920   1st Qu.: 58.00  
 Median :3.500   Median : 70.00  
 Mean   :3.278   Mean   : 71.91  
 3rd Qu.:3.820   3rd Qu.: 83.00  
 Max.   :4.000   Max.   :130.00  
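The two `summary()` tables can also be condensed into a single side-by-side comparison with `aggregate()`. The sketch below runs on a stand-in built from the first ten rows printed by `str(df)`; with the full data, `df` would take the place of `toy`. The object name `by_gender` is illustrative.

```r
# Stand-in for df: the first ten rows shown in the str() output.
toy <- data.frame(
  Steps  = c(11157, 7986, 11602, 10609, 14552, 8770, 7138, 14676, 10963, 13510),
  Gender = c(1, 1, 0, 1, 0, 1, 1, 1, 1, 1),
  GPA    = c(4, 3.26, 3.07, 4, 2.87, 2.76, 2.86, 3.76, 3.47, 3.82)
)

# Mean of each variable within each gender (0 = male, 1 = female).
by_gender <- aggregate(cbind(Steps, GPA) ~ Gender, data = toy, FUN = mean)
print(by_gender)
```

Swapping `FUN = mean` for `FUN = median` gives the corresponding table of medians.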

C.4 Analysis of the summary statistics

C.4.1 Key Observations:

  1. Steps:
    • Males show slightly higher mean and median step counts than females (mean 10,425 vs. 10,094; median 10,455 vs. 10,187). The male interquartile range is also wider (8,399 to 12,497 vs. 8,771 to 11,790), indicating greater variability and possibly the presence of subgroups within the male population.
  2. Peak and Cardio:
    • Males have a higher mean Peak value (4.54 vs. 1.71) and a much larger maximum (267.10 vs. 36.18), but a lower median (0.58 vs. 0.945), so the male mean is pulled up by a few extreme values. Cardio means are similar (8.31 vs. 9.49), with females having the higher median and the higher maximum.
  3. FatBurn:
    • Females have higher mean and maximum FatBurn values (mean 167.03 vs. 122.16), suggesting that, on average, they spend more time in the fat burn zone than males.
  4. Minutes:
    • The Minutes variable contains the recorded times for a required field test. Females have higher mean and median minutes, suggesting they need more time to complete the field test.
  5. Age:
    • Both groups share the same minimum age (16) and the same quartiles (18 to 19), but males have a slightly higher mean age (18.85 vs. 18.46) and a much higher maximum (40 vs. 25).
  6. GPA:
    • Females have a higher mean and median GPA, indicating better academic performance on average compared to males.
  7. Life.Score:
    • Females have a slightly higher mean life score (71.91 vs. 69.98). If higher scores on this survey indicate lower well-being, as assumed here, females report a slightly lower overall quality of life than males.

C.4.2 Conclusion:

The summary statistics show that males have more variable physical activity, with a wider spread of steps and far larger extreme Peak values, which may indicate distinct subgroups within the male population. Females show more consistent step counts and spend more time in the fat burn zone, but take longer on average to complete the 1-mile field test. Academically, females outperform males on average, with higher mean and median GPAs. The two groups have similar median ages and life scores, although the male age range extends to 40 (vs. 25 for females) and females have slightly higher life scores on average.

C.5 Distribution of Steps per gender

The density plot displays the distribution of steps taken by two groups based on gender. Several key observations can be made from the plot:

  1. Broader Distribution for Males (0):
    • The distribution for males (blue curve) is broader, indicating greater variability in the number of steps taken. One possible explanation is that the male participants comprise at least two subgroups, one more active with a higher average step count and one less active, although a broad curve alone does not establish this.
  2. Narrower Distribution for Females (1):
    • The distribution for females (pink curve) is narrower and more symmetric compared to the males. This indicates that the number of steps taken by female participants is more consistent across the group. The symmetry suggests a more homogeneous activity level among females.
  3. Shoulder in Female Distribution:
    • A noticeable shoulder on the right side of the female distribution curve indicates a small subset of females who are more active than the average. This subgroup, while not as large as the main peak, suggests that there are some females who take a significantly higher number of steps.

C.5.1 Analysis from density distribution visualization:

The density plot highlights distinct differences in the activity patterns of males and females. Males exhibit a wider range of activity levels, possibly because of distinct subgroups with different activity levels. In contrast, females show a more homogeneous activity pattern, though with a small subset displaying higher activity levels. Understanding these differences could be valuable for tailoring health and fitness programs to each gender group.
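The figure itself is not reproduced here. A plot of this kind could be drawn in base R roughly as follows. The `toy` data are simulated stand-ins, not the study data, and the colors simply follow the description above.

```r
set.seed(1)
# Simulated stand-in, NOT the study data: 0 = male, 1 = female.
toy <- data.frame(
  Steps  = c(rnorm(60, mean = 10400, sd = 3000),
             rnorm(80, mean = 10100, sd = 2200)),
  Gender = rep(c(0, 1), times = c(60, 80))
)

# Kernel density estimate of Steps within each gender.
dens_m <- density(toy$Steps[toy$Gender == 0])
dens_f <- density(toy$Steps[toy$Gender == 1])

# Draw both curves on shared axes so their widths can be compared.
plot(dens_m, col = "blue", main = "Distribution of Steps by gender",
     xlab = "Steps", xlim = range(dens_m$x, dens_f$x),
     ylim = c(0, max(dens_m$y, dens_f$y)))
lines(dens_f, col = "pink")
legend("topright", legend = c("Male (0)", "Female (1)"),
       col = c("blue", "pink"), lty = 1)
```

With the real data, `df$Steps[df$Gender == 0]` and `df$Steps[df$Gender == 1]` would replace the simulated vectors.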

C.6 Analysis of GPA vs. Steps

To investigate the relationship between physical activity and overall academic performance, we fit a linear regression with the number of steps as the predictor variable and GPA as the response variable. The scatter plot with the regression line and the summary of the regression model follow.

C.6.1 Scatter Plot with Regression Line

The scatter plot shows the relationship between the number of steps taken and the GPA of the students. A regression line has been added to illustrate the trend.
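The scatter plot is not reproduced here. A minimal base R sketch of such a figure is given below, using a stand-in built from the first ten rows printed by `str(df)`; the object names `toy` and `fit` are illustrative, and the fitted line on ten rows will not match the full-data estimates.

```r
# Stand-in for df: the first ten rows shown in the str() output.
toy <- data.frame(
  Steps = c(11157, 7986, 11602, 10609, 14552, 8770, 7138, 14676, 10963, 13510),
  GPA   = c(4, 3.26, 3.07, 4, 2.87, 2.76, 2.86, 3.76, 3.47, 3.82)
)

# Simple linear regression of GPA on Steps.
fit <- lm(GPA ~ Steps, data = toy)

# Scatter plot with the fitted regression line overlaid.
plot(GPA ~ Steps, data = toy, xlab = "Steps", ylab = "GPA",
     main = "GPA vs. Steps")
abline(fit, col = "red")
```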

C.6.2 Linear Regression Model Summary

model1 <- lm(GPA ~ Steps, data = df)  # use column names with data=, not df$...
summary(model1)

Residual standard error: 0.6914 on 579 degrees of freedom
Multiple R-squared: 0.1689
Adjusted R-squared: 0.1675
F-statistic: 117.7 on 1 and 579 DF
p-value: < 2.2e-16

C.6.3 Interpretation of Results

  1. Intercept (Estimate = 2.148): The intercept is the predicted GPA when the number of steps is zero, an extrapolation far below typical step counts in the data. This value is statistically significant (p < 2e-16).

  2. Slope (Estimate = 0.0001015): Each additional 1,000 steps is associated with a GPA about 0.10 higher (equivalently, 1.015 per 10,000 steps). This relationship is statistically significant (p < 2e-16).

  3. R-squared (0.1689): This value indicates that approximately 16.89% of the variability in GPA can be explained by the number of steps taken. While this suggests a positive correlation, it also indicates that there are other factors influencing GPA that are not accounted for in this model.
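The arithmetic behind these three readings can be checked directly from the reported estimates. The sketch below uses only the numbers quoted above; the helper name `predict_gpa` is hypothetical.

```r
intercept <- 2.148      # reported intercept
slope     <- 0.0001015  # reported slope, in GPA points per step
r_squared <- 0.1689     # reported multiple R-squared

# Fitted line: predicted GPA for a given step count.
predict_gpa <- function(steps) intercept + slope * steps

gpa_change_per_1000 <- slope * 1000
predicted <- predict_gpa(c(5000, 10000, 15000))

# In a one-predictor model, the absolute correlation is the square root
# of R-squared; the sign is taken from the slope.
correlation <- sign(slope) * sqrt(r_squared)

print(gpa_change_per_1000)
print(predicted)
print(round(correlation, 3))
```

A correlation of roughly 0.41 is moderate, which is consistent with Steps explaining about 17% of the variability in GPA.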

C.6.4 Conclusion

The regression analysis suggests a statistically significant positive relationship between physical activity (measured by steps) and academic performance (GPA). However, the relatively low R-squared value indicates that other factors also play a significant role in determining GPA. Further research could include additional variables to build a more comprehensive model.

C.7 Linear Regression Analysis by Gender

C.8 Modelling with all variables

model_all <- lm(data=df,GPA~.)
summary(model_all)

Call:
lm(formula = GPA ~ ., data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0274 -0.3591  0.1385  0.4772  1.4102 

Coefficients: (1 not defined because of singularities)
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)      2.318e+00  4.170e-01   5.557 4.21e-08 ***
Key             -5.512e-05  1.687e-04  -0.327   0.7439    
Steps            9.312e-05  1.016e-05   9.166  < 2e-16 ***
Peak            -1.919e-03  2.135e-03  -0.899   0.3690    
Cardio           1.923e-03  2.066e-03   0.931   0.3523    
FatBurn          1.519e-04  3.639e-04   0.417   0.6765    
Mode             4.535e-02  1.101e-01   0.412   0.6807    
Minutes         -2.668e-02  1.425e-02  -1.873   0.0616 .  
Gender           3.250e-01  6.586e-02   4.934 1.06e-06 ***
Age              7.757e-03  1.702e-02   0.456   0.6486    
Life.Score      -2.899e-03  1.586e-03  -1.828   0.0681 .  
GenderLabelMale         NA         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6758 on 570 degrees of freedom
Multiple R-squared:  0.2183,    Adjusted R-squared:  0.2046 
F-statistic: 15.92 on 10 and 570 DF,  p-value: < 2.2e-16
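The note "1 not defined because of singularities" and the NA row for GenderLabelMale suggest that df carried a label column duplicating Gender when this model was fit. The formula `GPA ~ .` also pulls in Key, which appears to be a row identifier rather than a measurement. One way to detect and avoid both is sketched below on a stand-in built from the first ten rows printed by `str(df)`. How the `GenderLabel` column was originally created is an assumption.

```r
# Stand-in for df: the first ten rows shown in the str() output.
toy <- data.frame(
  Key    = 1:10,
  Steps  = c(11157, 7986, 11602, 10609, 14552, 8770, 7138, 14676, 10963, 13510),
  Gender = c(1, 1, 0, 1, 0, 1, 1, 1, 1, 1),
  GPA    = c(4, 3.26, 3.07, 4, 2.87, 2.76, 2.86, 3.76, 3.47, 3.82)
)
# Assumed label column that duplicates Gender (0 = male, 1 = female).
toy$GenderLabel <- ifelse(toy$Gender == 0, "Male", "Female")

fit_all <- lm(GPA ~ ., data = toy)

# Coefficients lm() could not estimate because of exact collinearity.
aliased <- names(coef(fit_all))[is.na(coef(fit_all))]
print(aliased)

# Refit without the duplicate label and without the Key column.
fit_clean <- lm(GPA ~ . - Key - GenderLabel, data = toy)
print(coef(fit_clean))
```

The same `. - Key - GenderLabel` pattern could be applied to the full-data formula.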

C.9 Analysis

C.9.1 Main Results

The linear regression analysis was performed to predict GPA based on various predictor variables. Here are the key findings from the analysis:

  • Intercept: The intercept of the model is 2.318, which is statistically significant (p < 0.001).
  • Steps: The number of Steps taken has a positive and highly significant effect on GPA (estimate = 0.00009312, p < 0.001).
  • Peak: The coefficient for Peak is -0.001919, but it is not statistically significant (p = 0.3690).
  • Cardio: The Cardio variable has a positive but not statistically significant effect on GPA (estimate = 0.001923, p = 0.3523).
  • FatBurn: The coefficient for FatBurn is 0.0001519, but it is not statistically significant (p = 0.6765).
  • Mode: The effect of Mode on GPA is positive but not statistically significant (estimate = 0.04535, p = 0.6807).
  • Minutes: The Minutes variable has a negative effect on GPA, with a marginal level of significance (estimate = -0.02668, p = 0.0616).
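  • Gender: Being female (Gender = 1) has a positive and highly significant effect on GPA (estimate = 0.3250, p < 0.001). Because males are coded 0, the coefficient is the expected GPA difference of females relative to males with the other predictors held fixed.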
  • Age: The effect of Age on GPA is positive but not statistically significant (estimate = 0.007757, p = 0.6486).
  • Life.Score: The Life.Score has a negative effect on GPA with a marginal level of significance (estimate = -0.002899, p = 0.0681).

C.9.2 Model Summary

  • Residual standard error: 0.6758 on 570 degrees of freedom.
  • Multiple R-squared: 0.2183, indicating that approximately 21.83% of the variability in GPA is explained by the model.
  • Adjusted R-squared: 0.2046, which adjusts the R-squared value based on the number of predictors in the model.
  • F-statistic: 15.92 on 10 and 570 DF, with a p-value < 2.2e-16, indicating that the overall model is statistically significant.

C.9.3 Conclusion

The linear regression analysis shows that Steps and Gender (with females having higher GPAs) are significant predictors of GPA. Other variables such as Minutes and Life.Score show marginal significance, while the rest of the variables do not significantly contribute to the model. Despite the statistical significance of some predictors, the model explains only a modest proportion of the variability in GPA (approximately 21.83%), suggesting that other factors not included in the model may also play a substantial role in determining GPA.

C.10 Controlling by gender

C.10.1 Searching for best model for males

model_m <- lm(GPA~.,data=df_m)
summary(model_m)

Call:
lm(formula = GPA ~ ., data = df_m)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.86289 -0.35140  0.09225  0.43291  1.47888 

Coefficients: (1 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.714e+00  5.792e-01   6.412 8.22e-10 ***
Key          2.323e-04  2.529e-04   0.918  0.35936    
Steps        7.309e-05  1.334e-05   5.479 1.13e-07 ***
Peak        -2.402e-03  2.138e-03  -1.123  0.26247    
Cardio       1.076e-03  3.294e-03   0.327  0.74415    
FatBurn      1.377e-03  6.822e-04   2.018  0.04477 *  
Mode        -6.378e-01  2.433e-01  -2.622  0.00934 ** 
Minutes     -9.250e-02  2.446e-02  -3.781  0.00020 ***
Gender              NA         NA      NA       NA    
Age         -3.577e-03  1.833e-02  -0.195  0.84542    
Life.Score  -3.010e-03  2.356e-03  -1.278  0.20269    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6342 on 227 degrees of freedom
Multiple R-squared:  0.2542,    Adjusted R-squared:  0.2246 
F-statistic: 8.597 on 9 and 227 DF,  p-value: 4.536e-11
library(leaps)
a <- regsubsets(GPA~.,data=df_m)
Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
force.in, : 1 linear dependencies found
Reordering variables and trying again:
summary_a <- summary(a, all.best=TRUE, matrix=TRUE, rss=TRUE)
par(mfrow=c(1,2))
plot(a,scale="bic",main = "minimizing BIC")
plot(a,scale="adjr2",main = "maximizing adj-R^2")

C.10.2 BIC or adj-R^2 ?

print("minimizing BIC:")
[1] "minimizing BIC:"
bic_values <- summary_a$bic
best_model_index_bic <- which.min(bic_values)
best_model_vars_bic <- names(which(summary_a$which[best_model_index_bic,]))
eq_bic <- paste(best_model_vars_bic, collapse = "+")
print(paste("modelBIC equation: ",eq_bic))
[1] "modelBIC equation:  (Intercept)+Steps+FatBurn+Mode+Minutes"
print("maximizing adjr2:")
[1] "maximizing adjr2:"
adjr2_values <- summary_a$adjr2
best_model_index_adjr2 <- which.max(adjr2_values)
best_model_vars_adjr2 <- names(which(summary_a$which[best_model_index_adjr2,]))
eq_adjr2 <- paste(best_model_vars_adjr2, collapse = "+")
print(paste("modeladjr2 equation: ",eq_adjr2))
[1] "modeladjr2 equation:  (Intercept)+Steps+Peak+FatBurn+Mode+Minutes+Life.Score"
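To make the two criteria concrete, the following base R sketch performs a small exhaustive search by hand: it fits every non-empty subset of three predictors and ranks the fits by BIC and by adjusted R-squared. It illustrates the idea only and is not the `leaps` implementation. The data are simulated, and all names (`toy`, `x1` to `x3`, `scores`) are hypothetical.

```r
set.seed(1)
n <- 100
# Simulated data, NOT the study data: only x1 truly affects y.
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 2 + 3 * toy$x1 + rnorm(n, sd = 0.5)

predictors <- c("x1", "x2", "x3")

# Every non-empty subset of the predictors (7 subsets for 3 predictors).
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit one linear model per subset.
fits <- lapply(subsets, function(vars) {
  lm(reformulate(vars, response = "y"), data = toy)
})

# Collect both criteria for each candidate model.
scores <- data.frame(
  model = vapply(subsets, paste, character(1), collapse = "+"),
  bic   = vapply(fits, BIC, numeric(1)),
  adjr2 = vapply(fits, function(f) summary(f)$adj.r.squared, numeric(1))
)

best_bic   <- scores$model[which.min(scores$bic)]
best_adjr2 <- scores$model[which.max(scores$adjr2)]
print(scores)
```

BIC penalizes each extra coefficient by log(n), so it tends to select smaller models than adjusted R-squared, which matches the pattern in the output above.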

C.10.2.1 Model 1: min(BIC) for only males

model_m_bic <- lm(GPA~Steps+FatBurn+Mode+Minutes,data=df_m)
summary(model_m_bic)

Call:
lm(formula = GPA ~ Steps + FatBurn + Mode + Minutes, data = df_m)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.7785 -0.3533  0.1072  0.4191  1.4927 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.555e+00  4.190e-01   8.483 2.60e-15 ***
Steps        7.374e-05  1.307e-05   5.642 4.88e-08 ***
FatBurn      1.429e-03  5.882e-04   2.430  0.01586 *  
Mode        -6.639e-01  2.342e-01  -2.835  0.00499 ** 
Minutes     -9.767e-02  2.357e-02  -4.144 4.79e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6323 on 232 degrees of freedom
Multiple R-squared:  0.2422,    Adjusted R-squared:  0.2291 
F-statistic: 18.53 on 4 and 232 DF,  p-value: 3.122e-13

C.10.2.2 Model 2: max(adjr2) for only males


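model_m_adjr2 <- lm(GPA~Steps+Peak+FatBurn+Mode+Minutes+Life.Score,data=df_m)  # object name assumed
summary(model_m_adjr2)

Call: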
lm(formula = GPA ~ Steps + Peak + FatBurn + Mode + Minutes + 
    Life.Score, data = df_m)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8352 -0.3749  0.1170  0.4251  1.4709 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.698e+00  4.279e-01   8.641 9.53e-16 ***
Steps        7.245e-05  1.310e-05   5.529 8.72e-08 ***
Peak        -2.165e-03  1.900e-03  -1.140 0.255629    
FatBurn      1.496e-03  5.892e-04   2.539 0.011782 *  
Mode        -6.178e-01  2.394e-01  -2.581 0.010486 *  
Minutes     -9.324e-02  2.419e-02  -3.854 0.000151 ***
Life.Score  -2.975e-03  2.337e-03  -1.273 0.204263    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6314 on 230 degrees of freedom
Multiple R-squared:  0.251, Adjusted R-squared:  0.2314 
F-statistic: 12.84 on 6 and 230 DF,  p-value: 1.663e-12

C.10.3 Best Model Selection Rationale

When selecting the best model, we consider various statistical metrics and criteria, such as the adjusted R-squared, the p-value of the F-statistic, and the Bayesian Information Criterion (BIC). Let’s compare the two models in detail:

C.10.3.1 Model 1:

Formula: GPA ~ Steps + FatBurn + Mode + Minutes

  • Residual standard error: 0.6323 on 232 degrees of freedom
  • Multiple R-squared: 0.2422
  • Adjusted R-squared: 0.2291
  • F-statistic: 18.53 on 4 and 232 DF
  • p-value of F-statistic: 3.122e-13
  • Significant predictors:
    • Steps (p-value: 4.88e-08 ***)
    • FatBurn (p-value: 0.01586 *)
    • Mode (p-value: 0.00499 **)
    • Minutes (p-value: 4.79e-05 ***)

C.10.3.2 Model 2:

Formula: GPA ~ Steps + Peak + FatBurn + Mode + Minutes + Life.Score

  • Residual standard error: 0.6314 on 230 degrees of freedom
  • Multiple R-squared: 0.251
  • Adjusted R-squared: 0.2314
  • F-statistic: 12.84 on 6 and 230 DF
  • p-value of F-statistic: 1.663e-12
  • Significant predictors:
    • Steps (p-value: 8.72e-08 ***)
    • FatBurn (p-value: 0.011782 *)
    • Mode (p-value: 0.010486 *)
    • Minutes (p-value: 0.000151 ***)
    • Peak and Life.Score were not significant.

C.10.4 Model Comparison and Selection

  1. Adjusted R-squared: Model 2 has a slightly higher adjusted R-squared (0.2314) compared to Model 1 (0.2291), indicating a slightly better fit.

  2. p-value of F-statistic: Model 1 has a slightly lower p-value (3.122e-13) than Model 2 (1.663e-12). Both are far below any conventional threshold, so this difference carries little weight in choosing between the models.

  3. Residual Standard Error: Model 2 has a slightly lower residual standard error (0.6314) compared to Model 1 (0.6323), indicating that Model 2’s predictions are marginally closer to the actual values.

  4. Significant Predictors: Both models have Steps, FatBurn, Mode, and Minutes as significant predictors. However, Model 2 introduces Peak and Life.Score, which are not significant.
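The adjusted R-squared values compared above follow from the multiple R-squared, the sample size (n = 237 males), and the number of predictors p, via 1 - (1 - R²)(n - 1)/(n - p - 1). A short check using only the reported figures is sketched below; the helper name `adj_r2` is hypothetical, and small differences in the last digit come from rounding in the printed R-squared.

```r
# Adjusted R-squared from R-squared, sample size n, and predictor count p.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n_males <- 237
adj_model1 <- adj_r2(0.2422, n_males, p = 4)  # Steps, FatBurn, Mode, Minutes
adj_model2 <- adj_r2(0.2510, n_males, p = 6)  # adds Peak and Life.Score

print(round(c(model1 = adj_model1, model2 = adj_model2), 4))
```

The penalty factor (n - 1)/(n - p - 1) grows with p, which is why adding two weak predictors raises the adjusted value only slightly.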

C.10.5 Conclusion

Although Model 2 has a slightly higher adjusted R-squared, the added predictors (Peak and Life.Score) are not significant and do not substantially improve the fit. Since the two models perform similarly, we choose Model 1, which is simpler and is the model that minimizes BIC.

Chosen Model: GPA ~ Steps + FatBurn + Mode + Minutes

Reason: Model 1 is chosen because it has a comparable adjusted R-squared, minimizes BIC, and avoids non-significant predictors, which keeps the model simple and interpretable.
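The BIC gap between the two male models can be approximated from the printed summaries alone, since the residual sum of squares equals the squared residual standard error times the residual degrees of freedom. The sketch below uses the Gaussian form n·log(RSS/n) + k·log(n), which omits constants shared by both models. The function name is hypothetical, and the result is approximate because the inputs are rounded.

```r
# BIC of a Gaussian linear model, up to a constant shared by all models
# fit to the same n observations.
bic_from_summary <- function(rse, df_resid, n) {
  rss <- rse^2 * df_resid   # residual sum of squares
  k   <- n - df_resid       # number of estimated coefficients
  n * log(rss / n) + k * log(n)
}

n_males <- 237
bic_model1 <- bic_from_summary(0.6323, 232, n_males)
bic_model2 <- bic_from_summary(0.6314, 230, n_males)

# Positive values favor Model 1 (lower BIC is better).
delta_bic <- bic_model2 - bic_model1
print(round(delta_bic, 1))
```

A positive difference means the two extra coefficients in Model 2 cost more in penalty than they gain in fit, which agrees with the BIC-based search selecting Model 1.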

C.10.6 Searching for best model for females

model_f <- lm(GPA~.,data=df_f)
summary(model_f)

Call:
lm(formula = GPA ~ ., data = df_f)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8958 -0.3379  0.1617  0.5011  1.4613 

Coefficients: (1 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.991e+00  7.948e-01   2.504   0.0127 *  
Key         -2.590e-04  2.239e-04  -1.157   0.2482    
Steps        1.126e-04  1.538e-05   7.319 1.87e-12 ***
Peak         3.035e-02  1.671e-02   1.816   0.0703 .  
Cardio      -1.168e-03  3.110e-03  -0.376   0.7074    
FatBurn     -2.412e-04  4.386e-04  -0.550   0.5828    
Mode         1.779e-01  1.260e-01   1.411   0.1591    
Minutes      5.754e-03  1.930e-02   0.298   0.7658    
Gender              NA         NA      NA       NA    
Age          1.056e-02  3.802e-02   0.278   0.7813    
Life.Score  -2.467e-03  2.136e-03  -1.155   0.2489    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6922 on 334 degrees of freedom
Multiple R-squared:  0.2139,    Adjusted R-squared:  0.1927 
F-statistic:  10.1 on 9 and 334 DF,  p-value: 9.319e-14
library(leaps)
a <- regsubsets(GPA~.,data=df_f)
Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
force.in, : 1 linear dependencies found
Reordering variables and trying again:
summary_a <- summary(a, all.best=TRUE, matrix=TRUE, rss=TRUE)
par(mfrow=c(1,2))
plot(a,scale="bic",main = "minimizing BIC")
plot(a,scale="adjr2",main = "maximizing adj-R^2")

C.10.7 BIC or adj-R^2 ?

print("minimizing BIC:")
[1] "minimizing BIC:"
bic_values <- summary_a$bic
best_model_index_bic <- which.min(bic_values)
best_model_vars_bic <- names(which(summary_a$which[best_model_index_bic,]))
eq_bic <- paste(best_model_vars_bic, collapse = "+")
print(paste("modelBIC equation: ",eq_bic))
[1] "modelBIC equation:  (Intercept)+Steps"
print("maximizing adjr2:")
[1] "maximizing adjr2:"
adjr2_values <- summary_a$adjr2
best_model_index_adjr2 <- which.max(adjr2_values)
best_model_vars_adjr2 <- names(which(summary_a$which[best_model_index_adjr2,]))
eq_adjr2 <- paste(best_model_vars_adjr2, collapse = "+")
print(paste("modeladjr2 equation: ",eq_adjr2))
[1] "modeladjr2 equation:  (Intercept)+Key+Steps+Peak+Mode+Life.Score"

C.10.7.1 Model 3: min(BIC) for only females

model_f_bic <- lm(GPA~Steps,data=df_f)
summary(model_f_bic)

Call:
lm(formula = GPA ~ Steps, data = df_f)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.9148 -0.3437  0.1579  0.4991  1.4046 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 2.093e+00  1.389e-01  15.075   <2e-16 ***
Steps       1.174e-04  1.325e-05   8.862   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6958 on 342 degrees of freedom
Multiple R-squared:  0.1868,    Adjusted R-squared:  0.1844 
F-statistic: 78.54 on 1 and 342 DF,  p-value: < 2.2e-16

C.10.7.2 Model 4: max(adjr2) for only females


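model_f_adjr2 <- lm(GPA~Steps+Peak+Mode+Life.Score,data=df_f)  # object name assumed
summary(model_f_adjr2)

Call: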
lm(formula = GPA ~ Steps + Peak + Mode + Life.Score, data = df_f)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.9738 -0.3251  0.1666  0.4980  1.4064 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.194e+00  2.149e-01  10.210  < 2e-16 ***
Steps        1.093e-04  1.342e-05   8.143 7.47e-15 ***
Peak         2.413e-02  1.308e-02   1.845   0.0659 .  
Mode         1.680e-01  9.798e-02   1.715   0.0872 .  
Life.Score  -2.745e-03  2.010e-03  -1.366   0.1730    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6893 on 339 degrees of freedom
Multiple R-squared:  0.2088,    Adjusted R-squared:  0.1994 
F-statistic: 22.36 on 4 and 339 DF,  p-value: < 2.2e-16

C.10.8 Best Model Rationale for Females

C.10.8.1 Model 3: Minimizing BIC

Formula: GPA ~ Steps

  • Residual standard error: 0.6958 on 342 degrees of freedom
  • Multiple R-squared: 0.1868
  • Adjusted R-squared: 0.1844
  • F-statistic: 78.54 on 1 and 342 DF
  • p-value of F-statistic: < 2.2e-16
  • Significant predictors:
    • Steps (p-value: <2e-16 ***)

C.10.8.2 Model 4: Maximizing Adjusted R-squared

Formula: GPA ~ Steps + Peak + Mode + Life.Score (the adjusted R-squared search also selected Key, which is omitted from this fitted model)

  • Residual standard error: 0.6893 on 339 degrees of freedom
  • Multiple R-squared: 0.2088
  • Adjusted R-squared: 0.1994
  • F-statistic: 22.36 on 4 and 339 DF
  • p-value of F-statistic: < 2.2e-16
  • Predictors:
    • Steps (p-value: 7.47e-15 ***)
    • Peak (p-value: 0.0659 ., marginal)
    • Mode (p-value: 0.0872 ., marginal)
    • Life.Score (p-value: 0.1730, not significant)

C.10.9 Model Comparison and Selection for Females

  1. Adjusted R-squared: Model 4 has a higher adjusted R-squared (0.1994) compared to Model 3 (0.1844), indicating that Model 4 explains more variability in GPA for females.

  2. p-value of F-statistic: Both models have highly significant p-values, suggesting that both models are statistically significant overall.

  3. Residual Standard Error: Model 4 has a slightly lower residual standard error (0.6893) compared to Model 3 (0.6958), indicating that Model 4’s predictions are marginally closer to the actual values.

  4. Significant Predictors:

    • In Model 3, Steps is a highly significant predictor.
    • In Model 4, Steps remains highly significant, while Peak and Mode show marginal significance, and Life.Score is not significant.

C.10.10 Conclusion

While Model 4 has a higher adjusted R-squared and a slightly lower residual standard error, it includes predictors (Peak, Mode, and Life.Score) that are not strongly significant. Given the trade-off between model complexity and statistical significance of predictors, Model 3 is preferred for its simplicity and the high significance of the Steps predictor.

Chosen Model for Females: GPA ~ Steps

Reason: Model 3 is chosen because it is simpler and includes only the highly significant predictor, Steps. This simplicity and significance make it a more interpretable and reliable model for predicting GPA in females.
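Because Model 3 is nested in Model 4, the three added predictors can also be tested jointly with a partial F-test. The sketch below reconstructs that test from the printed residual standard errors and degrees of freedom, so the result is approximate. The function name `partial_f` is hypothetical; with the fitted objects available, `anova()` on the two models would give the exact test.

```r
# Partial F-test for nested linear models, from summary() figures only.
partial_f <- function(rse_small, df_small, rse_large, df_large) {
  rss_small <- rse_small^2 * df_small
  rss_large <- rse_large^2 * df_large
  q <- df_small - df_large   # number of added predictors
  f <- ((rss_small - rss_large) / q) / (rss_large / df_large)
  p <- pf(f, q, df_large, lower.tail = FALSE)
  c(F = f, df1 = q, df2 = df_large, p_value = p)
}

# Model 3 (Steps only) vs. Model 4 (adds Peak, Mode, Life.Score).
female_test <- partial_f(0.6958, 342, 0.6893, 339)
print(round(female_test, 3))
```

Under these rounded inputs, the added predictors are jointly significant at the 5% level but not at the 1% level, even though none is individually significant at 5%. The BIC search above still selects the Steps-only model, so the choice of Model 3 rests on parsimony rather than on the extra terms having no signal at all.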

C.11 DrV’s Recommendations

C.11.1 Explanation and Recommendations:

  1. Males:
    • Best Model: GPA ~ Steps + FatBurn + Mode + Minutes
    • Interpretation: In this model, GPA is associated with the number of steps taken, the time spent in the fat burn zone, the mode of activity (running or walking), and the minutes needed to complete the field test. The Minutes coefficient is negative, so faster field-test times go with higher GPAs.
    • Recommendation: Males may benefit from higher intensity activities. Suggested activities include:
      • Running or brisk walking: To increase both the number of steps and the time spent in the fat burn zone.
      • Structured exercise routines: To build the fitness that shortens field-test times.
  2. Females:
    • Best Model: GPA ~ Steps
    • Interpretation: In this model, GPA is associated only with the number of steps taken.
    • Recommendation: Females can achieve positive results with low-intensity activities. Suggested activities include:
      • Daily walking: A simple and effective way to increase the number of steps and enhance GPA.

C.11.2 Summary:

  • For Males: A more intensive exercise routine, including activities like running or other cardiovascular exercises, is recommended. These activities raise step counts and time in the fat burn zone and tend to improve field-test times, all of which are associated with higher GPA in the male model.
  • For Females: Low-intensity activities like daily walking are associated with higher GPA. This approach relies on the simplicity of walking as a way to increase step counts.

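These recommendations may reflect physiological differences between males and females: males may benefit more from higher intensity activities, while females can achieve positive outcomes with moderate, consistent physical activity. Because the models describe associations in observational data, the recommendations are best read as hypotheses rather than established effects of exercise on GPA.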


  1. Retrieved June 19, 2024.

  2. In later papers from our group, we follow the standard coding where 0 represents females and 1 represents males.

  3. In later papers from our group, we follow the standard coding where 0 represents females and 1 represents males.