⭐ ⚾ Upside Research: Pitch-Tracking Metrics as a Predictor of Future Shoulder & Elbow Injuries in MLB Baseball Pitchers. A Machine-Learning and Game Theory–Based Analysis
Authors: Jacob F. Oeding, MS, Alexander M. Boos, BA, Josh R. Kalk, MS, Dane Sorenson, MS, F. Martijn Verhooven, PhD, Gilbert Moatshe, MD, PhD, and Christopher L. Camp, MD
Investigation performed at the Mayo Clinic, Rochester, Minnesota, USA
Background: Understanding interactions between multiple risk factors for shoulder and elbow injuries in Major League Baseball (MLB) pitchers is important to identify potential avenues by which risk can be reduced while minimizing impact on player performance.
Purpose: To apply a novel game theory–based approach to develop a machine-learning model predictive of next-season shoulder and elbow injuries in MLB pitchers and use this model to understand interdependencies and interaction effects between the most important risk factors.
Study Design: Case-control study; Level of evidence, 3.
Methods: Pitcher demographics, workload measures, pitch-tracking metrics, and injury data between 2017 and 2022 were used to construct a database of MLB pitcher-years, where each item in the database corresponded to a pitcher’s information and metrics for that year. An extreme gradient boosting machine-learning model was trained to predict next-season shoulder and elbow injuries, and Shapley additive explanation values were used to quantify feature importance as well as interdependencies and interaction effects between predictive variables.
Results: A total of 3808 pitcher-years were included in this analysis; 606 (15.9%) of these involved a shoulder or elbow injury resulting in placement on the MLB injured list.
Of the >65 candidate features (including workload, demographic, and pitch-tracking metrics), the most important contributors to predicting shoulder/elbow injury were increases in pitch velocity (all pitch types), utilization of sliders (SLs), fastball (FB) spin rate, FB horizontal movement, and player age.
The strongest game theory interaction effects were that higher FB velocity did not alter a younger pitcher’s predicted risk of shoulder/elbow injury as substantially as it did for older pitchers; risk of shoulder/elbow injury increased with the number of high-velocity pitches thrown (regardless of pitch type and in an additive fashion); and FB velocity <95 mph (<152.9 kph) demonstrated strong negative interaction effects with higher SL percentage, suggesting that the overall predicted risk of injury for pitchers throwing a high number of SLs could be attenuated by lower FB velocity.
Conclusion: Pitch-tracking metrics were substantially more predictive of future injury than player demographics and workload metrics. There were many significant game theory interdependencies of injury risk. Notably, the increased risk of injury that was conferred by throwing with a high velocity was even further magnified if the pitchers were also older, threw a high percentage of SLs, and/or threw a greater number of pitches.
Shoulder and elbow injuries in Major League Baseball (MLB) continue to be a substantial source of health, performance, and financial burden for both pitchers and teams, with both the number of injured list (IL) designations and the mean number of days spent on the IL increasing each year.4,7 This trend persists despite substantial effort dedicated toward injury prevention, including pitch count guidelines from USA Baseball and MLB.19,27 In addition to these efforts, there has been an abundance of research centered on identification of individual factors associated with shoulder and elbow injury in pitchers.2,5,9,14,20,26
Factors such as increased elbow valgus torque, peak and mean pitch velocity, increased body weight, younger age, and an increased number of breaking pitches thrown have all been identified as potential predictive factors of subsequent elbow injury in pitchers.1,3,5,11
As the number of shoulder and elbow injuries in MLB pitchers has risen, pitcher performance has also improved, with increases in both mean fastball (FB) velocity and the number of pitches reaching 100 mph (161 kph) or faster each year.25 This is in spite of the findings identifying high peak and mean pitch velocities as primary drivers for shoulder and elbow injury3,5,16; thus, there is substantial tension between injury risk reduction and player performance that must be considered when identifying potential strategies to reduce the risk of shoulder and elbow injuries in MLB pitchers.
While prior studies have identified risk factors for shoulder and elbow injuries, these factors have often been studied in isolation using linear models such as logistic regression that preclude the analysis of complex, multifactorial interactions and interdependencies between potential risk factors. This prohibits identification of potential avenues by which risk can be reduced while minimizing impact on player performance, resulting in a lack of realistic, actionable, and evidence-based guidelines for managing a pitcher’s routine in a way that prioritizes injury reduction while optimizing pitching performance. Although many studies have attempted to follow player performance statistics (eg, earned run average, walks plus hits per inning), these variables have turned out to be volatile and not likely a suitable target for injury-prediction efforts.21
More recently, emerging technologies that track the flight of the baseball have emerged as potential variables that can be measured and followed for professional pitchers. In addition to velocity, these newer pitch-tracking metrics include variables such as spin rate, vertical movement, horizontal movement, spin efficiency, release points, and pitch utilization. Because these variables tend to be less volatile and more under the control of the pitcher compared with traditional statistical metrics, they may be better suited for study of injury risk factors.21
Although these new pitch-tracking metrics are intriguing, injury risk may be related to a number of different objective player demographics, measures of workload, and pitch characteristics that are related to one another in complex ways that have not been fully studied using standard statistical methodology.
To better understand interactions between these multiple risk factors and create a tool that can be used to improve our understanding of injury risk, a novel game theory–based approach was taken to develop a machine-learning model capable of predicting shoulder or elbow injuries in MLB pitchers using a multitude of objective measures, including player demographics, measures of workload, and pitch-tracking metrics. We hypothesized that shoulder and elbow injuries could be predicted with high accuracy and that a number of these objective variables would correlate with injury risk.
METHODS
Guidelines
Following expedited review by our institutional review board, this study was deemed exempt from institutional review board approval based on the utilization of publicly available databases. Analyses were performed adherent to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis guidelines and the Guidelines for Developing and Reporting Machine Learning Models in Biomedical Research.6,18
Study Population
This retrospective study utilized ball tracking, player demographic, and player workload data collected for the years 2017 to 2021 and player injury data for the years 2018 to 2022.
Data were obtained from the MLB Stats API (https://statsapi.mlb.com/). All pitchers with a complete set of data (demographic, workload, and ball-tracking) were included in the study, while pitchers missing demographic, workload, and/or ball-tracking data were excluded.
Variables
The primary outcome of interest was placement on the MLB IL for a shoulder or elbow injury the following season.
In addition to this dichotomous outcome, information about the specific joint injured, as well as the diagnosis documented when the player was moved to the IL, was recorded.
Candidate features for the machine-learning algorithm consisted of variables related to pitcher demographics, cumulative workload, and pitch-tracking metrics for each season. Demographic data included age, height, weight, handedness, and country of origin. Workload data consisted of total number of pitches thrown, number of batters faced, total innings pitched, and number of complete games thrown over the course of the season. Pitch-tracking metrics were derived from ball-tracking data and included information related to the percentage of pitch type (FB, curveball [CB], changeup [CH], cutter, sinker, slider [SL], splitter), as well as the velocity, horizontal movement, vertical movement, extension (how far off the mound, in feet, a pitcher released the pitch: the horizontal distance from release point to the front of the pitching rubber), spin rate, and spin axis for every pitch thrown that season, broken down by pitch type. In addition, the percentage of a player’s pitches by pitch type and summative metrics for velocity, extension, spin rate, and spin axis (regardless of pitch type) were derived. Continuous variables were scaled such that all features had values ranging from 0 (corresponding to lowest recorded value) to 1 (corresponding to highest recorded value) before being input into models for feature selection and training. In addition, a label encoder was applied to categorical variables such that each potential categorical value was represented in a binary manner. All variables with missing data were related to pitch-tracking data, and missing values were due to the fact that most pitchers did not throw every pitch type. Missing values were intrinsically imputed during model development, as branch directions for missing values were learned by the model internally during training.
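The 0-to-1 scaling described above can be illustrated with a minimal sketch. The function name and the sample values are hypothetical and are not taken from the study. Missing entries (pitch types a pitcher never threw) are kept as None, on the assumption that a tree-based model handles them later.

```python
from typing import List, Optional


def min_max_scale(values: List[Optional[float]]) -> List[Optional[float]]:
    """Scale observed values to [0, 1]; keep missing entries as None."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    lo, hi = min(observed), max(observed)
    span = hi - lo
    # A constant column carries no information; map it to 0.0 by convention.
    return [
        None if v is None else (0.0 if span == 0 else (v - lo) / span)
        for v in values
    ]


# Hypothetical slider velocities (mph); None marks a pitcher with no sliders.
slider_velocity = [84.0, None, 88.0, 80.0]
scaled = min_max_scale(slider_velocity)
```

Here the lowest recorded value maps to 0, the highest to 1, and the missing entry is passed through unchanged.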
Model Development
Features were input into the modeling workflow to train an extreme gradient boosting (XGBoost) machine-learning model. This ensemble method was selected for its ability to learn complex data structures and nonlinear relationships by creating additive models in incremental stages and optimizing multiple loss functions. This enables modeling of higher-order interactions when working with high-dimensional data sets. In addition, the XGBoost algorithm performs intrinsic feature selection as it learns the model. In other words, XGBoost models will include only predictors that help maximize their accuracy, as they are capable of selecting the best representation of the data to optimize performance while being resistant to noninformative predictors. This was important given the large number of potential predictive features in our data set. Models were trained to fit the appropriate hyperparameters and validated on a holdout test set, with 67% of the data used for the training set and 33% reserved for the test set. Fivefold cross-validation was used for model training and internal validation. To summarize, model evaluation consists of random partitions of the complete training data set into train and validation sets for 5 different folds without replacement of the data, and the evaluation metrics are recorded for each repetition and summarized with a standard distribution of values.22
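The partitioning scheme described above (a 67%/33% holdout split followed by fivefold cross-validation on the training portion) can be sketched with the standard library alone. The function names, the random seed, and the use of simple shuffling rather than stratification are assumptions made for illustration; the source does not specify them.

```python
import random
from typing import List, Tuple


def holdout_split(n_rows: int, test_fraction: float, seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and reserve a fraction of them as a test set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(indices: List[int], k: int) -> List[Tuple[List[int], List[int]]]:
    """Partition indices into k disjoint folds; each fold validates once."""
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return splits


# 3808 pitcher-years, as reported in the Results.
train_idx, test_idx = holdout_split(3808, test_fraction=0.33, seed=0)
splits = k_fold(train_idx, k=5)
```

Each pitcher-year in the training portion appears in exactly one validation fold, which is what "without replacement" implies.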
Given the class imbalance within the data set, with the positive class representing 15.9% of the data set, we performed cost-sensitive machine learning utilizing class weights, such that misclassification of a positive class as a negative was more costly compared with misclassification of a negative class as a positive. Classes were weighted relative to their ratio in the data set. Additionally, we attempted subsampling, a data-augmentation technique to increase the availability of data for learning, although this produced only negligible improvements in model performance. Metrics used to evaluate the predictive performance of the model were calculated, including accuracy as well as the area under the receiver operating characteristics curve (AUC).
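As a rough illustration of the two ideas above, the sketch below computes a ratio-based weight for the positive class and an AUC from scratch. The function names are hypothetical. The weight uses the full-cohort counts reported in the Results purely as an example; in practice it would be computed on the training data. In XGBoost, a weight of this kind is commonly passed as the `scale_pos_weight` parameter, although the source does not state which mechanism was used.

```python
from typing import List


def positive_class_weight(n_total: int, n_positive: int) -> float:
    """Weight for the positive class as the negative-to-positive ratio."""
    n_negative = n_total - n_positive
    return n_negative / n_positive


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 606 injured pitcher-years out of 3808 (from the Results).
weight = positive_class_weight(3808, 606)

# Hypothetical labels and predicted scores for four pitcher-years.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking of injured above noninjured pitcher-years.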
Understanding Predictions Using Shapley Additive Explanation Values
Model interpretability was enhanced globally utilizing variable importance plots and Shapley additive explanation (SHAP) values. The former demonstrates a ranking of the overall importance of input features on the predictive performance of the model, while the latter is a game theory–based approach to explain machine-learning models, where the input features are treated as players in a cooperative game and the model performance is treated as the payoff of the game.23 In original team-based game theory problems, Shapley values were developed to accurately and consistently rank and assign a specific value to each player on a team based on the player’s contribution to the team’s overall outcome, which enabled identification of theoretically optimal solutions to game theory problems.23
Here, SHAP values are used similarly to rank and assign distributions of the overall predicted risk to individual input features. SHAP values can be employed to produce both feature contributions to model predictions globally as well as interpretable plots demonstrating how the input feature values of individual pitchers lead to their predicted label.
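The cooperative-game idea behind SHAP values can be made concrete with an exact Shapley value computation on a toy game. This is a sketch under stated assumptions: the three "players" and the payoff function `toy_risk` are hypothetical, and the study itself relied on SHAP values computed from the trained XGBoost model rather than on this brute-force enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    players: List[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values: weighted mean marginal contribution of each
    player over all coalitions of the remaining players."""
    n = len(players)
    result = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        result[i] = total
    return result


def toy_risk(coalition: FrozenSet[str]) -> float:
    """Hypothetical payoff: two additive terms plus one pairwise bonus."""
    score = 0.0
    if "fb_velocity" in coalition:
        score += 0.3
    if "slider_pct" in coalition:
        score += 0.1
    if "fb_velocity" in coalition and "slider_pct" in coalition:
        score += 0.2  # extra risk only when both are present
    return score


players = ["fb_velocity", "slider_pct", "age"]
phi = shapley_values(players, toy_risk)
```

In this toy game the pairwise bonus is split evenly between the two features that create it, and the attributions sum to the payoff of the full coalition.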
Analyzing Interaction Effects With SHAP Values
An important advantage of machine learning when compared with traditional statistics is the ability to analyze nonlinear relationships and interaction effects that can substantially affect the model’s output and, as a result, its predictive power. For example, while the presence of a specific predictive factor may increase one pitcher’s risk for shoulder or elbow injury, interaction effects of this predictive factor with another predictive factor may result in the same factor’s having no effect, or even a protective effect, on another pitcher’s risk for shoulder or elbow injury.
In game theory, this is akin to team dynamics, whereby certain combinations of individual players result in additional positive or negative influences beyond what the individual players’ skill sets would alone predict and relationships between players can produce substantial changes in the model’s output. Thus, SHAP interaction values based on the Shapley interaction index13 were computed to quantify the most important relationships and interaction effects between features in the model.
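A pairwise interaction value in the spirit of the Shapley interaction index can be sketched on a toy game. The payoff function and feature names are hypothetical. The weighting below splits each pairwise effect evenly between the (i, j) and (j, i) entries, which is the convention used for SHAP interaction values; this brute-force version is for illustration only and is not the tree-based algorithm a SHAP library would run.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def interaction_value(
    players: List[str], i: str, j: str, value: Callable[[FrozenSet[str]], float]
) -> float:
    """Pairwise interaction between i and j, averaged over coalitions of
    the remaining players (half of the total pairwise effect)."""
    n = len(players)
    others = [p for p in players if p not in (i, j)]
    total = 0.0
    for size in range(n - 1):
        weight = factorial(size) * factorial(n - size - 2) / (2 * factorial(n - 1))
        for subset in combinations(others, size):
            s = frozenset(subset)
            # Effect of adding both players minus the effects of adding each alone.
            delta = value(s | {i, j}) - value(s | {i}) - value(s | {j}) + value(s)
            total += weight * delta
    return total


def toy_risk(coalition: FrozenSet[str]) -> float:
    """Hypothetical payoff: two additive terms plus one pairwise bonus."""
    score = 0.0
    if "fb_velocity" in coalition:
        score += 0.3
    if "slider_pct" in coalition:
        score += 0.1
    if "fb_velocity" in coalition and "slider_pct" in coalition:
        score += 0.2
    return score


players = ["fb_velocity", "slider_pct", "age"]
velocity_slider = interaction_value(players, "fb_velocity", "slider_pct", toy_risk)
velocity_age = interaction_value(players, "fb_velocity", "age", toy_risk)
```

A positive value means the two features together add more to the payoff than the sum of their separate effects, and a value of zero means their contributions are purely additive.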
Statistical Analysis
Continuous variables were assessed with 2-sample t tests, while categorical variables were compared using Fisher exact tests. Statistical significance was defined as P < .05. All data analysis was performed with Python 3.8.8 (Python Software Foundation).
RESULTS
Study Population Characteristics
A total of 3808 pitcher-years were included in this analysis, of which 606 (15.9%) sustained a shoulder or elbow injury resulting in placement on the IL. A breakdown by joint and diagnosis is provided in Table 1.
A comparison of the candidate features between pitchers who experienced an injury the following season and pitchers who did not is provided in Appendix Table A1. When compared by group, the injured cohort was slightly younger (30.9 vs 31.7 years; P < .001). There were no other differences based on pitcher demographics, and no significant group differences in variables related to workload (pitches thrown, batters faced, innings pitched, or complete games). With respect to pitch-tracking metrics, the injured cohort threw with higher mean velocity (89.3 vs 88.8 mph [143.6 vs 142.8 kph]; P < .001) and spin rate (2244 vs 2224 rpm; P = .021). Broken down by individual pitch type, injured players had higher mean velocities for FBs (93.6 vs 93.1 mph [150.7 vs 149.8 kph]), CBs (79.2 vs 78.4 mph [127.5 vs 126.2 kph]), CHs (85.9 vs 85.1 mph [138.3 vs 136.9 kph]), CTs (89.2 vs 88.2 mph [143.6 vs 141.9 kph]), SIs (92.7 vs 92.3 mph [149.2 vs 148.6 kph]), and SLs (84.6 vs 84.0 mph [136.1 vs 135.2 kph]) (P ≤ .007 for all).
In addition, the injured cohort demonstrated greater amounts of horizontal movement on FBs (–4.33 vs –3.44 inches [–11 vs –8.7 cm]; P = .015). Injured players threw lower percentages of CTs (4.5% vs 5.8%; P = .016) and SPs (1.1% vs 1.7%; P = .019) than noninjured pitchers. Both FB spin rate (2265 vs 2241 rpm; P = .001) and spin axis (192° vs 190°; P = .047) were higher among injured players.
Preliminary Feature Ranking and Selection
Mean feature importance rankings were determined using 3 separate algorithms: recursive feature elimination with a random forest algorithm, linear model coefficient ranking with ridge regression, and Gini importance (mean decrease in impurity) of a random forest. Selected (estimated best) features are assigned rank 1 while nonimportant features have ranks close to 0. The overall ranking for each feature was then obtained by computing the mean of ranks determined by each algorithm. Of >65 candidate features, the 20 most important contributors to predicting shoulder or elbow injury consisted primarily of metrics related to pitch velocity and pitch use, although FB spin rate and horizontal movement were the second and third most important factors in predicting injury, respectively. Age and total innings pitched over the course of the season were determined to be important predictors as well (Figure 1).
Figure 1. Mean feature importance rankings as determined using 3 separate algorithms: recursive feature elimination with a random forest algorithm, linear model coefficient ranking with ridge regression, and Gini importance (mean decrease in impurity) of a random forest. Selected (estimated best) features are assigned rank 1, while nonimportant features have ranks close to 0. FB, fastball; CB, curveball; CH, changeup; CT, cutter; SI, sinker; SL, slider; SP, splitter; %, frequency; v, velocity; -X, horizontal movement; -Z, vertical movement.
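The rank-averaging step described above can be sketched as follows. The per-method scores are hypothetical placeholders on the 0-to-1 scale mentioned in the text (1 for the best feature, values near 0 for unimportant ones). How each method's raw output was normalized to that scale is not stated in the source, so this sketch starts from already-normalized values.

```python
from typing import Dict, List, Tuple


def mean_rank(rankings: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Average each feature's normalized rank across methods, best first."""
    features = next(iter(rankings.values())).keys()
    means = {
        f: sum(method[f] for method in rankings.values()) / len(rankings)
        for f in features
    }
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical normalized ranks from the three methods named in the text.
rankings = {
    "rfe_random_forest": {"vFB": 1.0, "FB_spin": 0.75, "age": 0.25},
    "ridge_coefficients": {"vFB": 0.5, "FB_spin": 1.0, "age": 0.25},
    "gini_importance": {"vFB": 1.0, "FB_spin": 0.5, "age": 0.25},
}
overall = mean_rank(rankings)
```

Averaging across methods dampens the influence of any single algorithm's bias, such as the sensitivity of linear coefficients to correlated features.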
Algorithm Performance and Feature Importance Using SHAP Values
Based on demographic, workload, and pitching metrics from the prior season, the model predicted which pitchers in the test set would experience a shoulder or elbow injury with an accuracy of 0.84 (95% CI, 0.83-0.85). An AUC of 0.66 (95% CI, 0.60-0.71) was achieved with the XGBoost model with class-weighting.
Discrimination was reduced slightly when limiting the model to only the top 10 features (Figure 2).
Figure 2. Performance of the XGBoost model created with only the top 10 features on internal validation using cross-validation. The shaded area indicates the standard deviation. The area under the receiver operating characteristic curve (AUC) decreased from 0.66 to 0.61.
The extent and direction of influence of a predictive factor on an individual pitcher’s predicted risk for shoulder or elbow injury is not consistent from one pitcher to the next due to the presence of interaction effects and nonlinear relationships, as discussed previously. This is an advantage of machine learning when compared with traditional statistics. However, combining the calculated SHAP values from each pitcher-specific model in the study population allows determination of the overall most important predictors across all pitchers in the study.
The most important factors for the XGBoost model are shown in the SHAP summary plot depicted in Figure 3. Higher mean velocity, higher FB spin rate, and a higher percentage of SLs thrown, as well as more horizontal and vertical movement, were all important positive predictive factors for shoulder or elbow injury, meaning they positively contributed to the model’s prediction that a pitcher would experience a shoulder or elbow injury the following season.
Interestingly, very few negative predictive factors that decrease the likelihood of the model predicting a shoulder or elbow injury in a pitcher were identified, with only slower CHs, CBs, and FBs, lower FB spin rates, and lower percentages of SLs thrown contributing considerably negative SHAP values to pitchers’ overall risks (Figure 3).
Figure 3. Feature importance (Shapley additive explanation [SHAP] plots) for the XGBoost model showing the relative contribution of each feature to model predictions. Each point represents a single pitcher on which the probability of experiencing shoulder or elbow injury was predicted. The y-axis lists variables selected into the model in order of importance from top to bottom. SHAP values are listed on the x-axis and correspond to the change in log-odds attributed to a feature’s value (and hence the change in likelihood of experiencing injury). A higher absolute value indicates greater importance for generating a prediction. Gradient colors correspond to the original value for a particular feature. Positive SHAP values contribute to increased likelihood of experiencing a shoulder or elbow injury, while negative SHAP values correspond to decreased likelihood. FB, fastball; CB, curveball; CH, changeup; CT, cutter; SI, sinker; SL, slider; SP, splitter; %, frequency; v, velocity; -X, horizontal movement; -Z, vertical movement.
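Because the SHAP values in Figure 3 are expressed as changes in log-odds, they add up on the log-odds scale and are converted to a probability only at the end. The sketch below shows that arithmetic. The baseline and the per-feature contributions are hypothetical; the baseline is set to the log-odds of the 15.9% cohort injury rate purely for illustration and is not the fitted model's actual base value.

```python
from math import exp, log
from typing import Dict


def sigmoid(x: float) -> float:
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + exp(-x))


def predicted_probability(base_log_odds: float, shap_values: Dict[str, float]) -> float:
    """Sum the baseline and per-feature log-odds contributions, then convert."""
    return sigmoid(base_log_odds + sum(shap_values.values()))


# Hypothetical baseline: log-odds of the 15.9% injury rate in the cohort.
base = log(0.159 / (1 - 0.159))

# Hypothetical contributions for one pitcher (positive values raise risk).
contributions = {"vFB": 0.40, "SL%": 0.25, "age": -0.10}
p = predicted_probability(base, contributions)
```

With no contributions the function returns the baseline rate, and a net positive sum of SHAP values moves the predicted probability above it.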
Dependency and Interaction Effects
Analysis of SHAP dependence plots revealed strong feature interdependencies among predictive features, with a pitcher’s FB velocity demonstrating the strongest interdependencies with other predictive features. Although pitcher age was not among the most important factors for predicting shoulder or elbow injury, it demonstrated strong interaction effects with a pitcher’s FB velocity (Figure 4).
For example, a higher FB velocity did not alter a younger pitcher’s predicted risk of shoulder or elbow injury as substantially as it did for older pitchers.
While younger pitchers’ risk of shoulder or elbow injury did not vary with FB velocity (suggesting younger pitchers could throw with higher velocities without increasing risk for injury), older pitchers throwing with higher velocities demonstrated increased SHAP values (increased contributions of age to overall predicted risk) compared with older pitchers throwing with lower velocities. It is also interesting to note that in general, age and FB velocity were inversely related, with younger pitchers throwing substantially faster than older pitchers (Figure 5).
Figure 5. Shapley additive explanation (SHAP) dependence plot showing the effect of age on the impact of fastball velocity. Interaction effects between each variable are displayed as vertical dispersion. Each dot represents a single pitcher in the study. vFB, fastball velocity.
Similarly, FB velocity displayed strong interdependencies with the pitcher’s mean velocity for other pitches (Figure 4). For example, the contributions of a pitcher’s CB and CH velocities to the overall predicted risk of shoulder or elbow injury were substantially higher if that pitcher also threw with a higher FB velocity. In other words, if 2 pitchers had the same mean CB velocity but 1 had a high FB velocity and the other had a low FB velocity, the contribution of CB velocity to the model’s output was substantially higher for the pitcher with the high mean FB velocity. This suggests a potential protective effect for pitchers who throw with a high CB velocity but low FB velocity, or vice versa.
Figure 4. Shapley additive explanation (SHAP) dependence plots demonstrating how the importance of multiple predictive features varies depending on a pitcher’s fastball velocity (vFB): (A) age, (B) curveball velocity (vCB), and (C) changeup velocity (vCH). Interaction effects between each variable are displayed as vertical dispersion. Each dot represents a single pitcher in the study.
Further analysis of SHAP interaction values demonstrated strong interaction effects among some of the most important predictors of shoulder or elbow injury (Figure 6).
For example, mean FB velocities exceeding 95 mph (152.9 kph) demonstrated strong, positive interaction effects with a higher percentage of SLs thrown, suggesting that pitchers with both of these predictive factors experienced an even greater predicted risk of shoulder or elbow injury.
However, mean FB velocities <95 mph demonstrated strong, negative interaction effects with higher SL percentages, suggesting that the overall predicted risk of injury for pitchers throwing a high number of SLs could be attenuated by throwing with a lower mean FB velocity. Interestingly, though, for pitchers who did not throw a high percentage of SLs, no interaction effects were observed (Figure 6).
DISCUSSION
In this study, a machine-learning model was able to predict shoulder and elbow injuries in MLB pitchers with excellent accuracy, and the greatest predictors of future injury were increased: pitch velocity (of all pitch types), utilization of SLs, FB spin rate, and FB horizontal movement.
A game theory–based approach was taken to better understand complex interdependencies and interaction effects between identified risk factors, and some of the more impactful relationships included the following: a higher FB velocity did not alter a younger pitcher’s predicted risk of shoulder or elbow injury as substantially as it did for older pitchers; the risk for shoulder or elbow injury increased with the number of high-velocity pitches thrown (regardless of pitch type and in an additive fashion); and mean FB velocities <95 mph demonstrated strong, negative interaction effects with higher SL percentages, suggesting that the overall predicted risk of injury for pitchers throwing a high number of SLs could be attenuated by throwing with a lower mean FB velocity.
To accurately predict next-season shoulder and elbow injuries in pitchers, the model developed in this study found that ball-tracking data were far more impactful on injury risk than player demographics or workload measures. The most important predictors included pitch velocity and movement, frequency of SLs, and FB spin rate. The only demographic variable identified through SHAP value analysis as being among the top 10 features utilized by the XGBoost model was player weight, with a higher body weight generally contributing elevated SHAP values (increasing risk) toward injury prediction. However, this was not true in all cases, as some players with a higher body weight did not experience an increased risk for shoulder or elbow injury due to weight. This suggests that these players may have other protective factors interacting with weight that result in a lack of contribution to risk from this variable, and further investigation of this risk factor is warranted.
In contradiction to our hypothesis, none of the candidate features related to workload (pitches thrown, batters faced, innings pitched, and complete games) were identified as independent injury risk factors by the model. This finding should not be applied to youth pitchers, however, as the present study focused exclusively on MLB pitchers.
Although increased workload has been correlated with injury in youth pitchers,12,28 the same has not been demonstrated for MLB pitchers.8 An analysis of 161 starting MLB pitchers for the years 2010 to 2015 found no association between preceding years of cumulative pitches, starts, innings pitched, or mean pitches per start and being placed on the IL for any musculoskeletal reason.24 These findings may be subject to survivorship bias, where healthy pitchers are more likely to throw more innings because they have avoided the IL. Thus, this type of analysis may be self-selecting for healthy pitchers when traditional statistical methods are employed.
One prior study has applied machine-learning methodologies to predict next-season injuries in MLB players.15 However, the investigation was somewhat limited by inclusion of a vast amount of player performance data (which have since demonstrated high rates of volatility21) and a lack of ball-tracking data, which may not have been available at the time of that investigation.15 While predictive models of future injury were generated for both position players and pitchers separately in the aforementioned study, injury predictors were reported only for position players, as the top performing model for pitchers achieved an accuracy of only 64% with a corresponding AUC of 0.65.15
Furthermore, feature importance rankings revealed the most important factor used by the study’s best performing model for prediction of future position player injury was history of a previous injury, which was nearly 3 times as important as the next most important feature, weighted cutter runs per 100 pitches.
Other important features consisted of wins above replacement, number of pinch hits, and run expectancy wins, all of which had similarly low relative importance as determined by the Gini importance metric.15 This substantial difference in feature importance, with the most important factor demonstrating a relative importance of 1.00 and the majority of other factors demonstrating a relative importance <0.25, suggests that predictions of future injury were based primarily on whether the player was injured in the past, which is consistent with findings from other investigations as well. In the present study, multiple features with relatively high importance were identified, namely features associated with pitch selection, velocity, spin rate, and movement.
One of the most important sets of findings from this analysis was that strong feature interdependencies among some of the most predictive features were identified, which suggest potential avenues by which pitchers and teams can prioritize both player health and safety and pitching performance. FB velocity displayed strong interdependencies with the pitcher’s mean velocity for other pitches.
For example, the contributions of a pitcher’s CB and CH velocities on the overall predicted risk of shoulder or elbow injury were substantially higher if that pitcher also threw with a higher FB velocity, even if his CB and CH velocities were the same as another pitcher with a lower FB velocity. In contrast, a pitcher who threw with a high FB velocity but low CH or CB velocity experienced little to no elevation in predicted risk beyond the nominal value. Many hitters will agree that the gap in velocity between a pitcher’s FB and his secondary pitches can be more challenging than the FB itself.25 Thus, by learning to deliver the FB in more effective patterns, limiting use to strategic time points in the at-bat or game, teams may be able to optimize injury reduction while still prioritizing performance.
Other interaction effects observed included those between FB velocity and the frequency at which the pitcher threw SLs. While previous studies have identified high FB velocities as strong contributors to injury risk,5,16,17 we found that pitchers whose mean FB velocity exceeded 95 mph had even higher risks for injury when they were also throwing a high number of SLs.
It is interesting to note that the majority of pitchers who throw a high percentage of SLs are relievers, who are also more likely to throw with a higher mean peak velocity due to the expectation that they are able to throw with maximum effort more frequently than starters, who may be expected to pitch more innings and throw a greater variety of pitches over the course of a game.
While further biomechanical studies are needed to better understand the relationship between breaking pitches and injury risk, these results demonstrate a potential association between pitch selection and upper extremity injury risk.
While the variable age itself was not among the most important factors for predicting shoulder or elbow injury, it demonstrated strong interaction effects with a pitcher’s FB velocity. In particular, the contribution of age to injury risk prediction was elevated for older pitchers throwing with high velocity compared with older pitchers throwing with low velocity. This same relationship was not found for younger pitchers, whose injury risk contribution from age was not dependent on FB velocity in the same manner. This finding may suggest that older pitchers are attempting to compensate for age-related reductions in velocity with increased and more aggressive training regimens, potentially contributing to an increased risk for injury. Alternatively, the cumulative effect of years of throwing at a higher FB velocity may predispose these older players to an increased risk for subsequent injury when continuing to throw at high velocities.
Finally, it is interesting to note that injured pitchers demonstrated greater horizontal movement on their FBs than noninjured pitchers, a finding that may reflect the relationship between horizontal movement and a pitcher’s arm slot. A recent study evaluating the effects of contralateral trunk tilt on shoulder and elbow injury risk and pitching biomechanics in professional baseball pitchers found that the greatest shoulder and elbow peak forces occurred in the group of pitchers with a trunk tilt of 15° to 25°, which was also the group throwing with a three-quarter arm slot.10
A separate study found that a more vertical arm slot position contributed to weaker FB movement, velocity, horizontal break, and vertical break relationships.29 Thus, the fact that more horizontal movement was associated with greater injury risk may be explained by the relationship between pitching with a lower arm slot and greater peak forces at the shoulder and elbow.
Limitations
This study has limitations that warrant discussion to ensure interpretation of its findings in their proper context. Because this study was conducted retrospectively, it is vulnerable to biases introduced by the nature of the data. Specifically, the IL is primarily intended as a roster management tool rather than a true medical record. Accordingly, many injuries may have occurred that were not substantial enough to warrant IL placement. While the joint involved can be interpreted with a high level of accuracy, the actual diagnosis given to an IL injury often lacks substantial specificity. In addition, the extent of injury and length of time spent on the IL were not incorporated as part of the predictive model.
It is also possible that a number of predictive factors in the model are associated with temporal changes over the years included in the study. Yearly increases in features such as FB velocity result in the possibility of unobserved temporal changes interacting with the variables analyzed in this study. Because the metrics analyzed were season averages, more granular analyses of the period immediately before injury, as well as of acute changes in metrics and pitch characteristics during the course of the season, were not performed. To address this shortcoming, future work will need to apply a sliding-window technique to aggregate pitching data across a more granular time course and analyze acute changes in metrics and their ability to predict shoulder and elbow injuries. Similarly, to evaluate the cumulative effect of multiple years of pitching with certain pitching patterns that may predispose pitchers to an increased risk for injury, evaluation of injuries that occur sooner or even beyond the next season may be warranted.
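The sliding-window aggregation proposed above as future work can be sketched as follows. This is a minimal illustration, not a method used in the study. The window length, the per-outing FB velocities, and the function names `rolling_mean` and `acute_change` are all assumptions introduced for the example.

```python
from collections import deque


def rolling_mean(values, window):
    """Trailing mean over the last `window` observations.

    Returns None for positions where the window has not yet filled.
    """
    buf = deque(maxlen=window)  # the oldest value drops out automatically
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / window if len(buf) == window else None)
    return out


def acute_change(values, window):
    """Most recent window mean minus the mean of all observations so far."""
    if len(values) < window:
        raise ValueError("need at least `window` observations")
    latest = rolling_mean(values, window)[-1]
    return latest - sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical mean FB velocity (mph) for six consecutive outings.
    fb_velocity = [95.0, 95.2, 94.8, 94.1, 93.5, 93.0]
    print(rolling_mean(fb_velocity, window=3))
    print(acute_change(fb_velocity, window=3))
```

A negative value from `acute_change` indicates that the most recent outings fall below the pitcher's own season-to-date mean, which is the kind of short-term signal that season averages cannot capture.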
While ball-tracking data are increasingly available for players and coaches at the high school and college levels, there are still many pitchers for whom ball-tracking data may be unavailable, which may limit the applicability of our findings to these players. Furthermore, this study was conducted using data exclusively on MLB pitchers, so the results should not be applied to pitchers at the youth level. Finally, external validation with data from individual teams, including both minor and major league players, is needed before the current model is deployed for injury prediction leaguewide.
CONCLUSION
Ball-tracking metrics were substantially more predictive of injury than player demographics and workload metrics. There were many significant game theory–based interdependencies among injury risk factors. Notably, the increased risk of injury that was conferred by throwing with a high velocity was even further magnified if the pitchers were also older, threw a high percentage of SLs, and/or threw a greater number of pitches.
ORCID iDs
Jacob F. Oeding https://orcid.org/0000-0002-4562-4373
Christopher L. Camp https://orcid.org/0000-0003-3058-7327
REFERENCES
1. Anz AW, Bushnell BD, Griffin LP, et al. Correlation of torque and elbow injury in professional baseball pitchers. Am J Sports Med. 2010;38(7):1368-1374.
2. Brown AR, Do AM. Predicting multiple injuries to Major League Baseball pitchers: a logistic regression analysis over the 2009-2019 regular seasons. Res Sports Med. 2023;31(6):811-817.
3. Bushnell BD, Anz AW, Noonan TJ, Torry MR, Hawkins RJ. Association of maximum pitch velocity and elbow injury in professional baseball pitchers. Am J Sports Med. 2010;38(4):728-732.
4. Camp CL, Dines JS, van der List JP, et al. Summative report on time out of play for Major and Minor League Baseball: an analysis of 49,955 injuries from 2011 through 2016. Am J Sports Med. 2018;46(7):1727-1732.
5. Chalmers PN, Erickson BJ, Ball B, Romeo AA, Verma NN. Fastball pitch velocity helps predict ulnar collateral ligament reconstruction in major league baseball pitchers. Am J Sports Med. 2016;44(8):2130-2135.
6. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. Br J Surg. 2015;102(3):148-158.
7. Conte S, Camp CL, Dines JS. Injury trends in Major League Baseball over 18 seasons: 1998-2015. Am J Orthop (Belle Mead NJ). 2016;45(3):116-123.
8. Erickson BJ, Chalmers PN, Bush-Joseph CA, Romeo AA. Predicting and preventing injury in Major League Baseball. Am J Orthop (Belle Mead NJ). 2016;45(3):152-156.
9. Erickson BJ, Chalmers PN, Zajac J, et al. Do professional baseball players with a higher valgus carrying angle have an increased risk of shoulder and elbow injuries? Orthop J Sports Med. 2019;7(8):2325967119866734.
10. Escamilla RF, Slowik JS, Fleisig GS. Effects of contralateral trunk tilt on shoulder and elbow injury risk and pitching biomechanics in professional baseball pitchers. Am J Sports Med. 2023;51(4):935-941.
11. Fleisig G, Chu Y, Weber A, Andrews J. Variability in baseball pitching biomechanics among various levels of competition. Sports Biomech. 2009;8(1):10-21.
12. Fleisig GS, Andrews JR, Cutter GR, et al. Risk of serious injury for young baseball pitchers: a 10-year prospective study. Am J Sports Med. 2011;39(2):253-257.
13. Fujimoto K, Kojadinovic I, Marichal J. Axiomatic characterizations of probabilistic and cardinal-probabilistic interaction indices. Games and Economic Behavior. 2006;55(1):72-99.
14. Gutierrez NM, Granville C, Kaplan L, Baraga M, Jose J. Elbow MRI Findings do not correlate with future placement on the disabled list in asymptomatic professional baseball pitchers. Sports Health. 2017;9(3):222-229.
15. Karnuta JM, Luu BC, Haeberle HS, et al. Machine learning outperforms regression analysis to predict next-season Major League Baseball player injuries: epidemiology and validation of 13,982 player-years from performance and injury profile trends, 2000-2017. Orthop J Sports Med. 2020;8(11):2325967120963046.
16. Keller RA, Marshall NE, Guest JM, et al. Major League Baseball pitch velocity and pitch type associated with risk of ulnar collateral ligament injury. J Shoulder Elbow Surg. 2016;25(4):671-675.
17. Labott JR, Leland DP, Till SE, et al. A number of modifiable and nonmodifiable factors increase the risk for elbow medial ulnar collateral ligament injury in baseball players: a systematic review. Arthroscopy. 2023;39(8):1938-1949.
18. Luo W, Phung D, Tran T, et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J Med Internet Res. 2016;18(12):e323.
19. Major League Baseball Advisory Committee. Pitch Smart Guidelines. Accessed April 24, 2023. http://m.mlb.com/pitchsmart/
20. Melugin HP, Smart A, Verhoeven M, Dines JS, Camp CL. The evidence behind weighted ball throwing programs for the baseball player: do they work and are they safe? Curr Rev Musculoskelet Med. 2021;14(1):88-94.
21. Pareek A, Parkes CW, Leontovich AA, et al. Are baseball statistics an appropriate tool for assessing return to play in injured pitchers? Analysis of statistical variability in healthy players. Orthop J Sports Med. 2021;9(11):23259671211050933.
22. Raschka S. Model evaluation, model selection, and algorithm selection in machine learning. arXiv. University of Wisconsin. Revised November 11, 2020. https://arxiv.org/pdf/1811.12808
23. Rozemberczki B, Watson L, Bayer P, et al. The Shapley value in machine learning. ArXiv. Published online February 11, 2022. Revised May 24, 2022. doi:10.48550/arXiv.2202.05594
24. Saltzman BM, Mayo BC, Higgins JD, et al. How many innings can we throw: does workload influence injury risk in Major League Baseball? An analysis of professional starting pitchers between 2010 and 2015. J Shoulder Elbow Surg. 2018;27(8):1386-1392.
25. Sheinin D. Velocity is strangling baseball—and its grip keeps tightening. The Washington Post. Published May 21, 2019. Accessed April 24, 2023. https://www.washingtonpost.com/sports/2019/05/21/velocity-is-strangling-baseball-its-grip-keeps-tightening/
26. Triplet JJ, Labott JR, Leland DP, et al. Factors that increase elbow stress in the throwing athlete: a systematic review of biomechanical and motion analysis studies of baseball pitching and throwing. Curr Rev Musculoskelet Med. 2023;16(4):115-122.
27. USA Baseball Medical Safety Advisory Committee. Youth Baseball Pitching Injuries. Accessed April 24, 2023. https://www.usabaseball.com
28. Yang J, Mann BJ, Guettler JH, et al. Risk-prone pitching activities and injuries in youth baseball: findings from a national sample. Am J Sports Med. 2014;42(6):1456-1463.
29. Yoshida K, Nyland J, Krupp R. History of ulnar collateral ligament injury and college pitcher fastball profiles: a retrospective, observational, live pitching analysis. J Hand Surg Am. 2024;49(6):614.e1-614.e8.