INTRODUCTION
Surgical training is challenging, with stressors such as time pressure, long hours, and extreme debt loads contributing to the burden.1,2 The development of new surgical techniques increases the demand to learn additional skills, further escalating concerns regarding workload and poor work-life balance.1–5 Laparoscopic training, while ostensibly an extension of existing surgical skills, is far more challenging to learn.5 The technically challenging nature of laparoscopic procedures, together with legal issues and limited operating room time, makes it necessary for surgical trainees to gain experience and exposure outside the surgical suite.3,5–7
Currently, laparoscopic simulators are used for laparoscopic training outside of the operating theatre.6–9 However, they are cumbersome and expensive, and because they are kept in a fixed location, residents often struggle to find time in an already limited schedule, further limiting accessibility.1,2,6,7,10 Accordingly, a more portable and ideally enjoyable solution is needed to allow trainees to attain the core competencies of their fields without further contributing to the already high prevalence of burnout.1,4
Video game (VG) playing improves visuospatial skills, hand-eye coordination, and familiarity with graphical interfaces, all essential skills for laparoscopy.10–12 This technology has been used in other fields, including the US Army, which uses VGs to train special operations forces.10,13 In surgery, prior VG exposure has been shown to correlate more strongly with laparoscopic skills than prior laparoscopy experience.8,10 Concerns with VGs include the potential for increased aggressive behaviours and reduced academic performance.14,15 While VG playing has been shown to decrease academic performance, burnout has been shown to specifically harm residents' performance on in-service examinations, a measure of academic performance.4 This suggests the harmful effects of gaming may be mitigated by their enjoyable nature and a possible reduction in associated burnout.
While several individual studies have evaluated this topic, they suffer from variable methodology and, to our knowledge, have not previously been summarized in a broad-based systematic review of trainee skill acquisition.6–10 Such a study is needed to assess the potential for widespread use of VGs, which are less costly than formal simulators and, owing to their enjoyable and portable nature, may be used more frequently for skill acquisition while potentially reducing the risk of burnout.
The objective of this systematic review was to assess whether using console-based VGs improves the laparoscopic surgical skills of medical students and surgical residents, as assessed by timing and accuracy metrics on laparoscopic simulators.
METHODS
All results and components of the review are reported in accordance with PRISMA reporting guidelines.16
Search Strategy
Ovid EMBASE, Medline, and the Cochrane Central database were searched from inception to October 30, 2022, inclusive. Search terms were reviewed with a health sciences librarian (Appendix A). Conference proceedings from the annual meeting of the Association for Surgical Education from inception (2015) to present were manually reviewed and yielded no additional references. The complete reference lists of all included studies were also manually reviewed for additional studies (snowballing).
Inclusion and Exclusion Criteria
Randomised controlled trials evaluating the performance of medical students and/or residents on laparoscopic simulators following exposure to console-based VGs, compared with either no intervention or standard laparoscopic training (control), were included. Studies involving participants other than medical students or surgical residents were included only if data were presented in a format that allowed surgical trainee data to be extracted independently.
Studies were excluded if they were not published in the English language or used hand-held VGs or virtual reality gaming, as we hypothesized that hand-held VG systems would require a different set of visuo-spatial skills and that virtual reality gaming would not accurately mimic the 2D nature of the laparoscopic and console-based VG screen.5,7,10,12 (Table S1; Appendix B)
Data Collection and Extraction
Two independent reviewers completed title and abstract screening and full-text review using the Rayyan software in accordance with the inclusion and exclusion criteria.17 Reviewer disagreement at either stage was resolved through discussion. Data were extracted using a prepared Excel spreadsheet.18 This form included study characteristics, demographic characteristics, and outcome data.
The primary outcomes of this study were time to completion and accuracy on laparoscopic skills simulator at post-treatment testing comparing the VG-exposed to the control group. Secondary outcome measures were academic performance, measured by differences between the groups in scores of clerkship or in-service examinations and correlation with real-life surgical metrics including improved operating time and reduced complication rates including infection and bleeding where applicable.
Assessment of risk of bias in included studies
Risk of bias (RoB) for each outcome analyzed was assessed by two independent reviewers. Each study was assessed for RoB in individual domains and given an overall RoB assessment according to the revised Cochrane Risk of Bias Tool 2.0.19,20 (Tables S2 and S3; Appendix C)
Data Synthesis & Statistical Analysis
Review Manager 5.3 software was used for meta-analysis data.21 When available, baseline and follow-up testing scores, along with mean difference (MD) were extracted. Outcomes reported as medians with interquartile range or five-number summaries were converted using standardized calculators to mean ± standard deviation (SD) and converted into standardized mean difference (SMD) with standard errors. This allowed comparison of the difference between testing scores of the VG exposed group to the control group at baseline and follow up as well as the difference between the two groups at follow up.22
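The conversion described above can be illustrated with a minimal Python sketch. It assumes a commonly used quartile-based approximation for the mean and SD and the small-sample-corrected form of the SMD (Hedges' adjusted g); whether these match the calculators and software settings actually used in this review is an assumption, and the function names and all numbers in the usage note are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_sd_from_quartiles(q1, median, q3, n):
    """Approximate mean and SD from a median and interquartile range.

    Assumption: a common quartile-based estimator is used, which
    presumes roughly normal data; n is the group sample size.
    """
    mean = (q1 + median + q3) / 3
    # Expected z-score of the upper quartile for a sample of size n.
    z = NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    sd = (q3 - q1) / (2 * z)
    return mean, sd


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (group 1 minus group 2) and its SE."""
    n = n1 + n2
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n - 2))
    # Small-sample correction factor applied to Cohen's d.
    g = (m1 - m2) / pooled_sd * (1 - 3 / (4 * n - 9))
    se = sqrt(n / (n1 * n2) + g ** 2 / (2 * (n - 3.94)))
    return g, se
```

For example, a hypothetical VG group reporting a median time of 20 s (IQR 10 to 30 s, n = 20) would first be passed through `mean_sd_from_quartiles(10, 20, 30, 20)`, and the resulting mean and SD would then be compared with the control group using `hedges_g`. A negative SMD for a timing outcome means the VG group was faster.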
Results were pooled using the Mantel-Haenszel method, with a random effects model for more than two studies and a fixed effects model for two or fewer studies. The performance of medical students and that of surgical residents were analyzed as subgroups given their differential expertise. Heterogeneity was assessed using visual inspection of the forest plot and the I2 statistic, with significance set at p=0.1. Sensitivity analysis was performed to assess the effect of decision making throughout the review process (Appendix D). The GRADE approach was used to evaluate the evidence for all primary outcomes.23 For SMD, any improvement was considered a minimally important difference (MID), given the enjoyable element of VGs and the likelihood that trainees are already using these technologies in their leisure time; this review therefore primarily assesses whether this game-playing time improves their operative skills.
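To make the pooling and heterogeneity statistics concrete, the following Python sketch pools study-level SMDs under a random effects model and reports Cochran's Q and I2. It is an illustration under stated assumptions: it uses generic inverse-variance weighting with the DerSimonian-Laird estimate of between-study variance, a standard approach for continuous effect sizes, and is not claimed to reproduce the software settings used in this review. The function name and input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pool_random_effects(effects, ses):
    """Pool effect sizes with DerSimonian-Laird random effects weights.

    effects: study-level SMDs; ses: their standard errors.
    Returns the pooled estimate, its SE, Z, two-sided p, Q, and I2 (%).
    """
    w = [1 / se ** 2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = sqrt(1 / sum(w_re))
    z = pooled / se_pooled
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    # I2: share of total variation attributable to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": se_pooled, "z": z, "p": p,
            "q": q, "i2": i2, "tau2": tau2}
```

With identical study effects, Q is zero and I2 is 0%; with two precise studies that disagree strongly, I2 approaches 100%, which is the pattern a small set of inconsistent trials would be expected to produce.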
RESULTS
Our initial search identified 607 studies. After removing duplicates and applying inclusion criteria, six studies were included in the final analysis6,7,24–27 (Figure 1). These studies represented 137 participants with follow-up testing ranging from 0 days to 6 weeks (Table 1). Three studies were performed with medical students only, and three on surgical residents only. Three studies used ad-hoc exposure at the resident’s own preference and motivation for VG exposure, while the remainder used a structured delivered training regimen for VG exposure. Four studies employed virtual laparoscopic simulators, while two used wooden box and manual simulators which require an investigator to manually score the participant.
Risk of Bias Assessment
Regarding the time to completion outcome, two studies were found to have some concerns for bias, two studies were assessed to have a high risk of bias, and one study was deemed to have a low risk of bias (Table S2). As for accuracy of performance, four studies were found to have an overall high risk of bias, one study had some concerns for bias, and only one study had a low risk of bias (Table S3). The complete rationale for the risk of bias assessments is found in Appendix C.
Improvement in time to completion on laparoscopic simulator hand-eye coordination task
Three studies found improvements in time to completion after VG exposure. The other three studies found no statistical difference, and one of these showed greater improvement in the control group's performance. Only five studies could be compared in this meta-analysis, as one study did not report the individual values for each group. The forest plot showed that VG exposure improved timing on the hand-eye coordination task on the laparoscopic simulator by a pooled SMD of 1.76 SD (95% CI = -3.20 to -0.32; Z=2.39; p=0.02; Figure 2A). Overall, this favoured exposure to VGs as improving time to completion on the laparoscopic simulator. Subgroup analysis comparing studies of medical students with studies of surgical residents showed that the improvement in time to completion following exposure to VGs was more pronounced and more consistent in surgical residents (residents: 95% CI = -5.16 to -0.13; Z=2.05; p=0.04 v. medical students: 95% CI = -2.44 to 1.76; Z=0.32; p=0.75; Figure 2B).
Sensitivity analysis was performed using only studies for which measures of spread and central tendency could be extracted directly from the paper itself and used to calculate the SMD in RevMan directly. Two studies met this criterion, and the confidence interval overlapped no effect of VGs (95% CI = -0.96 to 1.00; Z=0.05; p=0.96; Figure S1). Additional sensitivity analysis found that in all the studies that used virtual simulators, time to completion as assessed with converted SMD favoured improvement in time after exposure to VGs, whereas results from studies which used manual timing showed more heterogeneity (pooled SMD = -1.90; Z=2.61; p=0.009 v. pooled SMD = -1.38; Z=0.65; p=0.51; Figure S2).
The GRADE approach was used to assess the certainty of the evidence.23 Risk of bias in multiple studies was high or had some concerns, and only one study was assessed to have a low risk of bias. Thus, the evidence was downgraded by two. To assess heterogeneity, visual inspection of the forest plot showed that most studies favoured the intervention, with only one study favouring the control (Figure 2A). Overall, most studies were close to the line of no effect. The I2 value of 97% also suggests considerable heterogeneity (p<0.0001), which may in part be explained by the small sample sizes of included studies and widely variable reporting of effect measures. For this reason, we downgraded the evidence by one for serious concerns related to consistency of effect. Concerns were identified regarding imprecision related to the wide variety in reporting of the outcomes and the optimal information size; thus, the evidence was downgraded by one. We deemed the overall certainty of the evidence to be very low, with crucial limitations substantially lowering confidence given very serious concerns in heterogeneity and imprecision (Table 2).
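As a consistency check, the Z statistic and p value of a pooled estimate can be approximately recovered from its confidence interval. The Python sketch below assumes a normal approximation and a symmetric 95% interval; the function name is hypothetical, and the inputs are the pooled timing values reported above.

```python
from statistics import NormalDist


def z_and_p_from_ci(estimate, ci_low, ci_high, level=0.95):
    """Back-calculate SE, Z, and two-sided p from a symmetric CI."""
    norm = NormalDist()
    # Critical value, about 1.96 for a 95% interval.
    crit = norm.inv_cdf(1 - (1 - level) / 2)
    se = (ci_high - ci_low) / (2 * crit)
    z = estimate / se
    p = 2 * (1 - norm.cdf(abs(z)))
    return se, z, p


# Pooled SMD for time to completion and its 95% CI, as reported above.
se, z, p = z_and_p_from_ci(-1.76, -3.20, -0.32)
print(f"SE={se:.2f}, |Z|={abs(z):.2f}, p={p:.3f}")
```

This gives a Z of about 2.4 and a p value of about 0.02, in line with the reported Z=2.39 and p=0.02; small differences are expected because the published interval is rounded.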
Improvement in time to completion on laparoscopic procedural simulator task
Only one study presented data on time to completion on a procedural simulator (LapChole), which mimics a complete laparoscopic cholecystectomy; it was therefore not pooled in a meta-analysis. This single study found that exposure to VGs improved time to completion on a simulator of a complete operative procedure by 75 seconds (95% CI = -104.41 to -43.93; Z=4.81; p<0.00001; Figure 2C). Given that only one study is included in this analysis, a GRADE assessment of the evidence was not performed. The single study had a low risk of bias in all domains for time to completion as an outcome.
Improvement in accuracy on laparoscopic simulator hand-eye coordination task
Five studies provided information about the effect of VG exposure on accuracy in hand-eye coordination tasks on laparoscopic simulator devices. Three studies found no difference between groups, while two studies found that accuracy improved after exposure to VGs. Two studies in the meta-analysis measured error rate, so the sign of the SMD was inverted to represent the equivalent improvement in accuracy. Exposure to VGs improved accuracy on follow-up testing of hand-eye coordination tasks on laparoscopic simulators, with a pooled SMD of 3.10 favouring VGs (95% CI = 0.42 to 5.79; Z=2.27; p=0.02; Figure 3A). Subgroup analysis (Figure 3B) showed that studies evaluating residents had a consistent and larger improvement in accuracy of 3.67 SD (95% CI = 3.21 to 4.14; Z=15.42; p<0.0001) after VG exposure than studies evaluating medical students, where the increase was only 0.32 SD (95% CI = -0.41 to 1.05; Z=0.86; p=0.39).
Sensitivity analysis (Figure S3; Appendix D) showed greater improvement in accuracy in studies which used virtual simulators for assessment, with a pooled SMD of 3.67 SD (95% CI = 3.21 to 4.14; Z=15.42; p<0.00001) compared with 0.32 SD (95% CI = -0.41 to 1.05; Z=0.86; p=0.39) in studies that used a wooden box simulator. The I2 of 98.3% in the test for subgroup differences suggests that pooling these studies together may be contributing to some of the heterogeneity seen in the primary analysis of accuracy.
The GRADE approach was used to assess the certainty of the evidence.23,28 Risk of bias in four of the five included studies was high, with only one of the included studies having a low risk of bias for this outcome. For this reason, we downgraded by two for very serious concerns for risk of bias (Table 2). On visual inspection of the forest plot (Figure 3A), our results show reasonable consistency favouring improvement in accuracy with VGs, though one study straddles the line between no effect and favouring the control group. The I2 of 97% suggests a high level of heterogeneity (Chi2=62.55; p<0.0001; Figure 3A). Subgroup analysis shows a lower level of heterogeneity for the resident subgroup (I2=79%; p=0.03; Figure 3B), suggesting that pooling the two groups of trainees may contribute to inconsistency. The decrease in heterogeneity with sensitivity analysis comparing the two types of simulators (I2=79%; p=0.03 for virtual simulator studies; Figure S3 in Appendix D) may explain additional heterogeneity. The small sample sizes of the studies likely also contributed. Despite substantial heterogeneity in the evidence, we did not downgrade by more than one point (serious concerns), as we felt much of the heterogeneity could be explained and was likely related to the decision to pool widely.
We downgraded by one point for serious concerns of indirectness. One study used an ad-hoc exposure method while other studies used a structured exposure approach, and ad-hoc approaches may interact with the participant’s own motivation to improve or enjoyment of gaming. All control groups were asked to refrain from gaming, but adherence was not reported. While any VG playing in the control group would increase the robustness of the results, it contributes to indirectness of the evidence. In addition, two of three studies assessed accuracy as the error rate, which was converted to a measure of accuracy.
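The sign inversion used for studies reporting error rate can be sketched as follows. This is a minimal Python illustration with a hypothetical function name and hypothetical values: negating an SMD also swaps and negates the bounds of its confidence interval, while the standard error is unchanged.

```python
def error_rate_to_accuracy(smd, ci):
    """Re-express an SMD in error rate as an SMD in accuracy.

    smd: standardized mean difference in error rate (VG minus control),
         where a negative value means fewer errors with VGs.
    ci:  (lower, upper) confidence bounds of that SMD.
    """
    lower, upper = ci
    # Flipping the sign reverses the order of the interval bounds.
    return -smd, (-upper, -lower)


# Hypothetical study: fewer errors in the VG group.
accuracy_smd, accuracy_ci = error_rate_to_accuracy(-0.8, (-1.5, -0.1))
```

After conversion, positive values consistently indicate better accuracy in the VG group, so error-rate and accuracy studies can be placed on the same forest plot.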
Finally, regarding imprecision, we downgraded by two points for very serious concerns given the small number of studies and small sizes of the samples within these studies which were included in the analysis (Table 2).
Secondary outcomes
None of the included studies presented data on surgical clerkship or subspecialty in-service examination scores or on whether these were affected by VG exposure. One study reported that trainees in the VG group had a mean enjoyment level of 5.9 ± 2.7 out of 10, and that over half would recommend the game to others. A different study reported that residents assigned to ad-hoc gaming practised three times as much as standard training controls.
DISCUSSION
Overall, we found very low-quality evidence of improvement in time to completion on hand-eye coordination tasks. We considered this to cross the threshold of the MID, given that many trainees are likely playing VGs in their leisure time already; recognizing this impact may not alter their behaviour but may provide positive feedback on an activity they already find pleasant.12 Timing on a single task does not predict overall surgical skills, or even surgical skills in one procedure. We therefore sought evidence for timing improvement with simulators that mimic whole procedures. One study provided such data and found that the MD considerably favoured the VG exposure group.6 Timing is not everything in surgery, however: speed without care can be dangerous. In our pooled meta-analysis of accuracy data, we found very low-quality evidence that VGs probably improve accuracy on hand-eye coordination tasks, with a pooled improvement of 3.1 SDs over the control group. For both timing and accuracy, subgroup analysis comparing studies of medical students with those of residents showed a more pronounced effect in surgical residents, possibly reflecting that improving with VGs requires baseline surgical experience. This might also reflect a synergistic effect of consistent practice of surgical skills with VGs.
Despite an extensive literature search, only one other systematic review of this topic was identified, which found a reduction in error rate (similar to the accuracy improvement we saw) and inconsistent results for time improvement on simulators.29 That review included studies up until 2014; as such, our review provides additional data that were not available at that time.29 Several studies support a correlation between prior video gaming and baseline laparoscopic skills, which similarly suggests that VGs may improve laparoscopic performance.10,30,31 Our review is consistent with other literature showing small but potentially important effects of VGs in this domain.7,25–27,32–39
While most included studies excluded trainees with prior laparoscopic experience, not all did, and given the result of our subgroup analysis, future studies should investigate if there is a synergistic effect of ongoing surgical experience with the VG experience. Additionally, while outcomes of assessments on laparoscopic simulators are well-validated approaches for assessing surgical skills, to some extent they are surrogate outcomes for the true test of surgery. Unfortunately, none of the included studies assessed real-world surgical applications or the potential negative effects of increased VGs on academic achievement. Thus, our review likely does not reflect the complete picture of how VGs may affect the busy surgical trainee.
A limitation of our review that may have contributed bias was the requirement for multiple, different versions of summary statistics from the included studies given the widely variable reporting of outcomes in these studies.22,40 An additional limitation is the small sample sizes and variable methodology in these studies. Overall, our analysis supports the need for a larger, randomized trial evaluating surgical skills performance after standardized VG exposure in surgical residents, who seem to have the greatest net benefit from VGs and have the largest requirement to learn these skills.
CONCLUSIONS
While the quality of evidence for both outcomes is very low, our review does show small but potentially important improvements in surgical skills with VGs. Given rising concerns about burnout among surgical trainees, this supports their use of a common leisure activity, as it may both reduce their stress levels and contribute to their operative performance.2,4,6