Introduction

YouTube is a well-established social media site with a section dedicated to education. It hosts a vast array of medical videos, ranging from patient-made information and lived-experience accounts to institutional learning materials. Within surgery, its use by medical students and doctors for learning is well documented.1,2

The use of YouTube for surgical education has attracted heavy criticism, with concerns raised over the quality and accuracy of the content being accessed by learners. In response to these concerns, assessment tools have been developed and re-purposed, alongside guidelines that attempt to quantify and control the quality of educational videos.3 One example is the LAP-VEGaS guidelines, developed in 2018 by a joint committee of international multispecialty trainers and trainees, which aim to improve video quality by providing a standard for creating and reviewing videos.4

As part of an MSc in clinical education, the author reviewed the use of YouTube in surgical education. One of the primary objectives was to explore what is known about the quality of educational content available on YouTube, and to establish which indicators can be used to identify high quality videos, to better inform learners and trainers.

Methods

A scoping review was carried out using the framework set out by Arksey and O’Malley,5 including the enhancements to this framework recommended by Levac, Colquhoun, and O’Brien.6 The overall aim of the study was to identify “what is known in the literature regarding the use and effectiveness of YouTube for training in postgraduate surgical education?”

The primary objectives of the study were:

  1. To identify how YouTube is being used for surgical education

  2. To explore what is known of the quality of educational content on YouTube

  3. To explore what is known of the effectiveness of using YouTube in surgical education

Four databases (Ovid Medline, Embase, Scopus and Web of Science) were searched using strict search terms. These search terms were established as follows: ‘social media or YouTube’, ‘video based’, ‘video based learning’, ‘video based education’, ‘surgery or general surgery’, ‘surgical training’, ‘surgical skills’, ‘surgical procedures’, ‘postgraduate surgical training’, ‘postgraduate surgical education’, ‘graduate surgical education’, ‘graduate surgical training’, ‘education, medical, continuing or education, medical, graduate or postgraduate medical education or “Internship or Residency”’, and ‘educational measurement’. The search terms were initially piloted in Ovid Medline and then tailored across the other three databases.
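
To illustrate how such terms typically combine, the core Boolean structure of the strategy can be sketched as follows, with synonyms joined by OR within each concept and the concepts joined by AND. This is an illustrative reconstruction using only the terms listed above; the exact Ovid Medline syntax, field tags, grouping and truncation used in the review are not reproduced here.

    (social media OR YouTube)
    AND (video based OR video based learning OR video based education)
    AND (surgery OR general surgery OR surgical training OR surgical skills
         OR surgical procedures OR postgraduate surgical training
         OR postgraduate surgical education OR graduate surgical education
         OR graduate surgical training)
    AND (education, medical, continuing OR education, medical, graduate
         OR postgraduate medical education OR "Internship or Residency")
    AND (educational measurement)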

Studies were selected using inclusion and exclusion criteria. The finalised inclusion criteria were: use of YouTube; surgical training (including skills, procedures, and general teaching); postgraduate level; and English language. The finalised exclusion criteria were: patient information; use of other social media platforms; dental or veterinary surgery; recruitment; replies or comments to articles; editorials; review articles; and conference abstracts.

Data were extracted using a data charting form. Basic numerical analysis and content analysis were performed, with content assessed using a descriptive analytical method.

Results

A total of 1548 articles were identified. Following duplicate removal and application of the inclusion and exclusion criteria, 32 relevant articles were included. Twenty studies discussed the quality of the content on YouTube,7–26 and these studies are discussed here. Overall, 1434 videos were assessed. Nine studies provided an overall breakdown of the educational quality of videos,8,9,12,14,19,20,22,25,26 with one study providing two separate scores for overall quality.8 Other studies discussed outcomes related to generic and specific operative criteria, video source, metrics, and analysis.

Table 1. Summary of relevant articles and key findings.

Derakhshan (2019), 13 videos.
Methods: Cross-sectional study. 5 search terms; 500 videos considered. Rhytidectomy videos graded using author criteria. 3 assessors, all plastic surgeons.
Outcome measures: Intra-/pre-/post-operative quality criteria. Video quality criteria. Video characteristics.
Important results: Did not comment on overall quality. The majority of videos achieved high scores for audio-visual features.

Luu (2021), 37 videos.
Methods: 10 search terms; overall number of videos identified and time period not reported. Neck dissection videos graded against LAP-VEGaS and modified LAP-VEGaS criteria. 4 assessors, a mix of residents and attendings.
Outcome measures: LAP-VEGaS and modified LAP-VEGaS scores. Differences between raters’ training levels. Video characteristics.
Important results: LAP-VEGaS: 24/37 medium quality (65%), 10/37 low quality (27%), 3/37 high quality (8%). Modified LAP-VEGaS: 25/37 medium quality (67%), 4/37 low quality (10%), 8/37 high quality (22%). Narrated audio and subtitle annotations had a moderate positive correlation with quality; like/dislike ratio had a mild positive correlation. Video age and view count had no correlation with quality.

Chorath K (2021), 65 videos.
Methods: 11 searches performed. Thyroidectomy and parathyroidectomy videos graded using LAP-VEGaS criteria. 4 assessors (2 otolaryngologists and 2 residents).
Outcome measures: Educational quality of videos by LAP-VEGaS score. Video source and characteristics.
Important results: 43% of videos medium quality, 39% low quality, 18% high quality. Videos by otolaryngologists scored higher; no difference for industry videos. Narrated audio and subtitles had a positive correlation with scores.

Grayson S. (2017), 33 videos.
Methods: 3 search terms; top 500 videos from each screened for relevance. YouTube accessed multiple times over 1 month. Unknown number of assessors.
Outcome measures: Specialty. Ultrasound- or landmark-guided access. Video characteristics.
Important results: Overall video quality not assessed. All videos from vascular specialists demonstrated ultrasound-guided access, while interventional cardiology and radiology videos predominantly demonstrated landmark-guided access (100% vs 7%, p<0.05). The sheer volume of material can be overwhelming: only 33/500 (6.6%) videos were relevant.

Ferhatoglu M. (2019), 100 videos.
Methods: Sleeve gastrectomy keyword search, on 1 day. Top 10 SGSS YouTube videos compared with the top 10 WebSurg videos by popularity.
Outcome measures: VPI, DISCERN score, GQS, JAMAS, SGSS. Video metrics. Comparison of WebSurg and YouTube.
Important results: Videos created by medical professionals had higher DISCERN, GQS, JAMAS and SGSS scores, with surgical technique videos scoring higher than other videos. VPI scores were lower in surgical technique videos than in other videos, and higher in non-medical sourced videos than in medical videos.

Yammine K. (2020), 13 videos.
Methods: 1 search term, over 1 day; top 100 videos considered. 13 videos assessed using an educational assessment tool (EAT) designed by the authors. 2 assessing surgeons.
Outcome measures: EAT criteria. Video duration, views, likes and dislikes.
Important results: 4/13 videos (30.7%) had excellent educational and technical quality, 7/13 were fair and 2/13 were good. 2/13 videos were from academic institutions; no comment on the correlation of this with scores. No correlation between scores and video duration, views, likes or dislikes.

Fischer J. (2013), 13 videos.
Methods: 5 search terms, over 1 day. Technical procedure assessed against Swiss Society of Rheumatology (SSR) guidelines; educational value also assessed.
Outcome measures: Educational value. SSR guidelines for the technical procedure. Video metrics.
Important results: 8/13 videos were classified as educationally useful and 5 as unhelpful. All videos complied with the SSR guidelines.

Garip R. (2021), 38 videos.
Methods: 1 search term (ptosis surgery), over 6 days; 442 videos identified. Author scoring system; all videos reviewed by 2 ophthalmologists.
Outcome measures: Surgical score. DISCERN, JAMAS, GQS.
Important results: Mean surgical score 7.5 ± 2.7, indicating moderate to good quality: 7 videos were poor quality, 11 moderate, 16 good and 4 excellent. Mean JAMAS 1.3 ± 0.5, indicating poor quality; only 2 videos met the three criteria established by JAMA. Mean GQS 3.1 ± 1.1, indicating moderate quality. Positive correlation between video duration and surgical score, DISCERN and GQS. No relationship between scores and daily views, likes or dislikes. Mean DISCERN and GQS scores were higher in videos with narration.

Aykut A. (2019), 96 videos.
Methods: 4 search terms, over 1 month. The same method was applied to Eyetube videos for comparison. Compared by 3 authors of varying skill.
Outcome measures: Pupil size. Mechanical dilatation. Ocular comorbidities. Surgical complications.
Important results: 95 videos ended successfully, with 4 having complications (3.1%). Fair agreement between evaluators for pupil size but poor agreement on safety. YouTube videos had low complication rates compared with the literature. Only 9 videos were identified on Eyetube.

Karabay E. (2021), 100 videos.
Methods: 2 search terms, over 20 days. Videos assessed against the steps of hypospadias surgery set out in Campbell-Walsh Urology. 2 assessors, both urologists.
Outcome measures: 8 steps of hypospadias surgery, based on Campbell-Walsh Urology.
Important results: 15 videos were highly compatible with the checklist (Group 1), 42 moderately compatible (Group 2) and 43 less compatible (Group 3). There was a weak correlation between total scores and like ratios and video duration.

Bitner B. (2022), 138 videos.
Methods: 23 search terms; 4346 videos identified. 3 independent assessors (medical student, resident and fellow).
Outcome measures: JAMAS, modified GQS, modified DISCERN. Video metrics.
Important results: Academic videos were more likely to have a higher like ratio; videos from independent users had more comments. DISCERN and GQS scores were significantly higher for academic-affiliated videos. Academic-affiliated users’ reliability and quality improved over time, with significant change in DISCERN and GQS. Video metrics were unable to predict VPI, JAMAS, DISCERN or GQS scores.

Yee A. (2020), 117 videos.
Methods: Video library available on a nerve surgery website and YouTube. Viewing data extracted from YouTube and PASSIO; Google Analytics used to assess website traffic on PASSIO. PASSIO members survey.
Outcome measures: Audience retention for long/short video formats. Daily views. Preferred video duration. Audience engagement.
Important results: 3.2 million views on YouTube (561,292 views after excluding 4 viral videos); 16,761 views on PASSIO. PASSIO videos had a mean engagement of 48.4%, versus 34.3% on YouTube. Engagement was 8.4% higher with shorter videos on PASSIO and 13% higher on YouTube. A logarithmic decline in engagement was seen with increasing video duration on both platforms.

Frongia G. (2016), 71 videos.
Methods: 5 search terms, over 1 day; 673 videos identified. Surgical quality assessed using OCRS; educational quality assessed using EQRS. 4 independent assessors.
Outcome measures: Quality of LP videos. Video metrics. EQRS.
Important results: 39.4% evaluated as good, 32.4% as moderate and 28.2% as poor. Average video duration can predict a good quality video, with a cut-off of >7.42 min (sensitivity 67.9%, specificity 60.5%). A higher views/days-online ratio correlated with higher quality; a cut-off of >1.015 views/day online predicted good quality (sensitivity 64.3%, specificity 55.8%).

Addar A. (2017), 16 videos.
Methods: 8 search terms, over 1 day; 68,366 videos identified. Assessment criteria modified from a previous study. 2 independent assessors (orthopaedic residents).
Outcome measures: Educational content assessment. Video metrics.
Important results: 6/16 videos received a score of 4 or 5 out of 5, indicative of an adequate or excellent educational video; 8/16 scored 1 or 2. Low participation by orthopaedic surgeons. All videos not made by a healthcare professional were excluded; the authors comment on excellent quality videos that were excluded.

Arslan B. (2020), 226 videos.
Methods: 3 search terms; 1688 videos identified. Videos scored using the Prostatectomy Assessment and Competency Evaluation (PACE) score. 3 independent assessors.
Outcome measures: PACE scores. Video metrics.
Important results: No difference between total PACE scores for LRP and RARP. Weakly positive correlation between video length and PACE score, including for RARP. No correlation was found between PACE scores and video source or metrics. No objective parameter predicted educational quality.

De’Angelis N. (2019), 25 videos.
Methods: 2 search terms, on 1 day; the 25 most viewed laparoscopic appendicectomy videos. Assessed independently by 3 surgical trainees and 3 senior surgeons.
Outcome measures: GOALS (Global Operative Assessment of Laparoscopic Skills). CVS (Critical View of Safety). LAP-VEGaS guidelines.
Important results: 13/25 were considered moderate/good quality and 12/25 poor quality. 60% of videos had a satisfactory CVS score. There was 100% agreement on video quality in 4 videos; agreement among senior surgeons was higher (17/25 vs 8/25). Conformity to the LAP-VEGaS guidelines was poor (median 8.1%). Number of likes, presence of commentary, utility score and LAP-VEGaS conformity were associated with a moderate to good rating.

Erdem H. (2018), 175 videos.
Methods: 3 search terms, over 1 day. Scored using a usefulness score modified for bariatric surgery. Patient experience videos were also considered.
Outcome measures: Usefulness score. Video characteristics and sources.
Important results: 53.7% were considered useful, 24.6% very useful and 21.7% not useful. The vast majority of very useful (95.3%) and useful (86.2%) videos were uploaded by a doctor, hospital or medical website. Months since upload was significantly higher in the very useful group.

Shires C. (2019), 100 videos.
Methods: Single search term; 7260 videos identified, top 100 by relevance included. Relevant publications of the primary surgeon were cross-referenced.
Outcome measures: Video type, number of views, type of surgery, publication history of the primary surgeon.
Important results: 58/62 had an identifiable primary surgeon; 42/62 had at least 1 thyroid-related publication. 20 had an h-index >3, with an overall average of 10. Academic affiliation was identified for 32. Endoscopic thyroidectomy approaches were statistically more likely to have a surgeon with an academic affiliation and a thyroid-related publication.

Besmens I. (2021), 36 videos.
Methods: 1 search term, over 1 day; 300 videos identified. Video scoring system designed by the authors using the blepharoplasty literature. Reviewed by 2 assessors.
Outcome measures: Video scoring system. Video metrics, including comments and narration.
Important results: 78% were of low to moderate quality. Univariable analysis suggested a correlation between likes, views, comments and attributed education score; however, in multivariable logistic regression models no significant correlation was found.

Lee JS. (2015), 73 videos.
Methods: 1 keyword search, on 1 day; first 100 videos considered. Scoring system created by the authors from guidelines. 3 independent assessors.
Outcome measures: Arbitrary video score. Video metrics.
Important results: 11/73 (15.1%) were considered good, 40 (54.8%) moderate and 22 (30.1%) poor. Most good videos were uploaded by tertiary centres (45.5%), while most poor videos were uploaded by secondary centres (54.5%). The mean score for the tertiary group was significantly higher than for the secondary centre group (6.0 ± 2.0 vs 3.9 ± 1.4).

Of the studies providing a breakdown of overall educational quality, the predominant rating was medium to high quality (n=6), with Luu et al.8 finding this to be the most common rating with both the LAP-VEGaS and modified LAP-VEGaS criteria (range 43-71.8%). Two studies found the highest proportion of videos to be of low quality (range 50-53.8%).12,20 Only five studies provided an excellent quality criterion, and in these the proportion of excellent videos ranged from 8% to 37.5%.8,9,12,14,20

Figure 1. Chart showing the educational quality rates by percentage of high (excellent) quality, medium (moderate/good) quality, and low (poor/fair) quality.

Ten studies discussed specific video features and their relationship to video quality.7,9,12–14,20–23,25 Four of these studies found that the presence of narration or commentary had a positive correlation with overall video quality.8,9,14,22

Twelve studies commented on the relevance of the source of the videos to their quality.8–11,14,16,17,20,21,23,24,26 Two of these studies found that videos from medical professionals had higher quality scores than videos from non-medical sources (p<0.01, p<0.001).9,11 In addition, videos from a specialist source had higher quality scores than those from a non-specialist medical source (p<0.001).9 However, the same authors found that videos from medical sources had a lower video power index (VPI) than those from non-medical sources. Another study found that videos from an academic-affiliated source scored significantly higher for quality (p<0.001), and that the reliability and quality of these videos improved over time (p=0.03 for DISCERN; p=0.004 for GQS).17 Academic video sources were more likely to use more complex surgical techniques, with one study showing that endoscopic thyroidectomy videos were more likely to come from an academically linked surgeon (p<0.01).24 Lee, Seo, and Hong26 found that most good quality videos of laparoscopic cholecystectomy were uploaded from a tertiary centre (45.5%). However, two further studies found no link between quality and academic sources.16,21
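
For context, the VPI referred to above is a popularity index rather than a quality score. It is not defined in the text summarised here, but a common formulation in the YouTube quality literature, given as background rather than drawn from the included papers, is:

    like ratio = likes / (likes + dislikes) × 100
    view ratio = views / days since upload
    VPI = (like ratio × view ratio) / 100

Read this way, a high VPI reflects how widely a video is watched and liked, which is consistent with the finding that non-medical sources can score higher on VPI while scoring lower on quality.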

Thirteen studies commented on the relationship between quality and the video metrics used on YouTube, e.g. duration, video age, number of views, likes, dislikes, like-to-dislike ratio, and comments.8,9,12,14,16,17,19,21–26 Five studies found video quality was linked with likes and dislikes.8,9,14,16,22 Three studies using the LAP-VEGaS criteria found scores positively correlated with the like-to-dislike ratio (R2=0.09, R2=0.08 and rho=0.691).8,9,22 The number of likes was also associated with a moderate to good video rating.22 A further two studies, not using the LAP-VEGaS criteria, found video score positively correlated with the number of likes and the like-to-dislike ratio.16,21 Three studies found a positive correlation between video length and quality scores,14,19,21 with one study providing a cut-off value for predicting a good quality video of >7.42 minutes, with a sensitivity of 67.9% and specificity of 60.5%.19
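
To interpret these cut-off figures, sensitivity and specificity take their standard definitions (stated here for clarity; the notation is not from the included study):

    sensitivity = TP / (TP + FN)
    specificity = TN / (TN + FP)

where TP and FN count the good quality videos correctly and incorrectly classified by the threshold, and TN and FP count the remaining videos. A duration threshold of >7.42 minutes would therefore correctly flag 67.9% of the good quality videos while correctly excluding 60.5% of the lower quality ones.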

The remaining studies found no association between the common video metrics and quality,12,17,24–26 or reported mixed results, with one metric showing a significant association that was not replicated in any further study.23

Discussion

The predominant rating of medium to high quality in most studies clearly shows that there is video content on YouTube of a high enough standard to be employed as an educational resource. Alongside this, there is a proportion of excellent quality videos (range 8-37.5%) that are likely of as high a standard as any resources provided by academic institutions. This is already reflected in the high rates of usage by surgical learners. It must be acknowledged, however, that the highest rating in two studies, and the second highest rating in all the others, was low quality (range 10-53.8%). This highlights the issue raised by many educators: poor-quality videos are being employed for teaching purposes, and the unselected use of videos on YouTube could have a detrimental effect on learning.

Navigating the volume of content on YouTube to identify good quality material is difficult: one study found that only 6.6% of videos identified in a search were relevant,10 and while appraising academic articles is a routine part of medical training and practice, the same is not true of video appraisal. The results of this review highlight some useful points that both learners and trainers can use to identify better quality videos. The use of commentary and narration appears to be one of the best indicators of a good quality video, having been identified independently across multiple studies; it is also a criterion within the LAP-VEGaS guidelines. The source of a video is important, with videos created by specialists in their field potentially providing the best quality, although the affiliations of a source are not always easy to identify. The metrics YouTube provides for each video show very mixed associations with quality. Most appear to have no relationship with the quality of the content; however, the like-to-dislike ratio, identified in several studies, may be a criterion that learners can use to find better quality videos, although it should be noted that other studies found no such association.

The limitations of this review lie in the heterogeneity of the included studies: most used their own criteria to establish the quality of videos rather than a validated score. This raises questions about the reliability of their results and how far those results can be extrapolated. A further issue with applying set criteria from the learner's perspective is that they may not acknowledge the intent of the video creator.

Conclusion

There is good quality educational content for the surgical specialties on YouTube. The issue is how teachers and learners can easily identify these videos and thus maximise the learning potential of YouTube. Further research is needed to establish how YouTube can best be utilised for education in surgery, and those of us within the surgical field need to become more familiar with identifying high quality educational video content.