Introduction
Assessment drives learning (1). How we assess learners is a major force in preparing them for the next stage of training: it drives the studying they do and provides feedback on their learning. This Cell is designed to give you an overview of the major types of assessment, along with their strengths and weaknesses, so you can make better decisions about how to assess your learners. You will also come away remembering and understanding the ABCs of assessment: Assess at an appropriate level, Be wary of weaknesses, and Consider complementarity.
Learning Objectives (what you can reasonably expect to learn in the next 15 minutes):
- List and classify different assessment methods according to Miller’s pyramid of clinical competence.
- Given a testing situation (or goal), select and justify an assessment method.
- Explain the ABCs of assessment and apply them to testing situations.
To what extent are you now able to meet the objectives above? Please record your self-assessment. (0 is Not at all and 5 is Completely)
To get started, please take a few moments to list the assessment methods that you are familiar with. Can you think of a weakness or two? Write them out here so you can refer back to them shortly:
Now proceed to the rest of this CORAL Cell.
An Overview of Common Assessment Methods Based on Miller’s Pyramid

A simple way to classify assessment methods is to use Miller’s pyramid (1), which arranges clinical competence into four levels: “Knows”, “Knows how”, “Shows”, and “Does”.
The bottom two levels of this pyramid represent knowledge. The “Knows” level refers to facts, concepts, and principles that learners can recall and describe; in Bloom’s taxonomy, this corresponds to remembering and understanding. The “Knows how” level refers to learners’ ability to apply these facts, concepts, and principles to analyze and solve problems; in Bloom’s taxonomy, this matches applying, analyzing, evaluating, and creating in the cognitive domain. The “Shows” level requires learners to demonstrate performance in a simulated setting. Finally, the “Does” level refers to what learners do in the real world, typically the clinical setting in medical education. The “Does” level is addressed in more detail in the companion Cell, Methods of Workplace-based Assessment.
Clearly, the higher the level on the pyramid, the more authentic and complex the task is likely to be, and the less standardized. Typically, as you move up the pyramid, it also takes more time per case or question to assess attainment of an objective. This matters because clinical competence varies across cases, a phenomenon called case-specificity (2): a reliable picture of competence requires sampling many cases, so methods that take longer per case allow less adequate sampling. Good multiple-choice questions may be less authentic than real patient interactions, but well-written MCQs let you test a broader range of cases in a given amount of time than the traditional long-case oral examination or even the OSCE. Every testing method has strengths and weaknesses, so it is usually a good idea to use a variety of methods to achieve complementarity (ABCs!). The accompanying table (Table 1) briefly describes and explains a variety of testing methods.
“Knows” and “Knows how”
To test knowledge, you can use written or oral questions. These can be selected-response, where learners must pick one or several answers from a pre-specified list (multiple-choice questions and “pick N” questions, where students must select a specified number of options, fall into this category), or constructed-response, where students must generate their answer from scratch (for example, short-answer questions). Table 1 contains a more complete presentation of various testing approaches. Whether a question tests at the first or second level of the pyramid is determined not by its format but by the task implied by the question. Compare the two questions below: which one tests at a higher level?
Selected-response: single-best-answer multiple-choice question
You are called to examine Mrs. Peters, a 65-year-old woman with a history of diabetes and hypertension. Her current illness started with a sore throat and has grown steadily worse. She has been in bed for 3 days with a temperature of 39°C and a dry cough. On examination, her temperature is 39.5°C, HR 90 bpm, BP 120/80 mmHg, RR 18/min. Her pharynx is red, but the rest of the examination, including chest auscultation, is normal. Which of the following is the most likely pathogen?

- Influenza
- Mycoplasma pneumoniae
- Rhinovirus
- Streptococcus A
- Streptococcus pneumoniae
Constructed-response: short-answer question
What are the three most common symptoms of influenza?
The MCQ tests at a higher level: analysis and application of knowledge are required to select the best response, whereas the short-answer question asks only for recall of knowledge.
To test knowledge application (“Knows how”), regardless of question format, the task must involve some sort of problem or case. Case-based multiple-choice questions are harder to write, so many of us have mostly encountered the easier factual-recall kind, which has created the erroneous perception that MCQs can only test trivia. In fact, MCQs can be used to test clinical reasoning, with the advantage of enabling assessment across a broad range of cases in a short time. Of course, a written examination will never fully capture the complexities of actual clinical work, with challenges such as taking a history from a patient who may be upset or hard of hearing, in a clinical setting characterized by noise and multiple interruptions. In other words, written examinations tend to overestimate competence in clinical reasoning.

“Shows”
The Objective Structured Clinical Examination (OSCE) is a typical method used at this level. The OSCE enables a fairly standardized assessment of clinical competence: by breaking clinical work down into tasks that can be performed in 5-15 minutes, an OSCE can test several cases (typically 10-15). However, this tends to reduce clinical competence to discrete skills and fails to capture the integration of skills required in actual clinical practice. The Simulated Office Oral, in which the examiner both plays the patient and asks probing questions, is another method designed to assess the “Shows” level. See Table 1 for a more complete listing and description of various approaches to testing.
“Does”
Assessment in the clinical setting can focus on a single event (e.g., a single patient interaction) or be based on multiple observations aggregated across time. There are many tools, with as many acronyms, for assessing single events: e.g., the mini-CEX (mini Clinical Evaluation Exercise), the PMEX (Professionalism Mini-Evaluation Exercise), and the O-SCORE (Ottawa Surgical Competence Operating Room Evaluation). Tools that aggregate across multiple observations include end-of-clinic forms (e.g., field notes) and end-of-rotation forms (often referred to as ITERs, In-Training Evaluation Reports, or ITARs, In-Training Assessment Reports). What is observed can also vary: patient encounters, procedures, case presentations, interactions with nurses and other healthcare professionals, notes, letters, and more. Please consider completing the sister Cell in the CORAL Collection specifically on the topic of workplace-based assessment.
Choosing an assessment method

In choosing an assessment method, multiple factors come into play, including feasibility. The goal, however, is to maximize validity within the constraints of your setting. Validity does not reside in the method alone: a test is not, in and of itself, valid or invalid. The same method can be implemented well or poorly, appropriately or not, and this determines whether its scores are being used appropriately (in a valid manner). Many factors influence whether a given tool is used in a valid way. An IQ test might help predict academic success but not marital or relationship success. The same multiple-choice examination may be valid in an invigilated setting but not, for summative decisions about an individual’s clinical knowledge, in a non-invigilated setting where students may collaborate or check answers in textbooks.
The ABCs

In summary, there is no silver bullet for assessment in the health professions. Different methods have their own purposes, strengths, and weaknesses (see again Table 1; a similar table for the “Does” level is available in the Workplace-based Assessment Cell). In deciding which method (and ideally methods) to use, you should follow the ABCs, here again with more detail:
- Assess at the Appropriate level of Miller’s pyramid (and Bloom’s taxonomy).
- Beware of the weaknesses of each method and the factors that will influence validity in your own setting.
- Consider the Complementarity of different methods in the overall program of assessment, to leverage the strengths of different methods and compensate for their individual weaknesses.
Table 1: Methods of assessment, Levels 1-3 of Miller’s pyramid

| Method | Use to assess | Strengths | Limitations |
|---|---|---|---|
| Knows and Knows how | | | |
| 1. Multiple-choice questions (MCQ): stem followed by multiple possible answers | Recall of facts, concepts, and principles; application of knowledge and clinical reasoning when items are case-based | Broad sampling of cases in a short time; standardized, efficient scoring | Less authentic than real patient interactions; good case-based items are difficult to write |
| 2. Essay and short-answer questions: written response to a question or assigned task (short-answer questions involve responses of one or two paragraphs or less) | Recall and application of knowledge, with answers constructed rather than selected | No cueing from answer options; can reveal learners’ reasoning | Fewer questions per testing hour; scoring is slower and less standardized |
| Shows | | | |
| 3. Objective Structured Clinical Examination (OSCE): multiple separate task stations with performance requirements | Demonstration of clinical skills in a simulated setting | Fairly standardized; samples several cases (typically 10-15) | Tends to reduce competence to discrete skills; misses the integration of skills required in actual practice |
| 4. Simulated Office Oral (SOO): examiner simulates a patient problem which the learner must manage | Integrated management of a clinical problem in a simulated encounter | Allows probing of the learner’s reasoning; assesses integration of skills | Time-consuming per case; dependent on examiner consistency |
Adapted from faculty development materials, Faculty of Medicine, McGill University
Check for Understanding
Self-assessment
Please complete the following very short self-assessment on the objectives of this CORAL cell.
To what extent are you NOW able to meet the following objectives? (0 is not at all and 5 is completely)
To what extent WERE you able to meet the following objectives the day before beginning this CORAL Cell? (0 is not at all and 5 is completely)
Thank you for completing this CORAL Cell.
We are interested in improving this and other Cells and would like to use your answers (anonymously, of course), along with the following descriptive questions, as part of our evaluation data.
Thanks again, and come back soon!
The CORAL Cell Team
References:
1. Miller GE. The assessment of clinical skills/competence/performance. Academic Medicine. 1990;65(9 Suppl):S63-7.
2. Norman G, Bordage G, Page G, Keane D. How specific is case specificity? Medical Education. 2006;40(7):618-23.
Further reading:
Epstein RM. Assessment in medical education. New England Journal of Medicine. 2007;356(4):387-96.
Wass V, Van der Vleuten C, Shatzer J, Jones R. Assessment of clinical competence. The Lancet. 2001;357(9260):945-9.
van der Vleuten CPM, Schuwirth LWT, Scheele F, Driessen EW, Hodges B. The assessment of professional competence: building blocks for theory development. Best Practice & Research Clinical Obstetrics & Gynaecology. 2010;24(6):703-19.
Norcini J, Anderson B, Bollela V, Burch V, Costa MJ, Duvivier R, et al. Criteria for good assessment: consensus statement and recommendations from the Ottawa 2010 Conference. Medical Teacher. 2011;33(3):206-14.
Credits:
Author: Valérie Dory, McGill University
Reviewer/consultant:
Series Editor: Marcel D’Eon, University of Saskatchewan