MCAT Textbooks / MCAT Physics and Math Review 2026-2027 / Ch 11 of 12

Reasoning About the Design and Execution of Research


Science Mastery Assessment

Every pre-med knows this feeling: there is so much content I have to know for the MCAT! How do I know what to do first or what’s important?

While the high-yield badges throughout this book will help you identify the most important topics, this Science Mastery Assessment is another tool in your MCAT prep arsenal. This quiz (which can also be taken in your online resources) and the guidance below will help ensure that you are spending the appropriate amount of time on this chapter based on your personal strengths and weaknesses. Don’t worry, though: skipping something now does not mean you’ll never study it. Later on in your prep, as you complete full-length tests, you’ll uncover specific pieces of content that you need to review and can come back to these chapters as appropriate.

How to Use This Assessment

If you answer 0–7 questions correctly:

Spend about 1 hour to read this chapter in full and take limited notes throughout. Follow up by reviewing all quiz questions to ensure that you now understand how to solve each one.

If you answer 8–11 questions correctly:

Spend 20–40 minutes reviewing the quiz questions. Beginning with the questions you missed, read and take notes on the corresponding subchapters. For questions you answered correctly, ensure your thinking matches that of the explanation and you understand why each choice was correct or incorrect.

If you answer 12–15 questions correctly:

Spend less than 20 minutes reviewing all questions from the quiz. If you missed any, then include a quick read-through of the corresponding subchapters, or even just the relevant content within a subchapter, as part of your question review. For questions you got correct, ensure your thinking matches that of the explanation and review the Concept Summary at the end of the chapter.



In This Chapter

11.1 The Scientific Method

The FINER Method

11.2 Basic Science Research

Controls

Causality

Error Sources

11.3 Human Subjects Research

Experimental Approach

Observational Approach

Error Sources

11.4 Ethics

Respect for Persons

Justice

Beneficence

11.5 Research in the Real World

Populations vs. Samples

Generalizability

Support for Interventions

Concept Summary

MCAT EXPERTISE

As with the previous chapter, this chapter does not contain any content that falls directly within an AAMC content category. That said, the AAMC has confirmed that a full 10% of the science questions on every MCAT will touch on material in this chapter – and many of those questions will require only information from this chapter, without any other supportive science content. That makes it, point for point, one of the most important chapters in the entire MCAT Review series!

Introduction

In the modern world, there are standard places to search for the answers to questions we have: encyclopedias, academic journals, online databases, and other publications. By perusing current research on a topic, we can figure out whether someone else has already asked and answered our question. This is just as true for scientists and other professionals. In many cases, our searches may take us in different directions. This may be as simple as finding additional resources that confirm the answers to our questions; however, research is often more complicated than that. We may find conflicting answers and have to perform critical analysis to determine which data set (and conclusion) was obtained in the most legitimate, unbiased way. Other times, we’ll have to find the answers on our own through experiment or observation and data analysis.

In order to generate a consistent body of knowledge, the ways in which scientists generate and search for information must be orderly and uniform. In this chapter, we will discuss the premise of basic science, biomedical, and social sciences research. Our examination will include the necessary criteria for causality, different types of error, and the ethical implications of human subjects research. These research methods are core skills that the MCAT tests—in fact, they constitute one of the four Scientific Inquiry and Reasoning Skills on the exam. In addition, their importance will last for the rest of your medical career. As a physician, you will constantly be seeking answers in research to determine prognoses, assess the appropriateness of a treatment modality for a given patient, and answer patients’ questions. You may be a researcher yourself in a basic science, clinical, or translational setting. Regardless of your path, evaluating research is critical to the progress of all fields of medicine and will be a key component of your life as a physician.

11.1 The Scientific Method

LEARNING OBJECTIVES

After Chapter 11.1, you will be able to:

The basic paradigm for all scientific inquiry is the scientific method. The scientific method is a set of steps that defines the appropriate order of events to structure and carry out an experiment. As such, the scientific method is the established protocol for transitioning from a question to a new body of knowledge. The steps in the scientific method are:

BRIDGE

It is easy to focus on research that agrees with our expectations or opinions, and to ignore research that goes against them. This is an example of confirmation bias at work. Specific types of biases are discussed in Chapter 4 of MCAT Behavioral Sciences Review and later in this chapter.

MCAT EXPERTISE

The MCAT will most often test experimental or logical errors during research. Pay particular attention to the scientific method now in order to recognize each area in which an error may occur.

Formulating a testable question often presents students with a challenge because we tend to overreach, creating all-encompassing broad questions; however, in order to form a good testable question, it must be restricted to a relatively narrow area. The same is true of our hypotheses. One might wonder, “Why do hot objects cause injury?” This is not a testable question. In fact, most questions that begin with “Why” are too broad to be testable through a single experiment. In this case, a better, more testable question could be “How do epithelial cells respond to heat in vivo?” Even this question is likely too broad for a single experiment, but it is testable, and we can form related hypotheses. One possible hypothesis would be: if heat is applied to in vivo epithelial cells, then those cells will lyse. Pay attention to the format of the hypothesis, as the if–then format ensures that it is testable. We will examine other components of the scientific method as they pertain to basic science research, biochemical and biomedical research, and social science research in the relevant sections.

The FINER Method

The FINER method for evaluating a research question is a way to determine whether the answer to one’s question will add to the body of scientific knowledge in a practical way and within a reasonable time period. The FINER method asks five questions, one for each letter, to make this determination: is the proposed study feasible, interesting, novel, ethical, and relevant?

MCAT CONCEPT CHECK 11.1

Before you move on, assess your understanding of the material with these questions.


11.2 Basic Science Research

LEARNING OBJECTIVES

After Chapter 11.2, you will be able to:

accuracy and precision of measurements

Basic science research—the kind conducted in a laboratory, and not on people—is generally the easiest to design because the experimenter has the most control. Often a causal relationship is being examined because the hypothesis generally states a condition and an outcome. In order to make generalizations about our experiments, we must make sure that the outcome of interest would not have occurred without our intervention, and therefore, we use controls. We must also demonstrate causality, which is relatively simple in basic science research, but less so in other research areas.

Controls

In basic science research, conditions can be applied to multiple trials of the same experiment that are as near to identical as possible. In this way, a control or standard acts as a method of verifying results. Consider the following experiment: a scientist has an unknown concentration of a basic ammonia solution and wishes to determine the concentration experimentally. The scientist takes a standardized solution of hydrochloric acid (made by comparison to a potassium hydrogen phthalate [KHP] standard) and titrates the basic solution in the presence of the same calibrated pH meter used for the hydrochloric acid standardization. The scientist then determines the ammonia concentration from the results of the titration. Because the concentration of the acid used to determine the ammonia concentration was verified against a standard, the scientist can be confident that the calculated ammonia concentration is accurate.
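The arithmetic behind this titration can be sketched in a few lines of Python. The volumes and the acid concentration below are hypothetical values chosen for illustration, not data from the text, and the sketch assumes the 1:1 stoichiometry of the reaction between HCl and NH3.

```python
def base_concentration(acid_molarity, acid_volume_ml, base_volume_ml):
    """Molarity of a monoprotic base titrated to equivalence with a
    monoprotic acid (1:1 stoichiometry, as for NH3 + HCl)."""
    if base_volume_ml <= 0:
        raise ValueError("base volume must be positive")
    moles_acid = acid_molarity * acid_volume_ml / 1000.0  # mol HCl delivered
    # At the equivalence point, moles of base equal moles of acid.
    return moles_acid / (base_volume_ml / 1000.0)

# Hypothetical run: 25.0 mL of ammonia solution requires 20.0 mL of 0.100 M HCl.
ammonia_molarity = base_concentration(0.100, 20.0, 25.0)
print(f"{ammonia_molarity:.3f} M")  # prints 0.080 M
```

The result is only as trustworthy as the acid molarity passed in, which is why the acid is standardized before the titration.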

REAL WORLD

The use of controls also allows investigators to check for contamination of reagents.

Controls can also be separate experimental conditions altogether. For example, when testing the reaction of a tissue culture to an antibiotic, a separate culture is generally grown and administered an equal quantity of a compound known to be inert, like water or saline. The control corrects for any impact that the simple addition of volume might have had on the experiment.

Some experiments have both positive and negative controls for points of comparison or a group of controls that can be used to create a curve of known values. Positive controls are those that ensure a change in the dependent variable when it is expected. In the development of a new assay for detection of HIV, for example, administering the test to a group of blood samples known to contain HIV could constitute a positive control. Negative controls, in contrast, ensure no change in the dependent variable when no change is expected. With the same assay, administering the test to a group of samples known not to contain HIV could constitute a negative control. In drug trials, a negative control group is often used to assess for the placebo effect, an observed or reported change when an individual is given a sugar pill or sham intervention.

Example: A clinical trial is devised to quantify the effectiveness of a retinal scanning system for the detection of Alzheimer’s disease. The trial will include a positive control group, a negative control group, and an experimental group. The experimental group is a large sample of individuals who are 70 years old and showing symptoms of Alzheimer’s disease. Assuming the retinal scanning system is effective at detecting Alzheimer’s disease, rank the groups in decreasing order of percentage of expected disease detection.

Solution: Members of the positive control group should consist of patients who have previously been diagnosed with Alzheimer’s disease, whereas members of the negative control group should consist of patients who have previously tested negative for Alzheimer’s disease. Therefore, the positive control group and negative control groups should establish the upper and lower bounds of percent detection, respectively. The experimental group is likely to fall between those two extremes. The predicted order is: positive, experimental, and negative.

Causality

The other big advantage to being able to manipulate all of the relevant experimental conditions is that basic science researchers can often establish causality. Causality is an if–then relationship, and is often the hypothesis being tested. In basic science research, we manipulate an independent variable, and measure or observe a dependent variable. When there is a theoretical or known mechanism that links the independent and dependent variables, a causal relationship can be investigated. If the change in the independent variable always precedes the change in the dependent variable, and the change in the dependent variable does not occur in the absence of the experimental intervention, the relationship is said to be causal.

Example: A test was conducted to determine if water consumption impacts systolic blood pressure in mice. The water consumption and systolic blood pressure of eight mice were tracked for three weeks. The average daily water consumption and the average systolic blood pressure of each mouse are listed in the following table.

Mouse | Average Daily Water Consumption (mL) | Average Systolic Blood Pressure (mmHg)
1 | 3.9 | 122
2 | 3.3 | 128
3 | 1.5 | 146
4 | 4.7 | 117
5 | 6.1 | 108
6 | 0.8 | 153
7 | 10.3 | 103
8 | 8.6 | 105

The researchers plotted pressure and consumption and obtained the following graph. Identify the independent and dependent variables, and determine if a causal relationship exists between the two.

Scatter plot graph with average daily water consumption on the x-axis and average systolic blood pressure on the y-axis

Solution: The independent variable is the “average daily water consumption (mL)” and the dependent variable is the “average systolic blood pressure (mmHg).”

Not enough data is given to determine if a causal relationship exists. More information is needed to determine if the differences in systolic blood pressure occur due to differences in water consumption. It’s possible that the mice could have the given systolic blood pressure values even when consuming equal amounts of water.
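To see how strong the association in the table is, one can compute a Pearson correlation coefficient. The sketch below uses the eight data points from the example; the function name pearson_r is our own label. A strong correlation describes how tightly the two measurements move together, but, as the solution notes, it does not by itself establish that one causes the other.

```python
from math import sqrt

# Data from the table above: average daily water (mL) and systolic BP (mmHg).
water_ml = [3.9, 3.3, 1.5, 4.7, 6.1, 0.8, 10.3, 8.6]
systolic_mmhg = [122, 128, 146, 117, 108, 153, 103, 105]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

r = pearson_r(water_ml, systolic_mmhg)
print(f"r = {r:.2f}")  # about -0.93: a strong negative correlation
```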

KEY CONCEPT

The independent variable is the one that the experimenter is manipulating, and the dependent or outcome variable is the one that is being observed. On a graph the independent variable belongs on the x-axis and the dependent variable belongs on the y-axis.

Error Sources

In basic science research, experimental bias is usually minimal. The most likely way for an experimenter’s personal opinions to be incorporated is through the generation of a faulty hypothesis from incomplete early data and resource collection. However, results can also be manipulated by eliminating trials without appropriate justification, or by failing to publish works that contradict the experimenter’s own hypothesis.

The low levels of bias introduced by the experimenter do not eliminate all error from basic science research. Measurements are especially important in the laboratory sciences, and the instruments may give faulty readings. Instrument error may affect accuracy, precision, or both. Accuracy, also called validity, is the ability of an instrument to measure a true value. For example, an accurate scale should register a 170-pound person’s weight as 170 pounds. Precision, also called reliability, is the ability of the instrument to read consistently, or within a narrow range. The same person standing on a scale that is accurate but imprecise may get readings between 150 and 190 pounds. The same person standing on a scale that is inaccurate but precise may get readings between 129 and 131 pounds, a relatively narrow range. Accuracy and precision are represented in Figure 11.1. Because bias is a systematic error in data, only an inaccurate tool will introduce bias, but an imprecise tool will still introduce error. Random chance can also introduce error into an experiment; while random error is difficult to avoid, it is usually overcome by using a large sample size.

inaccurate and imprecise: large cluster not centered on bulls-eye; accurate, but imprecise: large cluster centered on bulls-eye; inaccurate, but precise: small cluster not centered on bulls-eye; accurate and precise: small cluster centered on bulls-eye

Figure 11.1 Accuracy (Validity) and Precision (Reliability) of Measurements
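The distinction can be made concrete with a small numerical sketch. The individual readings below are hypothetical, chosen only to fall within the ranges given in the scale example; bias is estimated here as the mean reading minus the true value, and precision as the sample standard deviation of the readings.

```python
from statistics import mean, stdev

TRUE_WEIGHT_LB = 170  # the person's true weight in the scale example

# Hypothetical readings chosen to fall within the ranges described in the text.
scales = {
    "accurate but imprecise": [150, 190, 162, 178, 170],
    "inaccurate but precise": [129, 131, 130, 130, 130],
}

def summarize(readings, true_value):
    """Return (bias, spread): mean error and sample standard deviation."""
    return mean(readings) - true_value, stdev(readings)

for label, readings in scales.items():
    bias, spread = summarize(readings, TRUE_WEIGHT_LB)
    print(f"{label}: bias = {bias:+.1f} lb, spread = {spread:.1f} lb")
```

With these numbers, the first scale shows no bias but a wide spread, while the second shows a bias of 40 pounds below the true weight with a very narrow spread. Averaging many readings helps with the first scale but cannot fix the second.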

MCAT CONCEPT CHECK 11.2

Before you move on, assess your understanding of the material with these questions.


11.3 Human Subjects Research

LEARNING OBJECTIVES

After Chapter 11.3, you will be able to:

In some cases of biomedical or clinical research, we must move away from petri dishes full of cells or experimental animal models that can have all aspects of their living conditions controlled to research on human subjects. For ethical reasons, which we will discuss later, the level of experimental control is invariably lower than in basic science research, and the relationships established by such research will therefore be weaker. In human subjects research, there are both experimental and observational studies.

Variable Types

In data analysis, it is important to consider the variable types used in the study. Binary variables have only two options, such as “yes” or “no.” Continuous variables can take on any value within a certain range, such as amount of weight lost, height, or percent improvement in cardiac output. In contrast, categorical variables take on one of a fixed number of values. For example, if a study conducted in the United States separates participants by the state they live in, there are only 50 possibilities rather than an infinite range.

Data can also be categorized as either quantitative data or qualitative data. Quantitative data describes data that is in the form of numbers, including percentages or statistics. In contrast, qualitative data describes data that is non-numeric. A study that looks at the percentage of people who own a house in Canada would be an example of a quantitative study, as it numerically characterizes the percentage of people who own a house. A follow-up study that asks house-owners to explain how they feel about owning a house would be an example of a qualitative study. In a mixed-method study, research generates both numerical and non-numerical data. If the follow-up study also calculated how many house-owners felt negatively about owning a house, it would have both quantitative data (percentage of house-owners with negative thoughts) and non-numerical qualitative data (the data on the feelings of each participant about owning a house).

Experimental Approach

In biomedical research, it is possible to perform experiments in which an independent variable is manipulated and an outcome is observed. In these experiments, we are still attempting to elicit a causal relationship. Because subjects are in less-controlled conditions, the data analysis phase is more complicated than in laboratory studies. In clinical and social sciences research, it is often still possible to conduct experiments by manipulating the environment or circumstances of the subject.

Randomization

Randomization is the method used to control for differences between subject groups in biomedical research. Randomization uses an algorithm to determine the placement of each subject into either a control group that receives no treatment or a sham treatment, or one or more treatment groups. A proper randomization algorithm will be equivalent to a coin toss or die roll. Once each individual is assigned to a group, the intervention is performed and the results are measured. Ideally, each group is perfectly matched on conditions such as age and gender; however, as long as there is an appropriate randomization algorithm, the collected data may be analyzed without concern.
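A minimal sketch of simple randomization follows. The function name, group labels, and subject IDs are hypothetical, and the fixed seed is used only so the demonstration is reproducible; it is not a description of how any particular trial assigns subjects.

```python
import random

def randomize(subject_ids, groups=("control", "treatment"), seed=None):
    """Assign each subject to a group with equal probability,
    the software equivalent of a fair coin toss per subject."""
    rng = random.Random(seed)  # seed only for reproducible demonstrations
    return {subject: rng.choice(groups) for subject in subject_ids}

subjects = [f"S{i:03d}" for i in range(1, 21)]  # 20 hypothetical subject IDs
assignment = randomize(subjects, seed=42)

for group in ("control", "treatment"):
    members = [s for s, g in assignment.items() if g == group]
    print(group, len(members))
```

Because each subject is assigned independently, the groups will not necessarily be the same size or perfectly matched on age or gender; randomization only guarantees that any such differences arise by chance rather than by the investigator's choice.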

Blinding

Because many of the measures in biomedical research are subjective, the perception of the subject and the investigator may be biased by knowing which group the subject is in. To remove this bias, the subjects and/or investigators may be blinded, which means they do not have information about which group the subject is in. In single-blind experiments, only the patient or the assessor (the person who makes measurements on the patient or performs subjective evaluations) is blinded. In double-blind experiments, the investigator, the subject, and the assessor are all unaware of the subject’s group. Without blinding, the placebo effect would be greatly reduced in the control group, but still be present in the treatment group.

REAL WORLD

Blinding isn’t only useful in drug trials; even sham treatments of acupuncture have been used to blind subjects in randomized controlled trials focusing on the use of acupuncture for musculoskeletal pain.

Data Analysis

In biomedical research, data analysis must account for both the variables used in the study and variables outside of those used. Some influencing factors can be inferred from the initial literature review, although other unexpected variables, such as confounding variables (discussed later on in this section), may exist and may need consideration during data analysis. Another important consideration during data analysis is differentiating between a main effect and an interaction effect. A main effect describes how a single independent variable individually affects the dependent variable. An interaction effect, on the other hand, describes how one independent variable affects the relationship between a different independent variable and the dependent variable. For example, assume a study looked at how well students of different grade levels did on exams on different days of the week. If it was shown that within each grade level, students did best on Mondays and worst on Fridays, the main effect of the variable “day of the week” on test scores would be a decreased performance from Monday to Friday. However, if students in higher grade levels showed a significantly smaller drop in test scores over the week than students in lower grade levels, there would be an interaction effect: the effect of the day of the week on test scores would depend on grade level.
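The exam example can be put into numbers with a two-by-two sketch. The mean scores below are invented for illustration; only the pattern (a smaller Monday-to-Friday drop in higher grades) follows the scenario in the text.

```python
# Hypothetical mean exam scores (percent) by grade level and day of the week.
mean_scores = {
    ("lower grades", "Monday"): 80.0,
    ("lower grades", "Friday"): 68.0,
    ("higher grades", "Monday"): 82.0,
    ("higher grades", "Friday"): 78.0,
}

def weekday_drop(grade):
    """Monday-to-Friday change in mean score within one grade level."""
    return mean_scores[(grade, "Friday")] - mean_scores[(grade, "Monday")]

drops = {g: weekday_drop(g) for g in ("lower grades", "higher grades")}

# Main effect of day: the average Monday-to-Friday change across grade levels.
main_effect_of_day = sum(drops.values()) / len(drops)

# Interaction: how much the day effect differs between grade levels.
# Zero would mean the day effect is the same in every grade.
interaction = drops["higher grades"] - drops["lower grades"]

print(drops)               # {'lower grades': -12.0, 'higher grades': -4.0}
print(main_effect_of_day)  # -8.0
print(interaction)         # 8.0
```

A nonzero interaction term in a sketch like this only describes the sample means; deciding whether the difference is statistically significant requires a formal test.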

There are different methods of data analysis, with most outside of the scope of the MCAT, though methods such as the creation of a regression model can be used to look at the relationship between the studied variables. This analysis may demonstrate linear, parabolic, exponential, logarithmic, or other relationships, as is discussed further in Chapter 12 of MCAT Physics and Math Review.
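As one illustration of a regression model, the sketch below fits a straight line by ordinary least squares to the mouse data from the example in section 11.2. Choosing a linear model here is our assumption for demonstration purposes; the text does not say which relationship best describes those data, and another form (for example, exponential) might fit better.

```python
# Mouse data from the example in section 11.2.
water_ml = [3.9, 3.3, 1.5, 4.7, 6.1, 0.8, 10.3, 8.6]
systolic_mmhg = [122, 128, 146, 117, 108, 153, 103, 105]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(water_ml, systolic_mmhg)
print(f"fitted line: BP = {slope:.2f} * water + {intercept:.1f}")
```

The negative slope summarizes the trend in the sample, but a fitted line describes an association and does not show that changing water intake would change blood pressure.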

Observational Approach

We may wish to study certain causal associations for which an experiment cannot be performed for ethical or practical reasons. In such a case, we must draw on the available data and analyze it. Observational studies in medicine fit into one of three categories: cohort studies, cross-sectional studies, and case–control studies. These studies often look for the connections between exposures and outcomes. Observational studies do not demonstrate causality, although the tendency toward causality may be demonstrated by Hill’s criteria, which we will examine later.

REAL WORLD

Ethnographic studies are observational studies utilized by sociologists. These studies attempt to understand cultures by looking at the complete social environment.

Cohort studies are those in which subjects are sorted into groups based on differences in risk factors (exposures), and then assessed at various intervals to determine how many subjects in each group had a certain outcome. For example, a study in which 100 smokers and 100 nonsmokers are followed for 20 years while counting the number of subjects who develop lung cancer in each group would be an example of a cohort study.

KEY CONCEPT

A longitudinal study is an observational research method that follows the same subjects over time. Therefore, a cohort study is a form of longitudinal study.

Cross-sectional studies attempt to categorize patients into different groups at a single point in time. For example, a study to determine the prevalence of lung cancer in smokers and nonsmokers at a given point in time would be an example of a cross-sectional study.

Case–control studies start by identifying the number of subjects with or without a particular outcome, and then look backwards to assess how many subjects in each group had exposure to a particular risk factor. For example, a study in which 100 patients with lung cancer and 100 patients without lung cancer are assessed for their smoking history would be an example of a case–control study.

Other Types of Studies

Surveys are studies that give predetermined questions to participants to answer, which are then used for statistical analysis to draw conclusions. For example, a restaurant may send their diners a survey of 10 questions on how they would rate their food on a scale of 1 to 10 compared to the food from a competing restaurant. From the compiled ratings, the restaurant would be able to tell if the diners preferred their food or their competitor’s food. Note that, in real life, surveys can contain either qualitative or quantitative data; however, the MCAT tends to consider surveys to be quantitative.

Content analysis is a type of study that looks at documents and communication artifacts to find themes and patterns in communication. For example, a historian may read numerous ancient documents on a historical figure to analyze the use of positive and negative adjectives in an effort to determine how that historical figure was generally viewed by others.

In-depth interviews, as the name implies, are a qualitative method of obtaining detailed information from individuals using open-ended questions. These interviews are conducted with participants one-on-one.

Focus groups are another method of qualitative data gathering, in which groups of people are assembled to participate in a guided discussion of a topic. Participants are asked to share their thoughts, experiences, and perceptions, and are also encouraged to engage with one another. For example, a company’s marketing department might bring together twenty potential customers to gain detailed feedback on a proposal for a new product.

Hill’s Criteria

Hill’s criteria describe the components of an observed relationship that increase the likelihood of causality in the relationship. While only the first criterion, temporality (the exposure must occur before the outcome), is necessary for the relationship to be causal, it is not sufficient. The more criteria that are satisfied by a relationship, the likelier it is that the relationship is causal. Hill’s criteria do not provide an absolute guideline on whether a relationship is causal; thus, for any observational study, the relationship should be described as a correlation.

Error Sources

In addition to the measurement error found in basic science research, we must be aware of bias and error introduced by using human subjects as part of an experimental or observational model. As mentioned earlier, bias is a systematic error. As such, it generally does not impact the precision of the data, but rather skews the data in one direction or another. Bias is a result of flaws in the data collection phase of an experimental or observational study. Confounding is an error during analysis.

Selection Bias

The most prevalent type of bias is selection bias, in which the subjects used for the study are not representative of the target population. People who volunteer for a study in a particular area may be significantly different from people who do not volunteer. For example, someone volunteering for a drug trial that requires clinical visits may be healthier or more likely to benefit from the study than someone who does not volunteer because they cannot make it to the hospital.

Selection bias may also apply in cases where one gender is more prevalent in a study than another, or where there are differences in the age profile of the experiment group and the population. Measurement and assessment of selection bias occur before any intervention.

Detection Bias

Detection bias results from educated professionals using their knowledge in an inconsistent way. Because prior studies have indicated that there is a correlation between two variables, finding one of them increases the likelihood that the researcher will search for the second. For example, high blood pressure (hypertension) and diabetes mellitus are more common among individuals who are obese; thus, a physician may screen patients who are obese for hypertension and diabetes at a higher rate than other patients, inflating the measured rate of these conditions in that group relative to other patients (although, as described in Chapter 12 of MCAT Behavioral Sciences Review, other biases against individuals who are obese actually tend to lead to lower rates of screening and preventative care).

Response Biases

The tendency for participants to respond inaccurately to questions in research is called a response bias. This type of bias can impact the validity of studies that depend on self-reporting from participants, such as surveys. The phenomenon of reactivity, a general term that describes how people change their behavior when they know they are being watched, offers one explanation of how response bias can be introduced.

The Hawthorne effect, or observation bias, is a specific type of reactivity in which the behavior of study participants is altered simply because they recognize that they are being studied. For example, patients in a study for a given weight loss drug may begin exercising more frequently or may make healthier diet choices, thus artificially increasing the perceived effect of the drug. Because the change in data is systematic and occurs before data analysis, this is an example of bias.

Another type of response bias is social desirability bias, in which participants answer questions in a way that makes them seem more socially acceptable. For example, in a survey that asks MCAT students how many hours a week they spend studying, participants may respond with a higher than accurate number in order to make themselves seem like better students. One other type of response bias worth knowing is called demand characteristics, which occurs when participants change their behavior or responses in a study in order to match what they think the experimenter expects of them. While this is similar to social desirability bias, the key difference is that social desirability bias is geared toward fitting into social norms, while demand characteristics involve a participant trying to behave in the way they think the experimenter wants them to behave.

Confounding, Mediating, and Moderating Variables

Confounding, sometimes inaccurately called confounding bias or omitted variable bias, is a data analysis error. Confounding variables, or confounders, are unmeasured or uncontrolled variables that affect both the independent and dependent variables in a study, leading to a “confounding” or mischaracterization of the true association between them. These “third-party” variables are illustrated schematically in Figure 11.2. For example, say a researcher looks at the frequency of cancer in coffee drinkers. Noticing that coffee drinkers have higher rates of cancer, the researcher may erroneously assume coffee causes cancer. However, the researcher did not take into account other potentially confounding variables such as smoking. If coffee drinkers are more likely to smoke than non-coffee drinkers, and smoking increases the likelihood of cancer, then smoking, not coffee, may be the actual cause of the increased cancer rates in coffee drinkers.

exposure appears to lead to outcome, but confounder affects both exposure and outcome

Figure 11.2 Confounding
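The coffee example can be sketched with invented counts. In the hypothetical data below, smoking alone determines the cancer rate and coffee has no effect at all, yet the crude comparison still makes coffee look harmful because coffee drinkers are more likely to smoke. Comparing within each smoking stratum removes the apparent effect.

```python
# Hypothetical counts: (drinks_coffee, smokes) -> (number of people, cancer cases)
counts = {
    (True, True): (800, 80),
    (True, False): (200, 4),
    (False, True): (200, 20),
    (False, False): (800, 16),
}

def cancer_rate(coffee, smokes=None):
    """Cancer rate for coffee drinkers or non-drinkers,
    optionally restricted to one smoking stratum."""
    rows = [v for (c, s), v in counts.items()
            if c == coffee and (smokes is None or s == smokes)]
    people = sum(n for n, _ in rows)
    cases = sum(k for _, k in rows)
    return cases / people

# Crude comparison, ignoring smoking: coffee appears to raise cancer risk.
print("crude:", cancer_rate(True), cancer_rate(False))

# Stratified comparison: within each smoking group the rates are identical.
for smokes in (True, False):
    label = "smokers" if smokes else "nonsmokers"
    print(label, cancer_rate(True, smokes), cancer_rate(False, smokes))
```

Stratifying in this way is only possible when the confounder has been measured, which is why an unmeasured confounder is so damaging to an observational study.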

Confounding variables should not be mistaken for mediating variables. Unlike confounding variables, mediating variables sit “in-between” two variables and provide a mechanism to explain an association. For example, if one were to find a positive correlation between level of education and life satisfaction, a mediating variable might be income: a higher level of education leads to a higher income, which provides better access to healthcare and in turn leads to an increase in satisfaction.

Another type of variable to be aware of is a moderating variable, which is a variable that changes the strength of the association between the independent and dependent variables. For example, imagine a study was conducted looking at a new medication in patients with heart disease, and the medication was not shown to have a benefit. However, when the patients were separated by biological sex, the medication significantly improved heart health in female patients, but not male patients. In this scenario, biological sex would be considered a moderating variable, since the association between taking the medication and heart health in patients with heart disease depends on whether the patient is biologically male or female. Moderating variables are closely tied to the idea of interaction effects, though the term moderating variable refers to the variable itself, while the effect of the moderating variable is characterized as an interaction.

In contrast to confounding, moderating, and mediating variables, control variables are variables that are purposely held constant throughout an experiment to ensure the effects of the variables of interest are not influenced by fluctuations in the control variables.

MCAT CONCEPT CHECK 11.3

Before you move on, assess your understanding of the material with these questions.

_________________________________

_________________________________

_________________________________

_________________________________

11.4 Ethics

LEARNING OBJECTIVES

After Chapter 11.4, you will be able to:

In medicine, there are four core ethical tenets: beneficence, or the obligation to act in the patient’s best interest; nonmaleficence, or the obligation to avoid treatments or interventions in which the potential for harm outweighs the potential for benefit; respect for patient autonomy, or the responsibility to respect patients’ decisions and choices about their own healthcare; and justice, or the responsibility to treat similar patients with similar care, and to distribute healthcare resources fairly.

BRIDGE

The key ethical tenets of medicine are also discussed in Chapter 11 of MCAT Behavioral Sciences Review in conjunction with an analysis of major institutions in the United States.

In research, these four principles are replaced by a slightly modified set. The Belmont Report, a landmark document published by the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research in 1979, delineates the three necessary pillars of research ethics: respect for persons, justice, and a slightly more inclusive version of beneficence.

Respect for Persons

Respect for persons includes the need for honesty between the subject and the researcher, and generally—but not always—prohibits deception. Respect for persons also includes the process of informed consent, in which a patient must be adequately counseled on the procedures, risks and benefits, and goals of a study to make a knowledgeable decision about whether or not to participate in the study. Further, the investigator cannot exert a coercive influence over the subjects, or they will not be acting autonomously. This coercive influence may be the result of a natural power imbalance, such as that between a teacher and a student, or may be the result of an extreme financial incentive or the inability to otherwise receive treatment for a condition. Respect for persons also includes the need to respect the subjects’ wishes to continue with or cease participation in a study. The subject may withdraw consent that was previously granted at any time.

Many older studies did not abide by respect for persons. For example, sentinel studies on the severity of untreated syphilis were conducted without the participants’ knowledge or consent. Many early psychological and sociological studies involved significant deception, which was not always disclosed after the fact. In current practice, hospital or university institutional review boards have put into place systematic protections against unethical studies. Vulnerable persons, which include children, pregnant individuals, and prisoners, require special protections above and beyond those taken with the general population. Confidentiality is also generally considered as part of respect for persons during research.

REAL WORLD

The Tuskegee syphilis experiment was a notorious forty-year study (1932–1972) by the United States Public Health Service that was fraught with extreme violations of the ethical principle of respect for persons. In this study, African American men living in conditions of poverty were enrolled into a study on the natural progression of syphilis. These men were given sham treatments, barred from accessing appropriate healthcare, and repeatedly deceived by investigators; they were never even told that they had syphilis. This study was so significant in bioethical history that it is considered the primary impetus for the writing of the Belmont Report.

Justice

Justice in research applies to both the selection of a research topic and the execution of the research. If we lived in a world where all individuals and all questions were ethically the same, the only just way to determine the selection of a research question would be through random chance. However, we live in a world with morally relevant differences as established by our cultures. Morally relevant differences are defined as those differences between individuals that are considered an appropriate reason to treat them differently. For example, age is a significant moral difference in ethical deliberations: all else being equal, a transplant that is equally likely to benefit a young child and an older adult might be given to the child because of a longer life expectancy. Population size is often morally relevant in study design because a study that impacts a large population will generally have more potential to do good than one that impacts a small population. In contrast, race, ethnicity, sexual orientation, gender identity, disability status, and financial status are generally not considered morally relevant differences. It should be noted that religion may or may not be a valid moral criterion, depending on the context. For example, a religious prohibition on a certain intervention is grounds to avoid that treatment in an individual of that religion, in keeping with patient autonomy.

Justice is also important in the selection of subjects and the execution of research. When there is risk associated with a study, it must be fairly distributed so as not to unduly harm any group. This generally corresponds to seeking a diverse group for a study. Note that this also has a side benefit of increasing external validity, which will be discussed later in the chapter. However, the population that is most likely to benefit from the study may be required to bear a greater proportion of the risk. This apparent discrepancy (that all individuals should equally share the burden of risk, and that the target population may assume a higher proportion of risk) reflects the fact that likelihood of benefit is a morally relevant difference between individuals. Therefore, in studies in which there is no perceived difference in the likelihood of benefit between individuals, all individuals should assume equal risk; in those for which a particular population is most likely to benefit, that population should assume a higher proportion of risk. Finally, be aware that in some cases, such as drug trials, it may be necessary to test the intervention in individuals unaffected by the illness for which the drug has been designed. In this case, the burden of risk falls on a secondary population. This is permissible as long as the potential risks and benefits for the subjects have been addressed through informed consent and respect for persons has been maintained.

Beneficence

Beneficence is the easiest concept to understand in the context of research ethics. It must be our intent to cause a net positive change for both the study population and general population, and we must do our best to minimize any potential harms. This benefit may be as intangible as a feeling of personal satisfaction, and it may be unrelated to the original purpose of the study, such as small financial incentives. It may also be a future benefit if the participant is a member of the target population.

Research should be conducted in the least invasive, painful, or traumatic way possible. For example, a measurement that could be taken with either a finger stick or an indwelling catheter should be taken with a finger stick because it is far less painful and invasive. In addition, in studies comparing two potential treatment options, one cannot approach the research with the knowledge that one treatment is superior to the other. This is termed equipoise. If it becomes evident that one treatment option is clearly superior before a study is scheduled to finish, the trial must be stopped because providing an inferior treatment is a net harm.

MCAT CONCEPT CHECK 11.4

Before you move on, assess your understanding of the material with these questions.

_________________________________

_________________________________

_________________________________

_________________________________

_________________________________

_________________________________

_________________________________

_________________________________

_________________________________

_________________________________

_________________________________

11.5 Research in the Real World

LEARNING OBJECTIVES

After Chapter 11.5, you will be able to:

Until this point, we’ve been discussing research in a vacuum, but our goals with any research are application-based. In order to apply the data generated, there are practical concerns that we must consider. For example, we must take into account the statistical strengths and weaknesses of a study, especially those that are related to the differences between the target population and the study sample. We also consider ways in which bias impacts the ability to use study conclusions in the real world, and whether there is any true justification for an intervention.

Populations *vs.* Samples

In statistics and research, we generally work with a sample rather than an entire population. A population is the complete group of every individual that satisfies the attributes of interest. Populations may be very large; for example, the population of humans is over eight billion people. In contrast, a population with a large number of qualifiers will be much smaller. For example, the population of females who live in the United States, are between 18 and 30 years old, and have Darier’s disease (a rare skin condition) is only about 100 people. Information that is calculated using every person in a population is called a parameter.

Working with a population is generally not feasible, even for smaller groups. Therefore, we make generalizations about populations based on sample data. A sample is any group taken from a population that does not include all individuals from the population. Ideally, samples will be representative of the population, and there are several methods of ensuring this. Random samples are generally considered the gold standard, although selecting for certain small subgroups may also be used. Information about a sample is called a statistic. With comparatively large or repeated samples, statistics can be used to estimate population parameters. If only a single small sample is taken, then very little information can be gleaned about the population.
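The distinction between a parameter and a statistic can be illustrated with a short Python sketch. The population below is a list of made-up height values generated by a fixed formula, and the sample size of 500 and the seed are arbitrary assumptions chosen only to keep the example repeatable.

```python
import random
from statistics import mean

# Hypothetical population: 10,000 made-up heights in centimeters (not real data).
population = [150 + (i * 37) % 41 for i in range(10_000)]

# A parameter is calculated from every member of the population.
parameter = mean(population)

# A statistic is calculated from a sample; here, a simple random sample.
rng = random.Random(42)               # fixed seed so the sketch is repeatable
sample = rng.sample(population, 500)  # 500 members drawn without replacement
statistic = mean(sample)

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample mean (statistic):     {statistic:.2f}")
```

With a reasonably large random sample, the statistic usually lands close to the parameter, which is why sample data can be used to estimate population values. Rerunning the sketch with a much smaller sample size will typically give an estimate that wanders further from the population mean.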

Generalizability

When analyzing a study, we also look for markers of internal validity (or support for causality as discussed earlier) and external validity, or generalizability. Studies with low generalizability have very narrow conditions for sample selection that do not reflect the target population, whereas studies with high generalizability have samples that are representative of the target population. For example, a psoriasis study with low generalizability might have only participants who were diagnosed within the last year, while a study with high generalizability would have participants with a distribution of time since diagnosis that is similar to the population of all patients who have psoriasis.

REAL WORLD

Drugs undergo continuous evaluation in part because of poor preclinical generalizability. Some marketing changes or additional warnings may become necessary, or a drug may even be taken off the market. These are unforeseen risks or outcomes that only become apparent when the drug becomes available to the entire population.

Support for Interventions

As future doctors, we are interested in applying research to our patients. To do so, we’ll need to consider whether the data is sufficient for the recommendation or exclusion of any therapy or treatment plan.

Statistical *vs.* Clinical Effect

In research, the primary marker of success is being able to generate results that are statistically significant; that is, results that are unlikely to be the result of random chance alone. However, with a large enough sample, even the smallest difference between two treatments may be significant mathematically. For example, a decrease in systolic blood pressure of one millimeter of mercury could be statistically significant; however, it is not likely to change patient outcomes. We must therefore also assess whether there is clinical significance: a notable or worthwhile change in health status as a result of our intervention.
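The blood pressure example can be worked through with a simple two-sample z-test. The Python sketch below is a minimal illustration under stated assumptions: the standard deviation of 15 mmHg, the group sizes, and the 5 mmHg threshold for a clinically meaningful change are all hypothetical values chosen for the example, and the helper `two_sided_p` is written here rather than taken from a statistics library.

```python
import math


def two_sided_p(mean_diff, sd, n_per_group):
    """Two-sided p-value for a z-test comparing two group means.

    Assumes both groups share the same known standard deviation
    and have the same number of participants.
    """
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


MEAN_DIFF = 1.0          # observed drop in systolic pressure, mmHg
SD = 15.0                # hypothetical standard deviation, mmHg
ALPHA = 0.05             # conventional significance threshold
MIN_CLINICAL_DIFF = 5.0  # hypothetical smallest clinically meaningful drop, mmHg

p_small = two_sided_p(MEAN_DIFF, SD, n_per_group=100)
p_large = two_sided_p(MEAN_DIFF, SD, n_per_group=10_000)

for label, p in (("100 per group", p_small), ("10,000 per group", p_large)):
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{label}: p = {p:.2g} ({verdict})")

print(f"Clinically meaningful? {MEAN_DIFF >= MIN_CLINICAL_DIFF}")
```

Under these assumptions, the same 1 mmHg difference is not statistically significant with 100 participants per group but becomes highly significant with 10,000 per group. The size of the effect never changes, and it remains well below the assumed clinical threshold, which is why a p-value alone cannot justify an intervention.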

MCAT CONCEPT CHECK 11.5

Before you move on, assess your understanding of the material with these questions.

_________________________________

_________________________________

_________________________________

_________________________________

_________________________________

_________________________________

Conclusion

In this chapter, we focused on one of the four Scientific Inquiry and Reasoning Skills that will be tested on the MCAT: reasoning about the design and execution of research. We began by reviewing the scientific method and the value of historical data in the formulation of a research question. We then compared the methodology for both basic sciences research and human subjects research, especially with regard to error. We finished our investigation by examining the ethical and practical concerns in research design. The questions for this chapter are designed to allow you to practice this new skill, rather than to test your memorization of this content. In the next chapter, we’ll specifically work with data and graphical analysis, another of the Scientific Inquiry and Reasoning Skills that will be essential on Test Day.

GO ONLINE

You’ve reviewed the content, now test your knowledge and critical thinking skills by completing a test-like passage set in your online resources!

CONCEPT SUMMARY

The Scientific Method

Basic Science Research

Human Subjects Research

Ethics

Research in the Real World

ANSWERS TO CONCEPT CHECKS

**11.1**

-

**11.2**

-

pH vs. NaOH added; curve is S-shaped from origin to upper right corner of plot

**11.3**

**11.4**

**11.5**

SCIENCE MASTERY ASSESSMENT EXPLANATIONS

1. C

The purpose of a control is to keep the conditions of two experiments as close as possible to establish causality. In this case, the one milliliter volume addition might have impacted the growth of E. coli; thus, we must control for this by administering an equal volume of a theoretically inert compound to a plate of E. coli.

2. C

An experiment will always establish a clearer causal link than an observational study. (A), (B), and (D) are all examples of observational data.

3. A

The experimenter has not completed the initial phases of research. There was no data acquisition or refinement, and there was no indication that the question required an experiment to be answered. Were the experimenter doubtful of the validity of the reported value, an experiment could be appropriate—but there is no information to indicate that this is so. Based on the question stem, it is clear that the experimenter had a clear goal, eliminating (B). In human subjects research, tasks may be divided to facilitate blinding, but this is generally unnecessary in basic sciences research, eliminating (C). As long as the calorimeter was calibrated prior to its use, it does not matter when this calibration occurred relative to the synthesis of the compound, eliminating (D).

4. B

A parameter is a population measure, so to calculate it, every single member must be measured. Identifying, measuring, and recording data for a population that large—over 300 million—is essentially impossible. Common biometric measures, if not misused, generally neither cause significant psychological harm nor are unethical, eliminating (A). Knowing a mean weight could have major ramifications, including public health measures, medical recommendations, and shifting of body image, eliminating (C). While a number of studies on weight have been performed, there is not yet a parameter describing the entire population, eliminating (D).

5. D

Because both the exposure and outcome are measured at the same time, we cannot make any conclusions about temporality. The cancer patient may have only begun smoking after diagnosis, and this type of study doesn’t examine that possibility.

6. B

Randomization is based on the idea that the results will only vary as a result of random chance as long as the assignment is proper. The appropriate response to a fair algorithm that assigns groups in an unexpected way is to proceed with the research. Participants should never be assigned by the researcher, nor should one continue randomizing samples to achieve a desired outcome—these are likely to introduce more error than leaving unequal groups, eliminating (A) and (D). It is unnecessary to drop this entire cohort, assuming the randomization algorithm was fair, eliminating (C).

7. C

Two relationships are being assessed. In the relationship between smoking and very low birth weight (VLBW), smoking is the independent variable and VLBW is the dependent variable. In the second relationship, VLBW is being compared to IQ. VLBW is the independent variable here, while IQ is the dependent variable.

8. A

Because the same physician sees both the control groups and the experimental groups, there is the potential for the physician to realize which of the groups is receiving which treatment—especially if the subject mentions expected side effects. In this study, patients were not told which group they were in, but if they were assigned to the medication group, they were told about its side effects. If the patients talked to each other they could experience patient unblinding, but we are told they have no communication with each other.

9. B

As part of the scientific method, after formulating a testable question, one must search through journals and databases to review the available information. This student likely did not spend sufficient time reviewing existing studies because the review committee was able to cite several studies that had already tested and affirmed the student’s hypothesis, meaning that the hypothesis is not scientifically interesting. This observation is consistent with (B). On the other hand, the question stem indicates that the hypothesis was well formulated and that the student had some preliminary data, which eliminates (A) and (D). There is no stated criticism of the student’s research methods, which eliminates (C).

10. B

Requiring subjects to volunteer for a study and to seek the study out will introduce selection bias. The people who end up volunteering listen to the radio, which the general population may not, and are interested in the topic and willing to volunteer. Most studies suffer from selection bias and it is the most common impediment to generalizability.

11. C

The behaviors described in the question stem (informing the patient, providing time to make a decision) are consistent with informed consent and autonomy, which are both part of respect for persons.

12. A

While it may appear that this question is asking to determine which method is the most accurate or reliable, this is an ethics question. All of these methods (with the exception of venipuncture, (C)) measure oxygen saturation. According to the principle of beneficence, we must minimize potential harms associated with our investigations; thus, the noninvasive pulse oximeter should be greatly favored over other measurements for an initial assessment.

13. B

Statistical significance is not the same as clinical significance. There are medications that increase the length of pregnancy in preterm labor, but some only do so for a few hours. Because this may not impact patient outcomes, it should not inform treatment decisions.

14. D

Samples should always be taken from the target population or population of interest. Given that the target population is postmenopausal women, this group should be used for the sample. Both prepubescent girls and pregnant women are populations with special precautions against coercion, and are not appropriate for this study, eliminating (A) and (C). Premenopausal women are unlikely to require hormone replacement therapy unless they have a condition that specifically necessitates it, eliminating (B).

15. C

The Hawthorne effect—a change in behavior as a result of the knowledge that one is being observed—is only present with human subjects. Basic science research generally does not suffer from the Hawthorne effect.

GO ONLINE

Consult your online resources for additional practice.

SHARED CONCEPTS

Behavioral Sciences Chapter 4

Cognition, Consciousness, and Language

Behavioral Sciences Chapter 8

Social Processes, Attitudes, and Behavior

Behavioral Sciences Chapter 11

Social Structure and Demographics

Behavioral Sciences Chapter 12

Social Stratification

Physics and Math Chapter 12

Data-Based and Statistical Reasoning
