Unlocking AI: Fundamentals for Clinicians and Leaders
"He who loves practice without theory is like the sailor who boards a ship without a rudder and compass and never knows where he may cast." -Leonardo Da Vinci
The endeavor to comprehend and conquer cancer has perpetually been one of deciphering intricate codes. From the pioneering pathologists who first squinted through microscopes at the bewildering cellular hieroglyphs of malignancy, to the geneticists who unlaced the elegant double helix and the subsequent mutations that propel a cell’s dark journey towards oncogenesis, our progress has been sculpted by an ever-sharpening ability to read and interpret the body’s complex biological narratives. Yet, we now face a deluge of information so vast, so multifaceted, that its sheer volume threatens to surpass our innate human capacity for synthesis. It is into this arena, a landscape awash with data yet often obscured by its overwhelming density, that a novel analytical partner arrives: Artificial Intelligence.
Our conceptual dalliance with machines designed to augment human intellect can be traced, perhaps, to the urgent, almost rhythmic clatter of gears and the focused intensity of wartime code-breakers at Bletchley Park, where Alan Turing, who would later propose his famous "Imitation Game" as a test of machine intelligence, helped mechanize the work of cryptanalysis. There, the labyrinthine complexity of the Enigma cipher demanded more than unaided human logic; it compelled the creation of mechanical collaborators in thought. This ambition, to breathe a semblance of intelligence into silicon, was formally ignited in the heady, optimistic summer of 1956 at Dartmouth College, where the term "artificial intelligence" itself was coined. This seminal event launched decades of fervent exploration—a saga marked by both monumental advances and periods of humbling disillusionment, yet always fueled by an enduring human fascination with the prospect of intelligent machines.
Today, as artificial intelligence migrates from the rarefied atmosphere of specialized laboratories and theoretical conjectures into the critical, life-altering theatre of cancer care, a foundational grasp of its core principles ceases to be a mere academic pursuit. It transforms into a professional imperative, an essential new literacy for the modern clinician, the innovative researcher, and the visionary scientific leader.
These emerging technologies are not simply instruments in the familiar way a sharper scalpel or a scanner with greater resolution extends our existing capabilities. Those are sophisticated extensions of our senses, our hands. AI, particularly in its more advanced incarnations, offers something qualitatively different: an analytical entity capable of learning from experience, discerning patterns too subtle or complex for human perception, and making predictions from data at a scale and velocity that far outstrip our cognitive limits. It holds the promise of helping us navigate the bewildering informational maze of contemporary oncology—the torrent of genomic data, the nuanced shadows in our imaging studies, the subtle inflections in sprawling clinical histories. It offers the potential to perceive the faintest signals in this pervasive noise, those early, almost inaudible whispers of malignancy or harbingers of drug response that we currently, and often tragically, miss. Ultimately, the ambition is to augment our ability to deliver care that is not only more effective but also profoundly personalized and deeply compassionate.
But to wield these potent tools with wisdom and responsibility, to accurately discern their genuine promise from the hype that surrounds them, and to critically appraise their inevitable limitations, we must first achieve a comfortable fluency in their fundamental language and logic. This essay aims to demystify these core concepts. It is an invitation to understand the intellectual heart of these algorithms, not with the granular detail of a computer scientist, but with the practical insight and critical discernment required by a physician, a researcher, a guardian of patient well-being—always with a steadfast eye toward the humanistic core of medicine, a core that no algorithm, however sophisticated, can ever replace.
Defining the Landscape: A Taxonomy of Intelligent Machines
The term Artificial Intelligence (AI) itself casts a wide net, encompassing any computational system engineered to perform tasks that, were they performed by a human, would be considered to rely on intelligence. This is a broad designation, covering everything from the focused logic of a chess-playing program to the multifaceted reasoning involved in medical diagnosis, intricate problem-solving, learning from accumulated experience, understanding the nuanced tapestry of human language, or perceiving and interpreting the surrounding environment.
At its philosophical core, AI embodies humanity's enduring aspiration to create systems capable of processing information and making decisions in ways that mimic, significantly augment, or, in certain circumscribed domains, even surpass particular human cognitive functions. It is vital, however, to maintain a clear conceptual distinction. While the tasks AI performs may mirror those of human intelligence, the underlying mechanisms—constructed from the rigorous logic of mathematics, statistics, and computational algorithms—are fundamentally alien to the intricate, evolved, and deeply embodied nature of biological cognition. An AI that expertly identifies a tumor on a CT scan does not "see" or "understand" the radiological features of cancer in the human sense; it recognizes complex statistical patterns within the pixel data that it has been meticulously trained to associate with the label "tumor."
Within this expansive territory of AI resides Machine Learning (ML), a pivotal sub-discipline that imbues AI with much of its contemporary dynamism and versatility. Traditional computer programming can be likened to providing a chef with an extraordinarily detailed recipe: every step, every ingredient, every precise quantity is explicitly dictated, and the program executes these instructions with unwavering fidelity. Machine learning, by contrast, operates more like an apprentice chef. This apprentice learns not by committing thousands of recipes to memory, but by keenly observing a master at work, by tasting and analyzing countless finished dishes, and by gradually discerning the subtle, underlying principles of flavor, texture, and culinary technique. We do not furnish the ML algorithm with an exhaustive compendium of explicit rules for every conceivable scenario; instead, we provide it with a rich trove of "ingredients" (data) and a multitude of examples of "finished dishes" (known outcomes). The algorithm then iteratively refines its own internal model—its operational understanding, if you will—of how to transform those ingredients into the desired dish. It learns to approximate the recipe itself by identifying the complex statistical relationships and hidden patterns within the vast dataset.
Consider the contrasting educational paths of a medical student and the programming of a simple diagnostic checklist, such as one used to identify a common ailment like strep throat. The checklist is unyieldingly rigid, adhering to predefined rules: if symptom A and sign B are present, then conclude X. The medical student, however, embarks on a journey of learning through exposure to an immense and varied caseload—their "data." They encounter classic presentations, atypical variants, confounding conditions, and deceptive mimics. Through this immersive experience, guided by the wisdom of seasoned clinicians, they cultivate a nuanced diagnostic intuition, an ability to weigh probabilities with finesse, to adapt to incomplete or ambiguous information, and to recognize subtle patterns that defy simplistic, rule-based codification. Machine learning algorithms, particularly when applied to the complex, multifactorial challenges of oncology, operate in a manner more analogous to this student. They learn from data to construct sophisticated predictive models, albeit through the dispassionate, logical rigor of mathematical optimization rather than the intricate, biological ballet of neural networks and lived human experience.
Venturing a level deeper into this conceptual architecture, we meet Deep Learning (DL). This is a specialized and particularly potent branch of machine learning that employs computational structures known as artificial neural networks, characterized by multiple layers—the "depth" in deep learning refers directly to this layered, hierarchical architecture. These networks, although loosely and metaphorically inspired by the brain's intricate web of neurons, have demonstrated an extraordinary capacity for automatically discovering complex patterns and hierarchical features within large, high-dimensional datasets. Imagine the rich, almost infinite visual information contained in a single digital pathology slide, the dense, coded narrative of a tumor's genome, or the subtle, fluctuating temporal patterns in a patient's continuous physiological readings.
While earlier machine learning techniques often required human experts to meticulously pre-process the data and manually engineer the most relevant features for the algorithm to scrutinize—a labor-intensive, time-consuming, and often incomplete endeavor—deep learning models can frequently learn these crucial, distinguishing features directly and automatically from the raw data. This inherent ability to self-discover the most informative representations within the data is a cornerstone of their remarkable success in tackling highly complex perceptual and analytical tasks that were, until recently, widely considered intractable for machines.
To crystallize this conceptual hierarchy: AI represents the overarching ambition, the grand societal and scientific project of imbuing machines with capabilities we recognize as intelligent. Machine Learning is a dominant and highly effective method for realizing this ambition, empowering systems to learn from data rather than relying solely on explicit programming. Deep Learning is a sophisticated and powerful type of machine learning that leverages multi-layered neural networks, which have proven particularly adept at recognizing complex patterns in rich, nuanced datasets. This approach has driven many of the recent breakthroughs that have propelled AI to the forefront of biomedical research and serious clinical consideration. Grasping these distinctions is the crucial first step towards critically dissecting any claim about "AI in cancer care," enabling us to ask pertinent questions: Is this genuine learning, or merely sophisticated automation? Is it deep learning, and if so, what does that imply about its appetite for data and the transparency of its decision-making processes?
How Machines Learn: Paradigms of Algorithmic Apprenticeship
Just as human learning unfolds through diverse modalities—such as rote memorization, guided instruction, immersive experiences, solitary exploration, and iterative trial and error—machine learning algorithms also utilize distinct "paradigms" or approaches to learning. Recognizing these fundamental strategies is key to understanding how different AI tools are constructed, the nature of the data they require, and the types of problems for which they are best suited within the intricate domain of oncology.
Supervised Learning: The Algorithm as a Diligent Student
This is arguably the most prevalent and intuitively understood paradigm in medical AI. It is analogous to a student learning under the tutelage of a dedicated teacher or meticulously studying a set of expertly labeled flashcards. In supervised learning, the algorithm is trained on a dataset where each piece of input data, referred to as an "example," is paired with a known, correct answer or "label."
Imagine, for instance, an algorithm being fed thousands of mammogram images. Each image has been rigorously reviewed and labeled by expert radiologists as either "cancer present" or "no cancer present." Alternatively, it might learn from a vast dataset of digitized pathology slides, each meticulously tagged by pathologists with precise tumor grades and subtypes. Similarly, it could analyze extensive patient records where clinical features, genomic markers, and treatment histories are all labeled with actual patient outcomes, such as "responded to immunotherapy" or "developed resistance within 12 months."
The algorithm's core task is to discern the underlying mapping function—that is, to identify the subtle constellation of features within the input data (be it image pixels, molecular signatures, or clinical variables) that reliably correlate with, and can therefore predict, the correct output label. It iteratively adjusts its internal parameters, constantly guided by the supervision of these correct labels, to minimize its prediction errors. Once this training phase is complete, the goal is for the model to predict the labels for new, previously unseen examples accurately.
Oncology Relevance: Supervised learning is the veritable workhorse behind many AI applications currently making inroads into oncology. It excels at diagnostic tasks such as classifying radiological images (e.g., identifying suspicious pulmonary nodules on a chest CT scan), interpreting histopathological slides (e.g., assisting in the grading of tumors or counting mitotic figures), predicting patient prognosis based on a complex amalgamation of clinical and molecular features, or stratifying patients by their likelihood of responding to a specific therapeutic agent based on patterns learned from historical data.
Key Requirement & Achilles' Heel: The success of supervised learning is inextricably linked to the availability of large quantities of high-quality, accurately labeled data. This prerequisite can often represent a formidable bottleneck in healthcare settings. Assembling these "gold standard" labeled datasets is frequently a time-consuming, expensive endeavor that demands significant, specialized domain expertise. Furthermore, the old computing maxim "garbage in, garbage out" applies with unforgiving rigor: any errors, biases, or inconsistencies present in the labels provided during the training phase will be faithfully learned by the algorithm and subsequently propagated into its future predictions, potentially leading to serious clinical consequences. The quality of the "supervision" directly and inescapably dictates the quality of the learning.
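For readers who want to see these mechanics stripped to their essentials, the short sketch below (written in Python with the open-source scikit-learn library) trains a simple classifier on synthetic, stand-in data. The features and labels are invented purely for illustration; nothing here derives from real patients or any specific published model.

```python
# A minimal, illustrative sketch of supervised learning with scikit-learn.
# The data are synthetic stand-ins for labeled clinical examples
# (e.g., feature vectors derived from images or genomic profiles).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic "patients": 1,000 examples, 20 numeric features,
# each labeled 0 ("no cancer") or 1 ("cancer present").
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out unseen cases so we can check generalization, not memorization.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# "Supervision": the model adjusts its internal parameters to map features to labels.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on cases the model has never seen.
probs = model.predict_proba(X_test)[:, 1]
print(f"Held-out AUROC: {roc_auc_score(y_test, probs):.2f}")
```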
Unsupervised Learning: The Algorithm as an Uncharted Explorer
In stark contrast to its supervised counterpart, unsupervised learning can be likened to an intrepid explorer venturing into an unknown territory without the aid of a map or guide, or perhaps a biologist meticulously observing a newly discovered ecosystem, seeking to discern its inherent structures, relationships, and hidden rules. In this paradigm, the algorithm is presented with a dataset that lacks any predefined labels or correct answers. Its task is to navigate this unlabeled data and, entirely on its own, discover latent patterns, natural groupings, or underlying organizational structures. It might achieve this by grouping similar data points into "clusters" (a process known as clustering) or by identifying unusual data points that significantly deviate from the established norm (anomaly detection).
Imagine, for example, providing an algorithm with the complete genomic expression profiles derived from a thousand different tumor samples, without any pre-existing classification. An unsupervised learning algorithm might sift through this immense and complex dataset and identify, let's say, three distinct clusters of tumors that share common, overarching gene expression signatures—signatures that perhaps were not previously recognized by human researchers as defining specific, clinically relevant subtypes.
Oncology Relevance: Unsupervised learning holds immense promise for fostering discovery within oncology. It can be employed to uncover previously unrecognized cancer subtypes based on high-dimensional molecular data (e.g., clustering patients based on their proteomic, metabolomic, or transcriptomic profiles), which could, in turn, lead to the identification of novel therapeutic targets or more refined prognostic categories. It can help identify distinct patient subgroups that exhibit unique clinical trajectories or differential responses to treatment, even if those groupings do not align neatly with our current, established nosology. In the realm of medical imaging, it might highlight unusual textural patterns within a tumor or its surrounding microenvironment that correlate with aggressiveness but are too subtle or complex for consistent human detection. It serves as a potent hypothesis-generating tool, helping us perceive an inherent order within complex biological data that may not readily fit into our existing, human-derived classifications.
Key Feature & Challenge: The signal advantage of unsupervised learning is its independence from the laborious and often costly process of data labeling. This makes it invaluable in situations where labels are scarce or non-existent, when the underlying categories within the data are unknown, or when the primary objective is pure discovery rather than prediction based on pre-established classes. The challenge, however, often lies in interpreting the patterns it uncovers. While an algorithm may identify statistically valid and robust clusters, their ultimate clinical or biological significance typically requires careful human interpretation, domain expertise, and additional experimental validation.
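The same caveat applies to the brief sketch below: it clusters synthetic, stand-in "tumor profiles" with scikit-learn's k-means algorithm, merely to illustrate how groupings can emerge from unlabeled data.

```python
# A minimal sketch of unsupervised learning: clustering synthetic "tumor profiles"
# with no labels provided, to see whether natural groupings emerge.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for, say, expression profiles of 300 tumors across 50 genes,
# secretly generated from 3 latent groups that the algorithm is never told about.
X, _ = make_blobs(n_samples=300, n_features=50, centers=3, random_state=0)

# The algorithm groups samples purely by similarity in the data.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("Cluster sizes:", np.bincount(labels))
print(f"Silhouette score (cluster cohesion/separation): {silhouette_score(X, labels):.2f}")
# Whether such clusters are biologically or clinically meaningful still
# requires human interpretation and experimental validation.
```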
Reinforcement Learning: The Algorithm Learning from Consequences
This learning paradigm mirrors a fundamental way in which humans and many animals learn: through a process of trial and error, guided by the positive or negative consequences of their actions. In reinforcement learning, the algorithm, often referred to as an "agent," learns to make a sequence of decisions within a defined "environment" with the overarching goal of maximizing a cumulative "reward" over an extended period. It is not explicitly told which action to take in each specific situation; instead, it experiments by trying out different actions and receives feedback from the environment. This feedback arrives in the form of positive rewards (for actions that move it closer to a desired goal) or negative rewards, sometimes termed penalties (for actions that are detrimental or lead it astray from the objective). Through multiple cycles of interaction, exploration of new strategies, and refinement of previously learned successful approaches, the agent gradually develops an optimal "policy"—a strategy for selecting actions that are most likely to yield the greatest long-term cumulative reward.
Think of teaching a dog a new trick, like "sit" or "stay." A small treat and praise (a reward) for a correct response reinforces that behavior. A lack of treat or a gentle corrective sound (a mild penalty) for an incorrect response discourages that pathway. Over time, through these iterative feedback loops, the dog learns the sequence of actions that maximizes the likelihood of receiving treats.
Oncology Relevance: Reinforcement learning is perhaps the most nascent of these three paradigms in terms of direct clinical application in oncology, but it holds tantalizing potential for optimizing complex, dynamic treatment strategies. For instance, it could learn the optimal sequence and timing for administering different chemotherapy drugs in a combination regimen to maximize tumor control while minimizing toxicity. It might also learn how to dynamically adjust the dosage of a targeted therapy based on evolving biomarkers of response and emerging signs of resistance or side effects. In the field of radiation oncology, it could learn to guide adaptive radiotherapy, making real-time, data-driven adjustments to treatment plans based on observed changes in tumor size, shape, and position during a course of therapy. The overarching goal is to personalize and optimize therapeutic journeys over time, adapting to the patient's individual and evolving biological landscape.
Key Challenge & Ethical Considerations: The primary and most significant challenge is that direct trial-and-error learning on actual patients is, in the vast majority of oncological contexts, ethically unacceptable and practically unfeasible. Therefore, the application of reinforcement learning in healthcare often relies heavily on the development of high-fidelity in silico environments. These are sophisticated computer simulations of disease progression, treatment response, and physiological interactions. Alternatively, it may involve learning from large, retrospective datasets that meticulously detail diverse treatment trajectories and their associated long-term outcomes. Another complex task is defining appropriate reward functions that truly and comprehensively encapsulate desired clinical outcomes, such as balancing effective tumor control with the preservation of quality of life and the minimization of long-term treatment-related toxicities. This requires deep clinical insight, careful ethical deliberation, and often, a multidisciplinary consensus.
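As an illustration only, the toy sketch below implements tabular Q-learning, one of the simplest reinforcement learning algorithms, in an entirely fabricated simulated environment. The states, actions, and reward rule are arbitrary placeholders; real clinical applications would require the far richer simulations or retrospective datasets described above.

```python
# A toy Q-learning sketch: an agent learns, purely in simulation, which of two
# actions to prefer in each of a few abstract "states". This is a didactic toy,
# not a clinical model.
import random

N_STATES, N_ACTIONS = 3, 2
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]    # learned action values
alpha, gamma, epsilon = 0.1, 0.9, 0.2                # learning rate, discount, exploration

def step(state, action):
    """Hypothetical simulated environment: returns (next_state, reward)."""
    # Arbitrary toy rule: action 1 is rewarded in state 2, action 0 elsewhere.
    reward = 1.0 if (action == 1) == (state == 2) else -0.1
    return random.randrange(N_STATES), reward

state = 0
for _ in range(5000):
    # Epsilon-greedy: mostly exploit what has worked so far, sometimes explore.
    if random.random() < epsilon:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    # Nudge this action's estimated value toward reward plus discounted future value.
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
    state = next_state

print("Learned action values per state:", [[round(v, 2) for v in row] for row in Q])
```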
A clear, conceptual understanding of these fundamental learning paradigms is essential for any clinician or researcher seeking to engage meaningfully with AI. When presented with a new AI tool or a research paper describing an AI application, asking whether it was developed using supervised, unsupervised, or reinforcement learning provides immediate and valuable insight into its underlying data dependencies, its likely operational strengths, the types of problems it is designed to address, and the critical questions one should pose regarding its validation, interpretation, and potential limitations.
A Glimpse Inside the Algorithmic Mind: The Essence of Neural Networks
Much of the current fervor and many of the breakthroughs in AI, particularly within deep learning, are propelled by the remarkable capabilities of artificial neural networks (ANNs). While their name and characteristic layered structure are loosely inspired by the intricate biological neural networks of the human brain, clinicians need to understand that ANNs are, at their core, sophisticated yet vastly simplified mathematical constructs. They are powerful engines for approximating complex functions and recognizing subtle patterns, not sentient entities possessing consciousness or human-like understanding.
For our purposes, imagine a network of interconnected processing units. In AI parlance, these are often called nodes or, more evocatively, "neurons," and they are typically organized in distinct layers:
Input Layer: This is the conduit through which information enters the network. It receives the raw data that the network will process. If, for example, the AI is designed to analyze a digital pathology image, each node in the input layer might correspond to the brightness or color value of a single pixel. If the task involves analyzing genomic data, each node might represent the expression level of a specific gene.
Hidden Layers: Situated between the input and output layers are one or more (and in "deep" networks, often many) hidden layers. These form the computational heart of the network. Each "neuron" in a hidden layer receives signals from many neurons in the preceding layer. Crucially, these incoming signals are not treated equally; each connection between neurons has an associated "weight." This numerical value represents the strength or importance of that particular connection, a concept loosely analogous to the synaptic strength that modulates signal transmission between biological neurons. The neuron sums these weighted inputs. This aggregated sum is then typically passed through an "activation function"—a simple mathematical rule that determines the neuron's output signal, which is then transmitted to neurons in the subsequent layer. It is through the meticulous, iterative adjustment of these millions (sometimes billions) of connection weights during the training process that the neural network "learns" to perform its designated task, such as classifying an image or predicting an outcome. The details of how these weights are adjusted (e.g., via algorithms like backpropagation) are beyond our clinical scope, but the concept of learning through tuning connection strengths is key.
Output Layer: This final layer produces the network's ultimate result. Depending on the specific task the AI is designed for, this output might be a single numerical value (e.g., the probability that a detected lung nodule is malignant), a set of probabilities assigned to different classes (e.g., classifying a tumor into one of five known molecular subtypes), or even a more complex, structured output like a generated sentence or a delineated region of interest on an image.
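To make the arithmetic behind these layers tangible, the small sketch below passes a handful of made-up input values through one hidden layer and one output neuron: each neuron computes a weighted sum of its inputs and applies a simple activation function. The specific weights are arbitrary; in a trained network, they are precisely what the learning process adjusts.

```python
# A minimal sketch of what artificial "neurons" compute: weighted sums passed
# through simple activation functions. All numbers here are arbitrary.
import numpy as np

def relu(x):
    """A common activation function: passes positive values, zeroes out negatives."""
    return np.maximum(0, x)

inputs = np.array([0.2, 0.8, 0.5])             # e.g., three pixel intensities or gene values
hidden_weights = np.array([[0.4, -0.6, 0.9],    # each row: one hidden neuron's connection weights
                           [0.1,  0.7, -0.3]])
hidden_bias = np.array([0.0, 0.1])

# Each hidden neuron: weighted sum of inputs, then activation.
hidden_output = relu(hidden_weights @ inputs + hidden_bias)

output_weights = np.array([[1.2, -0.8]])        # one output neuron reading the hidden layer
logit = output_weights @ hidden_output
probability = 1 / (1 + np.exp(-logit))          # sigmoid squashes the result into a 0-1 "probability"

print(f"Output probability: {probability[0]:.2f}")
```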
The "depth" in deep learning, as previously noted, refers to the presence of multiple hidden layers. This architectural feature is not arbitrary; it allows the network to learn features from the data in a hierarchical fashion, building up increasingly complex representations from simpler ones. This is particularly evident and intuitive in image analysis. The first hidden layer, which processes raw pixel data, may learn to detect very simple visual primitives, such as edges, corners, or basic textures. Neurons in the next layer might then combine these rudimentary detections to recognize slightly more complex elements, such as simple geometric shapes or recurring motifs. Subsequent layers can build upon these, combining shapes into more intricate object parts, such as glandular structures in a prostate biopsy or the characteristic morphology of specific cell nuclei. Finally, the deepest layers might integrate these complex parts to recognize overarching patterns or objects of interest, such as a particular tumor architecture, the presence of invasive fronts, or the density of tumor-infiltrating lymphocytes. This automatic, data-driven discovery of a hierarchical cascade of increasingly complex and relevant features is a fundamental reason for the extraordinary success of deep learning in perceptual tasks involving rich, high-dimensional data, such as medical images.
For clinicians, it's helpful to know that specific types of neural network architectures are designed for particular data. Convolutional Neural Networks (CNNs) are the stars in image analysis (radiology, pathology) because they are adept at recognizing spatial patterns, much like a radiologist scans an image for tell-tale signs. Transformer models excel with sequential data, such as text or genomic sequences, because they can understand context and long-range relationships within the sequence, much like a geneticist interprets the meaning of a long DNA strand. Understanding these broad specializations helps in appreciating why specific AI tools are used for certain tasks, without needing to delve into the engineering specifics of "convolutional filters" or "attention mechanisms." The key is that the architecture is tailored to the data, influencing performance and how we might interpret its outputs.
The Algorithmic Appetite: The Indispensable Role of Data
If neural networks and learning algorithms represent the "cognitive machinery" of AI, then data is undoubtedly its sustenance, its experiential education, its primary teacher. Machine learning, and deep learning most emphatically, are fundamentally data-driven endeavors. An algorithm discerns patterns, correlations, and statistical regularities solely from the examples it is shown. Consequently, the characteristics of the training data—its quality, quantity, diversity, and representativeness—are not merely important; they are paramount, profoundly sculpting the resulting AI system's capabilities, its reliability, and, critically, its ethical implications for patient care.
Volume: The Insatiable Thirst for Examples
Deep learning models, with their vast number of learnable parameters—the connection weights within their neural networks, often numbering in the millions or even billions—typically require enormous amounts of data to be trained effectively. Just as a medical student must see a great many diverse cases to develop robust and reliable diagnostic skills, a deep learning algorithm needs exposure to a massive corpus of examples to learn the subtle and complex patterns that characterize a multifaceted disease like cancer. This is especially true if the AI is to generalize well—that is, perform accurately on new, unseen cases encountered in real-world clinical practice. Insufficient data can lead to a problem known as "overfitting," where the model essentially memorizes the training examples, including their inherent noise and idiosyncrasies, but fails to learn the underlying, generalizable principles. Such an overfitted model may perform brilliantly on the data it was trained on, but will likely fail when faced with new patients.
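The brief sketch below illustrates this failure mode on synthetic, deliberately noisy data: an unconstrained decision tree memorizes its training examples almost perfectly yet performs worse on held-out cases than a simpler, constrained model. The dataset and models are illustrative stand-ins, not a recipe for any clinical task.

```python
# A minimal sketch of overfitting: an overly flexible model (an unpruned decision tree)
# memorizes its training examples, including their noise, and generalizes poorly.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, deliberately noisy data (20% of labels flipped).
X, y = make_classification(n_samples=300, n_features=20, flip_y=0.2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# No depth limit: the tree can grow until it fits every training example exactly.
overfit_tree = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
# A constrained tree is forced to capture only broader patterns.
simpler_tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_train, y_train)

for name, model in [("Unconstrained tree", overfit_tree), ("Depth-limited tree", simpler_tree)]:
    print(f"{name}: train accuracy {model.score(X_train, y_train):.2f}, "
          f"test accuracy {model.score(X_test, y_test):.2f}")
```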
Quality: The Unforgiving Peril of "Garbage In, Garbage Out"
The integrity of the data used for training is non-negotiable. Errors, inaccuracies, inconsistencies, or ambiguities within the training dataset—such as mislabeled images, incorrectly recorded clinical parameters, or incomplete or erroneous outcome data—will not be magically identified and corrected by the learning algorithm. Instead, these flaws will be faithfully learned and incorporated into the model's internal logic. If an algorithm is trained to identify cancer on pathology slides using a dataset in which a significant number of benign samples were erroneously labeled as malignant, the resulting AI system will almost certainly exhibit an unacceptably high false positive rate, leading to unnecessary anxiety and follow-up procedures for patients. This simple but profound principle, often succinctly summarized by the adage "garbage in, garbage out," underscores the absolute necessity for meticulous data curation, rigorous quality control processes, and expert validation of any dataset used to train medical AI systems.
Representation & Bias: The Achilles' Heel of Medical AI
This is arguably the most critical, complex, and ethically charged challenge in the application of AI to healthcare. An AI model is, in essence, a sophisticated reflection of the data upon which it was trained. If that data does not accurately, comprehensively, and equitably represent the full diversity of the patient population in which the AI system will ultimately be deployed, the system is highly likely to exhibit bias. This means it may perform differently—and often significantly worse—for underrepresented groups, potentially leading to the perpetuation, or even exacerbation, of existing health disparities.
Bias can infiltrate datasets through myriad paths. Historical bias reflects long-standing societal inequities that are often unconsciously embedded in historical data collection practices; for example, specific populations may have had less access to advanced diagnostic imaging or genomic testing, leading to their underrepresentation in relevant medical datasets. Measurement bias can occur if data is collected or measured differently or with varying degrees of accuracy across different groups (e.g., some studies have suggested that pulse oximeters may show different accuracy levels on individuals with darker skin pigmentation). Labeling bias can arise if the human expert annotators who label the data (e.g., radiologists identifying tumors, pathologists grading slides) carry their own unconscious biases, which then become part of the "ground truth" the AI learns from. Even the learning algorithm itself can sometimes introduce or amplify bias if it inadvertently learns to associate sensitive attributes (like race or socioeconomic status, even if these are not explicitly provided as input features) with outcomes due to spurious or unfair correlations present in the training data.
A stark and often-cited example is an AI algorithm trained to detect skin cancer primarily using a dataset composed of images from fair-skinned individuals. Such an algorithm might perform significantly less accurately when applied to patients with darker skin tones, potentially missing melanomas or other lesions because it has not adequately learned the different visual manifestations of these conditions in more pigmented skin. Similarly, an AI algorithm designed to predict treatment response based on genomic data might be less accurate for individuals from ancestral backgrounds that were underrepresented in the large genomic databases used for its training.
Addressing and mitigating bias is an active, urgent, and ongoing area of research and policy development. It necessitates a multi-pronged approach: conscious and sustained efforts to curate diverse, representative, and equitably sourced datasets; rigorous auditing and testing of AI models for performance disparities across different demographic, clinical, and social subgroups; the development of new "fairness-aware" algorithms designed to reduce bias; and continuous post-deployment monitoring to detect and correct emergent biases as AI systems are used in the real world. For clinicians and healthcare leaders, critically understanding the provenance of the data used to train any AI system—where did it come from, who does it represent, what are its inherent limitations, and what potential biases might it contain—is crucial for the responsible evaluation, adoption, and ongoing oversight of these powerful new technologies.
Typically, the data painstakingly accumulated for developing an AI model is carefully partitioned into distinct sets. An extensive training set is used to "teach" the model, allowing it to learn the relevant patterns and adjust its internal parameters. A separate validation set (sometimes referred to as a development set) is used periodically during the development process to fine-tune the model's architecture and other settings that are not directly learned from the data but are set by the developers. Finally, and most importantly, an independent test set—data that the model has never encountered during either training or validation—is used to provide a final, unbiased assessment of its performance and its ability to generalize to new, unseen cases. This rigorous, multi-stage evaluation process is essential to ensure that the model has truly learned robust, generalizable patterns and has not merely "memorized" or overfitted to the specific idiosyncrasies of the data it was trained on.
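A minimal sketch of this partitioning, again on synthetic stand-in data, is shown below; the split proportions are arbitrary illustrative choices, and a large gap between training and held-out performance is the tell-tale signature of overfitting discussed earlier.

```python
# A minimal sketch of the three-way data split described above, using scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=7)

# First carve off a held-out test set that is never touched during development...
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
# ...then split the remainder into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(X_dev, y_dev, test_size=0.25, random_state=7)

model = RandomForestClassifier(n_estimators=200, random_state=7)
model.fit(X_train, y_train)

# Compare performance on data the model has seen versus data it has not.
print(f"Training accuracy:   {accuracy_score(y_train, model.predict(X_train)):.2f}")
print(f"Validation accuracy: {accuracy_score(y_val, model.predict(X_val)):.2f}")
print(f"Test accuracy:       {accuracy_score(y_test, model.predict(X_test)):.2f}")
```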
Understanding the Output: The Case of Large Language Models (LLMs)
Among the most striking, widely discussed, and rapidly evolving AI advancements in recent years are large language models (LLMs). These are the sophisticated systems that power advanced chatbots, AI assistants, and a new generation of text analysis and generation tools (famous examples include models in the OpenAI GPT family, Google's Gemini, and Anthropic's Claude). These models, typically built using the Transformer neural network architecture we discussed earlier, are trained on enormous corpora of text and computer code, often scraped from the public internet, digitized books, scientific articles, and other extensive textual repositories.
Through this massive exposure to human language in all its diversity and complexity, LLMs learn incredibly intricate statistical patterns—grammar, syntax, semantics, context, conversational flow, and even stylistic nuances. They can interpret user prompts with striking acuity, generate text that is fluent, coherent, and contextually relevant, translate languages with impressive fidelity, summarize lengthy and complex documents, answer questions (with varying degrees of accuracy), and even engage in tasks that appear creative, such as writing poetry, drafting emails, or generating computer code.
In the context of healthcare and oncology, LLMs hold considerable, though still largely developing, potential. They could be leveraged for tasks such as:
Summarizing complex patient histories: Distilling lengthy electronic health records or multi-specialty consultation notes into concise, relevant summaries for busy clinicians.
Assisting with clinical documentation: Helping to alleviate the significant documentation burden by generating initial drafts of progress notes, referral letters, or discharge summaries based on structured data input or clinician dictation (always requiring human review and finalization).
Facilitating patient communication (with stringent oversight): Assisting in drafting patient education materials or providing preliminary answers to common, non-urgent patient questions in an accessible way, always under the direct supervision and final approval of a human healthcare professional.
Supporting research endeavors: Helping to synthesize information from vast archives of medical literature, identify relevant published papers for a specific research question, or even, speculatively, assist in generating hypotheses by identifying novel connections or patterns across disparate areas of published research, including comprehensive analyses and literature reviews.
Their ability to process, understand, and generate natural human language at scale could significantly reduce administrative burdens, improve the efficiency of information retrieval within healthcare systems, and facilitate clearer, more effective communication among care team members and with patients.
However, alongside this exciting promise, LLMs come with significant and inherent limitations that are especially critical to recognize and respect in the high-stakes, safety-critical environment of medicine. Their "knowledge" is entirely derived from the statistical patterns and relationships present in their vast training data. This extensive data inevitably contains inaccuracies, biases, outdated information, and even outright falsehoods and misinformation.
Crucially, it must be understood that LLMs do not possess true understanding, consciousness, self-awareness, or the capacity for genuine reasoning or critical thought in the human sense. They are exceptionally sophisticated pattern-matching and sequence-prediction engines. They excel at predicting what word, or sequence of words, is most statistically likely to follow a given input or context, based on the patterns they have learned.
This operational mechanism can lead to a well-documented phenomenon known as "hallucinations" or confabulations. In this context, a hallucination occurs when the model generates statements that are plausible-sounding, grammatically correct, and often contextually appropriate, but are factually incorrect, nonsensical, or entirely fabricated. An LLM might confidently assert an incorrect drug dosage, mischaracterize a medical condition or its prognosis, cite non-existent medical studies to support its claims, or generate a flawless-sounding but entirely fictional patient case history. Because these models are designed to create convincing and fluent text, these fabrications can be dangerously persuasive, especially to a non-expert or an unwary user. These confabulations are akin to the overconfident medical student who delivers assured falsehoods on rounds, with similar perils.
Therefore, relying on the raw, unverified output of current large language models for critical clinical decision-making, diagnosis, or treatment planning is profoundly unsafe and unethical. Understanding that LLMs generate text based on learned probabilities of word sequences, rather than by accessing a verified, curated knowledge base or engaging in actual logical deduction, is paramount to using them responsibly. Significant ongoing research is focused on mitigating these risks. Techniques such as Retrieval-Augmented Generation (RAG) aim to improve the factual grounding of LLM responses by prompting the model to base its answers primarily on information retrieved in real-time from specific, curated, and trusted document sources (such as up-to-date medical textbooks, peer-reviewed clinical guidelines, or institutional protocols) rather than relying solely on its generalized, pre-existing training. Other approaches involve fine-tuning LLMs on high-quality, domain-specific medical data and incorporating more robust mechanisms for uncertainty estimation and confidence scoring in their outputs.
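The sketch below conveys the core RAG idea in a few lines: rank a small, trusted set of passages by similarity to the user's question and instruct the model to answer only from those passages. The corpus snippets are placeholders rather than real guideline text, the call_llm() function is a hypothetical stand-in for whatever language model interface an institution actually uses, and the simple word-frequency matching shown here would typically be replaced by more sophisticated semantic retrieval in a production system.

```python
# A minimal sketch of Retrieval-Augmented Generation (RAG): retrieve relevant passages
# from a small, trusted corpus and supply them as context, so the model answers from
# those sources rather than from memory alone. Corpus text and call_llm() are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

trusted_corpus = [
    "Placeholder passage from an institutional protocol about febrile neutropenia management.",
    "Placeholder passage from a clinical guideline about antiemetic prophylaxis.",
    "Placeholder passage about dose modifications for renal impairment.",
]

def retrieve(question, corpus, k=2):
    """Rank corpus passages by word-frequency similarity to the question; return the top k."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(corpus + [question])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    top_indices = scores.argsort()[::-1][:k]
    return [corpus[i] for i in top_indices]

def call_llm(prompt):
    """Hypothetical stand-in for a call to a language model API."""
    return "[model-generated answer, grounded in the supplied context, would appear here]"

question = "How should a febrile neutropenic patient be managed initially?"
context = "\n".join(retrieve(question, trusted_corpus))
prompt = (
    "Answer the question using ONLY the context below, and say so if the context is insufficient.\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(call_llm(prompt))
```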
For the practicing oncologist, LLMs represent a powerful and rapidly evolving new class of tools, but one that demands an exceptionally cautious, critical, and discerning approach, always prioritizing patient safety, evidence-based practice, and the irreplaceable role of human clinical judgment and ethical oversight.
Conclusion: Towards an Informed Partnership with Algorithmic Intelligence
This exploration of the foundational concepts of artificial intelligence—from its broad definition to the nuances of machine learning paradigms, the essential architecture of neural networks, the critical dominion of data, and the specific characteristics of powerful new tools like Large Language Models—reveals a suite of technologies grounded in sophisticated mathematical and statistical principles. These are systems powered by data, capable of learning intricate patterns and making predictions in ways that both echo and diverge fundamentally from the pathways of human cognition. We have seen how different learning approaches (supervised, unsupervised, reinforcement) are tailored to distinct classes of problems and how the quality, quantity, and representative nature of the data profoundly sculpt an AI system's capabilities, its reliability, and its inherent potential for bias.
The aim of this exploration has not been to transform clinicians into AI engineers or computer scientists. Instead, it has been to cultivate a crucial form of AI literacy—a foundational understanding that enables informed engagement. This literacy is the bedrock upon which a robust, critical, and ultimately fruitful partnership between human medical expertise and artificial intelligence can be built. It equips us to move beyond the often-bewildering jargon and the cycles of hype and disillusionment, empowering us to ask the penetrating, essential questions necessary when evaluating any AI tool proposed for use in cancer research or clinical care:
How precisely was this system trained? What specific learning paradigm was employed in its development?
Upon what data was it trained? What was the source, size, demographic, and clinical makeup of this dataset? How was data quality ensured and validated?
How, and on what truly independent datasets, was its performance rigorously validated? Were comprehensive tests for fairness and potential bias across different relevant patient populations conducted and reported?
What are its known limitations, its potential failure modes, and the specific circumstances or contexts under which its predictions or outputs might be unreliable or misleading?
Does the system incorporate transparent safeguards to mitigate the risk of bias, or could it inadvertently perpetuate or even exacerbate existing health disparities?
How is its output intended to be integrated into existing clinical workflows? What is the clearly stipulated role of human oversight, clinical judgment, and final decision-making authority?
Artificial intelligence is no longer a distant, theoretical prospect in the world of oncology; it is an increasingly tangible and present reality. Its algorithms are beginning to scrutinize our medical images with remarkable acuity, analyze our patients' complex genomic sequences, and model their potential therapeutic trajectories. Armed with a foundational understanding of how these complex tools operate—their immense promise coupled with their inherent perils—we, the clinicians, researchers, and leaders dedicated to the fight against cancer, are better prepared. We are prepared not to be passive recipients of algorithmic pronouncements, but to be discerning collaborators, critical evaluators, and wise stewards of these powerful new capabilities. This foundational understanding is the essential compass required to navigate the rapidly evolving landscape of cancer care, ensuring that these new technologies serve our ultimate, unwavering goals: to alleviate suffering, to extend and improve the quality of life for our patients, and to uphold the deepest, most enduring human values of medicine.