It was mid-October in Hanover, New Hampshire, a period typically defined by the vibrant foliage of the Green Mountains and the high-stakes anticipation of the medical residency "Match" process. For Chad Markey, a 33-year-old student at Dartmouth’s Geisel School of Medicine, the season should have been a victory lap. With a background in health informatics, a string of publications in prestigious journals like The Lancet and the Journal of the American Medical Association (JAMA), and glowing recommendations, Markey was by all traditional metrics a top-tier candidate for a residency in psychiatry.
Instead, Markey found himself confined to his apartment, isolated from his peers and immersed in a frantic, self-directed investigation into the digital systems governing his future. Despite his qualifications, the interview invitations that were flooding the inboxes of his classmates were conspicuously absent from his own. In their place were outright rejections from programs where he was objectively competitive. Suspecting that his application was being discarded not by human eyes but by an automated filter, Markey began a six-month quest to reverse-engineer the artificial intelligence tools now used by nearly a third of all U.S. residency programs.
The Evolution of the Residency Crisis
To understand Markey’s suspicion, one must look at the structural shifts in medical education over the last five years. Historically, matching medical students with hospital residency programs was a labor-intensive human endeavor. The COVID-19 pandemic, however, catalyzed a transition to virtual interviews, which inadvertently triggered a wave of "application inflation": without the travel costs and time constraints of in-person visits, students began applying to dozens more programs to hedge their bets.

By 2023, the Association of American Medical Colleges (AAMC) reported a massive surge in application volume, leaving residency program directors and coordinators overwhelmed. To manage this deluge, the AAMC partnered with Thalamus, a technology company that produces a screening tool called Cortex. During the 2024–2025 application cycle, approximately 1,500 residency programs (roughly 30 percent of the national total) used Cortex to process, filter, and normalize applicant data.
Cortex employs fine-tuned versions of OpenAI’s generative models to perform tasks such as "grade normalization," which attempts to create a standardized metric for students coming from schools with different grading scales. While Thalamus maintains that the tool is designed to assist rather than replace human decision-making, the opacity of these systems has created a climate of anxiety among applicants who fear they are being "screened out" by invisible algorithms.
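Thalamus has not published the mechanics of Cortex’s grade normalization, so the Python sketch below is only a hypothetical illustration of the general idea, not Cortex’s actual method. It converts each applicant’s grades into z-scores computed within their own school’s grade distribution, so that applicants from schools with very different grading scales land on one comparable metric.

```python
from statistics import mean, stdev

# Hypothetical illustration of grade normalization; NOT Thalamus/Cortex's
# actual algorithm. Each school reports grades on its own scale, so converting
# to z-scores within each school puts applicants on a common footing.

# Numeric grades keyed by school and applicant; all values are illustrative.
grades = {
    "School A": {"Applicant 1": 3.9, "Applicant 2": 3.7, "Applicant 3": 3.8},
    "School B": {"Applicant 4": 88.0, "Applicant 5": 74.0, "Applicant 6": 81.0},
}

def normalize_by_school(grades_by_school):
    """Return z-scores computed within each school's own grade distribution."""
    normalized = {}
    for school, students in grades_by_school.items():
        values = list(students.values())
        mu, sigma = mean(values), stdev(values)
        for name, grade in students.items():
            normalized[name] = (grade - mu) / sigma if sigma else 0.0
    return normalized

print(normalize_by_school(grades))
# Applicants graded on a 4.0 scale and a 100-point scale now share one metric.
```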
A Hidden Flaw: The Medical Leave Dilemma
Markey’s investigation focused on a specific section of his Medical Student Performance Evaluation (MSPE). In 2021, Markey was diagnosed with ankylosing spondylitis, an aggressive autoimmune disease that affects the spine. The condition occasionally rendered him unable to walk, necessitating three separate leaves of absence totaling 22 months.
When he returned to complete his degree, his MSPE described these absences as "voluntary" and taken for "personal reasons." The description was technically accurate in administrative terms, but Markey feared that an AI screening tool would read "voluntary" as a sign of a lack of stamina or academic struggle rather than as a successfully managed medical crisis.

"I crawled out of a black hole," Markey told investigators, referring to his recovery. "I’ve come this far, and then this is happening? It felt like my worthiness as a worker was being filtered through an automated gateway that couldn’t understand context."
Reverse-Engineering the Black Box
Leveraging his background in coding and informatics, Markey spent months attempting to simulate the logic of the Cortex system. His approach combined four techniques:
- Sentiment Analysis: Using VADER (Valence Aware Dictionary and sEntiment Reasoner), an open-source rule-based sentiment-analysis tool, Markey compared the language in his MSPE with more descriptive, medically accurate phrasing (a minimal sketch follows this list). The results showed that the "personal reasons" phrasing received a lower sentiment score than a direct explanation of a medical condition.
- Synthetic Dataset Testing: Markey used Python to create a synthetic dataset of 6,000 hypothetical residency applicants, assigning each a set of grades, a publication count, and a recommendation strength. He then split the group into two cohorts: one with "personal reasons" leave language and one with "medical condition" language.
- Logistic Regression Modeling: When he ran these applicants through a model trained to select the top 12 percent of candidates, those with the medically accurate language were 66 percent more likely to be selected than those with the "voluntary" phrasing, despite identical academic credentials.
- Patent Analysis: Markey tracked down the patents for Medicratic, an AI company acquired by Thalamus. By studying the data pipelines described in the patents, he attempted to mirror the weights and preference parameters residency directors might use, such as academic performance versus professionalism.
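Markey’s code has not been released and his MSPE language is paraphrased here, so the snippet below is only a minimal sketch of how the sentiment comparison in the first bullet might be run in Python with the open-source vaderSentiment package; the two sentences are illustrative stand-ins, not his actual documents.

```python
# pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Paraphrased stand-ins for the two phrasings at issue; the real MSPE text
# is not public, so these sentences are illustrative only.
mspe_phrasing = (
    "The student took a voluntary leave of absence for personal reasons."
)
medical_phrasing = (
    "The student took a medically necessary leave of absence to treat "
    "ankylosing spondylitis and returned in good academic standing."
)

analyzer = SentimentIntensityAnalyzer()
for label, text in [("MSPE wording", mspe_phrasing),
                    ("Medical wording", medical_phrasing)]:
    # VADER's 'compound' score ranges from -1 (most negative) to +1 (most positive).
    scores = analyzer.polarity_scores(text)
    print(f"{label}: compound={scores['compound']:+.3f}")
```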
Markey’s findings suggested that even a slight semantic shift in an official document could trigger a cascade of algorithmic rejections, effectively "ghosting" a candidate before a human ever opened their application.
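The synthetic-dataset and logistic-regression steps can likewise be approximated, though only loosely: Markey’s data, features, and model are not public, so the reconstruction below invents its own distributions and a "reviewer score" whose penalty for vague leave language is an assumed parameter. It shows the shape of the experiment, generating applicants who are academically identical except for the leave-of-absence wording, training a screening model to keep a top-12-percent cut, and comparing selection rates between the two cohorts; the size of whatever gap it reports depends entirely on these assumptions, not on Cortex.

```python
# Rough, assumption-laden reconstruction; not Markey's actual code or data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
N = 6000

# Identical academic credentials in both cohorts; all distributions are assumed.
grades = rng.normal(0.0, 1.0, N)          # normalized grade metric
pubs = rng.poisson(2, N)                  # publication count
recs = rng.normal(0.0, 1.0, N)            # recommendation strength
# Half get vague "personal reasons" wording (0), half explicit medical wording (1);
# this flag is the only systematic difference between the cohorts.
medical_wording = rng.permutation(np.repeat([0, 1], N // 2))

# Assumed reviewer behavior: a composite score that penalizes the vaguer
# phrasing. The 0.4 penalty is a made-up parameter, not a measured one.
reviewer_score = (grades + 0.3 * pubs + 0.5 * recs
                  + 0.4 * medical_wording + rng.normal(0.0, 0.5, N))
# Label: did the applicant land in the top 12 percent of reviewer scores?
selected = (reviewer_score >= np.quantile(reviewer_score, 0.88)).astype(int)

X = np.column_stack([grades, pubs, recs, medical_wording])
X_tr, X_te, y_tr, y_te, w_tr, w_te = train_test_split(
    X, selected, medical_wording, test_size=0.5, random_state=0)

# The screening model learns the reviewers' preferences, wording penalty included,
# then keeps its own top 12 percent of the held-out applicants.
model = LogisticRegression().fit(X_tr, y_tr)
probs = model.predict_proba(X_te)[:, 1]
picks = probs >= np.quantile(probs, 0.88)

rate_medical = picks[w_te == 1].mean()
rate_personal = picks[w_te == 0].mean()
print(f"Selected with medical wording:    {rate_medical:.1%}")
print(f"Selected with 'personal' wording: {rate_personal:.1%}")
print(f"Relative difference:              {rate_medical / rate_personal - 1:+.0%}")
```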
Institutional Pushback and Technical Errors
Markey was not the only one raising alarms. In early 2024, Dr. Steven Pletcher, a surgeon at the University of California San Francisco (UCSF) who researches residency selection, published a study in the journal The Laryngoscope documenting "persistent errors" in the Thalamus Cortex system. Pletcher and his colleagues found that grade-normalization charts for applicants could change from minute to minute, displaying "wildly inaccurate" data.

Thalamus CEO Jason Reminick defended the platform, stating that the company had documented inaccuracies in only 10 verified instances out of 4,000 inquiries, asserting a 99.3 percent accuracy rate. Reminick argued that much of the community’s fear stemmed from a misunderstanding of the tool’s function. "Cortex is not a decision-making tool," Reminick stated. "It does not use AI to sort, filter, exclude, score, or rank applicants."
However, reports from various hospitals told a more complicated story. Tufts Medical Center reported "significant errors" in the algorithm’s handling of MSPE data. Temple Health noted that while they used the tool, they did not find the AI-generated information "very reliable."
The Resolution: Human Intervention
Markey’s breakthrough did not come from a technical fix, but from a return to traditional networking. Realizing that his digital application might be stuck in an algorithmic loop, he began "cold-emailing" program coordinators at his top-choice hospitals to highlight his recent publication in the journal Blood.
The response was near-instantaneous. Within 75 minutes of emailing a top psychiatry program, he received an interview offer. Other prestigious institutions followed suit. To Markey, this confirmed his suspicion: his application had been sitting in a digital pile that no human had bothered to click on because it hadn’t met the initial algorithmic threshold.

In March 2025, Markey matched with the psychiatry program at Columbia University’s New York-Presbyterian Hospital, one of the most competitive programs in the nation.
Broader Implications and the Future of AI Hiring
The Markey case serves as a microcosm of the "AI doom loop" currently plaguing the global labor market. As job seekers use AI to mass-produce applications, employers respond by using AI to mass-filter them, creating an arms race that often excludes qualified candidates who do not fit a specific linguistic profile.
The regulatory response remains fragmented. While states like California, Illinois, and Colorado have introduced laws requiring bias testing or applicant notification, none currently grant individuals the right to see exactly how an algorithm scored them. This lack of transparency contrasts sharply with other regulated industries; for example, the Fair Credit Reporting Act (FCRA) requires background-check companies to allow candidates to dispute and correct inaccurate data.
As AI continues to integrate into high-stakes sectors like medicine, law, and corporate hiring, the burden of proof is shifting onto the individual. Chad Markey’s success was a result of his unique ability to "speak" the language of the machine that rejected him. For the millions of other job seekers without a background in Python and health informatics, the algorithmic black box remains a formidable and often insurmountable barrier to entry.

The medical community now faces a pivotal choice: continue the trend toward automated efficiency or reinvest in the human-centric review processes that the profession of medicine is built upon. For now, Markey’s journey stands as a rare example of a human winning a battle against the "gatekeeper" code, though it highlights a system that is increasingly difficult to navigate without a map of the algorithm itself.
