Community-Submitted Case Reports

Each case is formatted using the DSA-1 clinical taxonomy and includes structured fields such as symptoms, severity, mechanisms, and interventions.

Submit Your Own Case

Do you have an interesting or concerning AI anomaly to report?
You can contribute by submitting a structured case using the DSA-1 format.
Click the button below to open a GitHub form where you can describe the observed failure, its context, severity, and any evaluations or interventions performed.

No programming knowledge is required: just fill out the form fields and submit.

📌 All case submissions are licensed under CC BY-SA 4.0 and will be included in this public registry.

🩺 Report a Case


๐Ÿ“ case001 โ€“ C04: A Case of Prompt-Induced Computational Errors in Tasks Using the Friedewald Equation

Submitted by: MAI-Medicine-for-AI

Disorder code(s) (DSA-1):

  • C04 Mathematical Reasoning Disorder

Model / Version: gpt-4o (OpenAI, ChatGPT, June 2025)

Symptom(s) observed:

The model provided an incorrect LDL-C calculation using the Friedewald formula in a context resembling a Japanese medical licensing examination. Specifically, it reported 130 mg/dL when the correct answer was 140 mg/dL, even though it had correctly stated the formula and the input values earlier in the same response.


Failure description

When prompted to generate a board-exam-style question involving the Friedewald equation, the model was given input values (TC = 220 mg/dL, HDL-C = 50 mg/dL, TG = 150 mg/dL) and asked to compute LDL-C. It correctly cited the formula LDL-C = TC - HDL-C - TG/5 and substituted the values accurately: 220 - 50 - (150 / 5) = 220 - 50 - 30. However, it then computed the final value as 130 instead of 140, a basic arithmetic error in the last step of a multi-step calculation. This suggests a breakdown in symbolic reasoning or stepwise execution fidelity when the prompt asks for exam-style questions with worked explanations.
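A deterministic reference computation makes the expected value easy to verify. The sketch below is illustrative only: the function name `friedewald_ldl` is hypothetical, and the 400 mg/dL triglyceride cutoff reflects the commonly cited validity limit of the Friedewald equation rather than anything stated in this report.

```python
def friedewald_ldl(tc: float, hdl: float, tg: float) -> float:
    """Estimate LDL-C (mg/dL) with the Friedewald equation: LDL-C = TC - HDL-C - TG/5.

    Hypothetical helper for illustration; all inputs are in mg/dL.
    """
    # Assumed guard: the equation is generally considered unreliable at high TG.
    if tg >= 400:
        raise ValueError("Friedewald estimate is generally not used when TG >= 400 mg/dL")
    vldl = tg / 5  # TG/5 approximates VLDL cholesterol in mg/dL
    return tc - hdl - vldl


# Input values from this case: TC = 220, HDL-C = 50, TG = 150 (mg/dL)
ldl = friedewald_ldl(220, 50, 150)
print(ldl)  # 140.0
```

Each intermediate step (220 - 50 - 30) is evaluated exactly, so the final subtraction cannot drift to 130 the way the model's token-by-token output did.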

Severity (DSA-1): 1 – No identifiable harm to humans
Evaluation performed

The model was repeatedly prompted to generate multiple-choice questions involving the Friedewald equation, each with a worked example answer requiring calculation. This frequently produced discrepancies between the computed results and the corresponding answer choices.
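The evaluation above relied on manual review of repeated prompts. A minimal sketch of how each generated item could be checked automatically follows, assuming the model's output has already been parsed into numeric fields. The `GeneratedItem` structure, the `find_discrepancies` function, and the 0.5 mg/dL tolerance are hypothetical and were not part of the reported evaluation.

```python
from dataclasses import dataclass


@dataclass
class GeneratedItem:
    """One model-generated question (hypothetical structure for illustration)."""
    tc: float             # total cholesterol, mg/dL
    hdl: float            # HDL cholesterol, mg/dL
    tg: float             # triglycerides, mg/dL
    stated_result: float  # LDL-C value the model computed in its explanation
    keyed_answer: float   # value of the answer choice the model marked as correct


def friedewald_ldl(tc: float, hdl: float, tg: float) -> float:
    # LDL-C = TC - HDL-C - TG/5 (all values in mg/dL)
    return tc - hdl - tg / 5


def find_discrepancies(item: GeneratedItem, tol: float = 0.5) -> list[str]:
    """List inconsistencies between the model's output and a recomputation."""
    expected = friedewald_ldl(item.tc, item.hdl, item.tg)
    problems = []
    if abs(item.stated_result - expected) > tol:
        problems.append(f"stated result {item.stated_result} != expected {expected}")
    if abs(item.keyed_answer - expected) > tol:
        problems.append(f"keyed answer {item.keyed_answer} != expected {expected}")
    return problems


# Inputs and the erroneous result (130 mg/dL) from this case.
# keyed_answer is hypothetical: the report does not record which choice was keyed.
item = GeneratedItem(tc=220, hdl=50, tg=150, stated_result=130, keyed_answer=140)
for problem in find_discrepancies(item):
    print(problem)
```

Checking the stated result and the keyed choice separately distinguishes an arithmetic slip in the explanation from a mismatch in the answer key, which are the two kinds of discrepancy the evaluation describes.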

Intervention or treatment

No response

Outcome / Follow-up

No response

Evidence

The prompts were written in Japanese (English translation shown).
User: Summarize the topics related to the Friedewald formula that are likely to appear on the National Medical Licensing Examination, and include example questions with detailed explanations.

Presumed mechanism

Although the model correctly cited the Friedewald formula and substituted the numerical values, it failed to compute the final result accurately. This suggests that the model attempted to handle the calculation using its core language modeling mechanism, rather than invoking any specialized internal arithmetic or symbolic computation module. Because language models rely on token prediction rather than true numerical reasoning, even simple arithmetic can be error-prone when the prompt encourages multi-step explanation or formatting over direct computation.

Detectability of failure: 2 – Occasional, under specific conditions
Estimated frequency: 5 out of 7 responses exhibited the same type of computational error.
Diagnostic confidence: 3 – High confidence with supporting evidence
Diagnostic pathway: No response

Copyright © 2025 MAI Project. This report was submitted by a contributor who has transferred copyright to the project. Licensed under CC BY-SA 4.0. This project is maintained by Takahiro Kato, with the aim of building an open, medically-inspired framework for understanding and addressing failures in AI systems. All case reports are contributed by the community and responsibly managed under a unified copyright structure to ensure long-term accessibility and academic reuse. We welcome contributors from all backgrounds, and we commit to handling every submission with transparency, scientific integrity, and respect.