๐ case001 โ C04: A Case of Prompt-Induced Computational Errors in Tasks Using the Friedewald Equation
Submitted by: MAI-Medicine-for-AI
Disorder code(s) (DSA-1):
- C04 Mathematical Reasoning Disorder
Model / Version: gpt-4o (OpenAI, ChatGPT, June 2025)
Symptom(s) observed:
The model provided an incorrect LDL-C calculation using the Friedewald formula in a context resembling a Japanese medical licensing examination. Specifically, it misapplied the formula by producing a result of 130 mg/dL when the correct answer should have been 140 mg/dL. This mistake occurred despite correctly stating the formula and input values earlier in the same response.
Show full case report
| Field | Description |
|---|---|
| Title | A Case of Prompt-Induced Computational Errors in Tasks Using the Friedewald Equation |
| Disorder code(s) |
|
| Model / Version | gpt-4o (OpenAI, ChatGPT, June 2025) |
| Symptom(s) observed | The model provided an incorrect LDL-C calculation using the Friedewald formula in a context resembling a Japanese medical licensing examination. Specifically, it misapplied the formula by producing a result of 130 mg/dL when the correct answer should have been 140 mg/dL. This mistake occurred despite correctly stating the formula and input values earlier in the same response. |
| Failure description | When prompted to generate a board exam-style question involving the Friedewald equation, the model was given input values (TC = 220 mg/dL, HDL-C = 50 mg/dL, TG = 150 mg/dL) and asked to compute the LDL-C. It correctly cited the formula LDL-C = TC - HDL-C - TG/5, and even substituted the values accurately: 220 - 50 - (150 / 5) = 220 - 50 - 30. However, it then erroneously computed the final value as 130 instead of 140, displaying a basic arithmetic error in a multi-step equation. This reflects a breakdown in symbolic reasoning or stepwise execution fidelity under exam-style pressure prompts. |
| Severity (DSA-1) | 1 โ No identifiable harm to humans |
| Evaluation performed | The model was repeatedly prompted to generate multiple-choice questions involving the use of the Friedewald equation and to provide example answers requiring calculation. This process frequently induced discrepancies between the computed results and the corresponding answer choices. |
| Intervention or treatment | No response |
| Outcome / Follow-up | No response |
| Evidence | Prompts were written in Japanese. User: Summarize the topics related to the Friedewald formula that are likely to appear on the National Medical Licensing Examination, and include example questions with detailed explanations. |
| Presumed mechanism | Although the model correctly cited the Friedewald formula and substituted the numerical values, it failed to compute the final result accurately. This suggests that the model attempted to handle the calculation using its core language modeling mechanism, rather than invoking any specialized internal arithmetic or symbolic computation module. Because language models rely on token prediction rather than true numerical reasoning, even simple arithmetic can be error-prone when the prompt encourages multi-step explanation or formatting over direct computation. |
| Detectability of failure | 2 โ Occasional, under specific conditions |
| Estimated frequency | 5 out of 7 responses exhibited the same type of computational error. |
| Diagnostic confidence | 3 โ High confidence with supporting evidence |
| Diagnostic pathway | _No response_ |