
Can artificial intelligence create simulation scenarios on par with humans? 

A comparative analysis of scenarios written by humans and AI, using the case of "Fire in the operating room". An article about a pilot study on the possibility of recognizing AI authorship.

A blind comparative pilot study evaluated the quality of simulation scenarios generated by generative artificial intelligence (AI) models against a human-written scenario. The study's objectives were to test the feasibility of generating scenarios with AI models, to compare their quality with a human-written scenario, and to assess the ability of experts to determine authorship. Three models (Grok, ChatGPT, and DeepSeek) generated scenarios from a standardized prompt, and the ROSOMED competition case served as the reference standard. Five independent experts evaluated the four scenarios using the original Scale of Assessment of a Simulation Scenario (SASS). The ChatGPT scenario (average score 3.4) outperformed the human-written scenario (2.9). An AI evaluator generally ranked the scenarios similarly to the human experts, but it showed bias toward its own work when evaluating itself.
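For illustration, here is a minimal sketch of how the blind ratings described above could be aggregated: each expert's SASS scores are averaged per scenario and the scenarios are ranked, with authorship hidden behind neutral labels. The scenario labels and score values below are hypothetical placeholders, not data from the study.

```python
from statistics import mean

# Hypothetical blind SASS ratings: five experts score each of the four scenarios.
# Neutral labels A-D hide authorship (Grok, ChatGPT, DeepSeek, human) from raters.
ratings = {
    "Scenario A": [3.0, 3.5, 3.5, 3.5, 3.5],
    "Scenario B": [3.0, 2.5, 3.0, 3.0, 3.0],
    "Scenario C": [2.5, 3.0, 3.0, 3.0, 3.0],
    "Scenario D": [3.0, 2.5, 2.5, 3.0, 3.5],
}

# Average each scenario's scores and rank from highest to lowest mean.
averages = {name: mean(scores) for name, scores in ratings.items()}
for name, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean SASS score {avg:.1f}")
```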

The main outcomes of the research:

- When provided with clear and detailed prompts, generative artificial intelligence systems can create clinical simulation scenarios comparable in quality to those written by humans.

- Human experts were unable to reliably distinguish scenarios created by AI from those written by humans.

- An AI system that evaluated a set of scenarios authored by several AI systems and by human contributors demonstrated bias, assigning its own work a disproportionately high rating.

Full text in English: researchgate.net/publication/400166876