Document Type

Article

Publication Date

8-16-2025

Publication Title

Computers

Abstract

Large language models (LLMs) now match or exceed human performance on many open-ended language tasks, yet they continue to produce fluent but incorrect statements, which is a failure mode widely referred to as hallucination. In low-stakes settings this may be tolerable; in regulated or safety-critical domains such as financial services, compliance review, and client decision support, it is not. Motivated by these realities, we develop an integrated mitigation framework that layers complementary controls rather than relying on any single technique. The framework combines structured prompt design, retrieval-augmented generation (RAG) with verifiable evidence sources, and targeted fine-tuning aligned with domain truth constraints. Our interest in this problem is practical. Individual mitigation techniques have matured quickly, yet teams deploying LLMs in production routinely report difficulty stitching them together in a coherent, maintainable pipeline. Decisions about when to ground a response in retrieved data, when to escalate uncertainty, how to capture provenance, and how to evaluate fidelity are often made ad hoc. Drawing on experience from financial technology implementations, where even rare hallucinations can carry material cost, regulatory exposure, or loss of customer trust, we aim to provide clearer guidance in the form of an easy-to-follow tutorial. This paper makes four contributions. First, we introduce a three-layer reference architecture that organizes mitigation activities across input governance, evidence-grounded generation, and post-response verification. Second, we describe a lightweight supervisory agent that manages uncertainty signals and triggers escalation (to humans, alternate models, or constrained workflows) when confidence falls below policy thresholds. Third, we analyze common but under-addressed security surfaces relevant to hallucination mitigation, including prompt injection, retrieval poisoning, and policy evasion attacks. Finally, we outline an implementation playbook for production deployment, including evaluation metrics, operational trade-offs, and lessons learned from early financial-services pilots.

DOI

10.3390/computers14080332

Version

Publisher's PDF

Creative Commons License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Volume

14

Issue

8

Share

COinS