Today, we’re diving deep into one such problem: insurance claims fraud.
We’ll explore why traditional machine learning often falls short and build a sophisticated hybrid AI system that combines the perceptual power of Large Language Models (LLMs) with the rigor and speed of symbolic reasoning.
Finally, we’ll map out a plan for deploying this system into a scalable, enterprise-grade production environment on AWS.
This is a long, practical guide, so grab a coffee and let’s get started.
The Problem – A Multi-Billion Dollar Headache
Insurance fraud isn’t a victimless crime. The FBI estimates that non-health insurance fraud costs the industry more than $40 billion per year, which translates to $400 to $700 in increased annual premiums for the average U.S. family.
This fraud can be “hard” (e.g., staging a fake accident) or “soft” (e.g., exaggerating the damages from a real accident). A common example of soft fraud is adding a pre-existing mechanical issue, like a faulty transmission, to a claim for a minor fender-bender.
Why is this such a hard problem to solve with AI?
- The Data is Messy and Multimodal: A single claim is a complex entity. It includes structured data (policy details, claim amounts), semi-structured data (mechanic’s reports), and, most challengingly, unstructured data like phone call transcripts and email exchanges between the claimant and the adjuster.
- Subtlety and Nuance: Fraudulent behavior isn’t always a smoking gun. It’s often hidden in subtle linguistic cues: hesitation when asked a key question, convenient rationalizations for inconsistencies, or undue pressure for a quick payout.
- The “Black Box” Dilemma: A standard deep learning model might be trained to output a “fraud score” of 87%, but it can’t answer the crucial follow-up question: “Why?” In a regulated industry like insurance, you can’t deny a claim or flag a customer for review without a clear, auditable reason. A black box model is a non-starter for regulators and for building trust with customers.
This is where our journey begins. We need a system that can perceive nuance in human language and provide a crystal-clear, logical explanation for its decisions.
The Solution – A Hybrid Intelligence System
To solve this, we will architect a system that leverages the best of both worlds:
- A Large Language Model (LLM): To act as our “forensic linguistic analyst.” Its job is to listen to the unstructured conversation and translate the subtle, behavioral cues into structured facts.
- A Symbolic Reasoner (Clingo): To act as our “claims adjudication engine.” It takes the facts from the LLM—along with all the other claim data—and applies a set of explicit, auditable business rules to reach a decision.
This is a Neuro-Symbolic AI approach. The neural network (the LLM) handles perception, and the symbolic system handles reasoning.
Step 1: Perception with an LLM – Structuring the Unstructured
First, we need to extract meaningful signals from a call transcript. We can’t just ask an LLM, “Is this fraud?” That would give us another black box answer. Instead, we give it a highly specific role and a strict output format.
We’ll use a prompt engineering technique called “Role-Based Constrained Output.” We tell the LLM it is an expert analyst whose only job is to identify and list predefined indicators.
Install the necessary dependencies:
!pip install -q clingo langchain-google-vertexai langchain

Here is the transcript we’ll analyze:
call_transcript = '''
Adjuster: Sarah Jenkins
Claimant: David Smith
(Call begins)
Adjuster: "Good morning, am I speaking with Mr. David Smith?"
Claimant: "Yes, this is David."
Adjuster: "Mr. Smith, this is Sarah Jenkins from InsuSure, calling regarding your recent auto claim, reference number clm789. I just need to confirm a few details with you. This call is being recorded for quality and training purposes, is that okay?"
Claimant: "Yeah, that's fine. About time you called."
Adjuster: "Thank you for your patience. My records show the incident was on October 26th. Can you please confirm for me the time the collision occurred?"
Claimant: "Umm, let me think... It was late, definitely dark out. The roads were pretty empty. I’d say it must have been around 9:30 PM, maybe a bit later."
Adjuster: "Okay, thank you. I'm looking at the attendant police report now, and it notes the time of the incident as approximately 7:30 PM."
Claimant: "Oh. Right, 7:30. Yes. It gets dark so early these days, I must have gotten mixed up. To be honest, I was pretty shaken up right after it happened, you know how it is. My mind was all over the place."
Adjuster: "I understand. Can you tell me where you were coming from at the time?"
Claimant: "I was just on my way back from the grocery store, picking up a few things for dinner."
Adjuster: "Alright. Now, let's talk about the damage. I see the estimate from the repair shop. It lists the front bumper and windshield, which seems consistent. But it also includes a full 'engine tune-up'. Can you tell me why that was needed?"
Claimant: "The engine? Oh yeah, it started making a really weird rattling noise right after the impact. The mechanic, a really knowledgeable guy, said it was probably related to the shock of the crash and that it was best to get it all checked out at once. Seemed smart to me."
Adjuster: "I see. And what was the general condition of the vehicle's body and engine before this incident?"
Claimant: "It's a 2018 model, it's been very reliable for me. A great car."
Adjuster: "Understood. Just a final clarifying question, where were you heading to at the time?"
Claimant: "Like I said, just getting home. Look, the main thing is that I'm a contractor and I'm losing money every day this car is off the road. I'd be really grateful if we could get this payment sorted out quickly."
(Call ends)
'''

And here is the heart of our prompt, which defines the fact categories we want the LLM to find:
extraction_prompt = '''
### ROLE & MISSION
You are an expert Forensic Linguistic Analyst AI. Your mission is to perform a systematic and unbiased analysis of a provided phone call transcript between an insurance claimant and a claims adjuster.
Your goal is to identify and extract specific, predefined behavioral and linguistic indicators relevant to claims verification and fraud analysis. You are an observation engine, not a judge.
### CORE TASK
Analyze the provided transcript for the given `Claim ID`. Your SOLE output will be a list of structured facts in the Clingo Answer Set Programming (ASP) format.
Each fact must correspond to one of the specific categories defined below.
Do not output any preamble, explanation, or summary. Your entire response must be a valid list of Clingo facts.
### CRITICAL: OUTPUT FORMAT
- Every line of your output must be a valid Clingo fact.
- The format is: `predicate(claim_id, "detail", "context").`
- All string arguments MUST be enclosed in double quotes (`"`).
- The `claim_id` must be the one provided in the input.
- If no indicators are found, output nothing.
---
### FACT CATEGORIES TO EXTRACT
**1. Contradiction (`contradiction/3`)**
- **Definition:** The claimant makes a statement that directly contradicts a known fact provided in the transcript (e.g., a police report, a mechanic's assessment).
- **Format:** `contradiction(ClaimID, "DetailContradicted", "SourceOfContradiction").`
- **Example:** `contradiction(clm789, "incident_time", "police_report").`
**2. Self-Contradiction (`self_contradiction/3`)**
- **Definition:** The claimant makes a statement that contradicts something they said earlier *within the same transcript*.
- **Format:** `self_contradiction(ClaimID, "TopicOfContradiction", "OriginalStatement -> NewStatement").`
- **Example:** `self_contradiction(clm789, "number_of_passengers", "one -> two").`
**3. Hesitation or Uncertainty (`hesitation/2`)**
- **Definition:** The claimant shows unusual hesitation, uses filler words (um, uh, er), or expresses uncertainty when asked about a key detail of the incident.
- **Format:** `hesitation(ClaimID, "TopicOfHesitation").`
- **Example:** `hesitation(clm789, "sequence_of_events").`
**4. Evasion (`evasion/2`)**
- **Definition:** The claimant avoids answering a direct question and instead changes the subject or answers a different question.
- **Format:** `evasion(ClaimID, "QuestionEvaded").`
- **Example:** `evasion(clm789, "query_about_prior_vehicle_damage").`
**5. Rationalization (`rationalization/3`)**
- **Definition:** The claimant provides a seemingly logical but potentially convenient explanation to excuse a contradiction or inconsistency.
- **Format:** `rationalization(ClaimID, "Inconsistency", "JustificationProvided").`
- **Example:** `rationalization(clm789, "incident_time_discrepancy", "was_shaken_up").`
**6. Expressed Urgency or Pressure (`expressed_urgency/2`)**
- **Definition:** The claimant emphasizes the need for a quick payout, often citing financial hardship or job-related pressure. This is a tactic to rush the process.
- **Format:** `expressed_urgency(ClaimID, "ReasonForUrgency").`
- **Example:** `expressed_urgency(clm789, "needs_car_for_job").`
**7. Unsolicited Information (`provides_unsolicited_info/2`)**
- **Definition:** The claimant offers specific details or justifications that were not asked for, which can sometimes be a sign of a rehearsed story.
- **Format:** `provides_unsolicited_info(ClaimID, "TopicOfInformation").`
- **Example:** `provides_unsolicited_info(clm789, "alibi_for_location_before_incident").`
**8. Suspicious Confirmation (`confirms_unusual_detail/2`)**
- **Definition:** The claimant agrees with or confirms a detail that is flagged as unusual or inconsistent with the incident type (e.g., confirming a mechanical repair for a simple cosmetic claim).
- **Format:** `confirms_unusual_detail(ClaimID, "UnusualDetailConfirmed").`
- **Example:** `confirms_unusual_detail(clm789, "engine_tuneup_for_collision").`
---
### RULES OF ENGAGEMENT
1. **OBSERVE, DO NOT CONCLUDE:** Your task is to identify observable linguistic events from the list above. You must NOT make a final judgment on fraud.
2. **LITERAL EXTRACTION:** Base your facts ONLY on the text provided in the transcript. Do not infer details that are not present.
3. **CLAIMANT FOCUS:** Analyze the claimant's speech only. Do not generate facts about the adjuster's speech.
4. **STRICT FORMATTING:** Adhere strictly to the Clingo fact format. Any deviation will cause system failure.
---
### INPUT DATA
{call_transcript}
'''

We then use an LLM to get our structured facts.
import os
import sys

from langchain_google_vertexai import ChatVertexAI
from langchain_core.messages import HumanMessage, SystemMessage

GOOGLE_PROJECT_ID = "<your-google-cloud-project-id>"  # replace with your GCP project ID
REGION = "us-central1"

# --- LLM Initialization ---
def initialize_llm(model_name="gemini-2.5-pro", project_id=None, location=None):
    """Initializes and returns the ChatVertexAI model. You can use any LLM here."""
    if "google.colab" in sys.modules:
        from google.colab import auth
        auth.authenticate_user()
    if not project_id or not location:
        raise ValueError("GOOGLE_PROJECT_ID and REGION must be set.")
    return ChatVertexAI(
        model=model_name,
        project=project_id,
        location=location,
        temperature=0,  # deterministic output for fact extraction
        max_tokens=65535,
        max_retries=6,
        stop=None,
    ).bind_tools([{"google_search": {}}])  # optional Google Search grounding

llm = initialize_llm(project_id=GOOGLE_PROJECT_ID, location=REGION)

messages = [
    SystemMessage(content="You are an expert Forensic Linguistic Analyst AI. Your mission is to perform a systematic and unbiased analysis of a provided phone call transcript between an insurance claimant and a claims adjuster."),
    HumanMessage(content=extraction_prompt.format(call_transcript=call_transcript)),
]

try:
    response = llm.invoke(messages)
    llm_answer = response.content
    print(llm_answer)
except Exception as e:
    print(f"An error occurred while communicating with the LLM: {e}")
The LLM processes the transcript and returns a beautiful, clean list of facts, exactly as we instructed:
contradiction(clm789, "incident_time", "police_report").
self_contradiction(clm789, "incident_time", "9:30 PM -> 7:30 PM").
hesitation(clm789, "incident_time").
evasion(clm789, "query_about_prior_vehicle_condition").
rationalization(clm789, "incident_time_discrepancy", "was_shaken_up_and_confused_about_darkness").
rationalization(clm789, "unusual_repair_item_engine_tune-up", "mechanic_said_it_was_related_to_shock_of_crash").
expressed_urgency(clm789, "losing_money_as_a_contractor").
provides_unsolicited_info(clm789, "praise_for_mechanics_knowledge").
confirms_unusual_detail(clm789, "engine_tuneup_for_collision_damage").

Notice how the LLM perfectly identified the hesitation (“Umm…”), the contradiction with the police report, the evasion about the car’s prior condition, and the pressure tactic at the end. This is raw material our symbolic reasoner can work with.
Step 2: Reasoning with Clingo – Applying the Rulebook
Now that we have structured facts, we can use a powerful tool called Clingo, which is a solver for Answer Set Programming (ASP). Think of ASP as “SQL for logic.” You define a set of facts and rules, and the solver finds all possible “answer sets” (solutions) that satisfy those rules.
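If ASP is new to you, here’s a tiny, self-contained example (unrelated to our claim data) showing the paradigm through clingo’s Python API: you state facts and rules, and the solver derives everything that follows.

```python
import clingo

# A toy program: two parent facts plus a recursive rule for ancestry.
toy_program = """
parent(alice, bob).
parent(bob, carol).
ancestor(X, Y) :- parent(X, Y).
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
#show ancestor/2.
"""

ctl = clingo.Control()
ctl.add("base", [], toy_program)
ctl.ground([("base", [])])
# Expected atoms: ancestor(alice,bob), ancestor(bob,carol), ancestor(alice,carol)
ctl.solve(on_model=lambda model: print(model))
```

Note that we never wrote a loop or a traversal; the solver derives `ancestor(alice, carol)` on its own from the rules.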
We create a program that includes:
- The LLM’s facts about the conversation.
- Hard facts about the claim (policy data, mechanic’s report, etc.).
- Business rules that define what constitutes a `red_flag` and how to make a final `decision`.
Here’s the core of our Clingo program:
insurance_fraud_program = f"""
% --- Facts: The Data for a Specific Claim ---
% --- LLM-Derived Facts (as if generated from a transcript) ---
{llm_answer}
% --- Original Facts (Policy, Mechanic, etc.) ---
policy(p101).
customer(c123, p101).
policy_active(p101, "2024-10-26").
covers(p101, collision).
claim(clm789, p101).
incident_date(clm789, "2024-10-26").
incident_type(clm789, collision).
mechanic_report(mr555, clm789).
repair_estimate(mr555, engine_tuneup, 250).
repair_shop(mr555, "shady_repairs_inc").
suspicious_shop("shady_repairs_inc").
inconsistent_damage(engine_tuneup, collision).
% --- Rules: The Logic for Adjudication ---
is_covered(Claim) :- claim(Claim, Policy), incident_date(Claim, Date), policy_active(Policy, Date), incident_type(Claim, Type), covers(Policy, Type).
{{ decision(Claim, approve); decision(Claim, flag); decision(Claim, deny) }} = 1 :- is_covered(Claim).
:- decision(Claim, approve), red_flag(Claim, _).
decision(Claim, flag) :- red_flag(Claim, _).
decision(Claim, deny) :- claim(Claim, _), not is_covered(Claim).
% --- Red Flag Definitions ---
red_flag(Claim, suspicious_repair_shop) :- mechanic_report(Report, Claim), repair_shop(Report, Shop), suspicious_shop(Shop).
red_flag(Claim, inconsistent_repair(Part)) :- mechanic_report(Report, Claim), repair_estimate(Report, Part, _), incident_type(Claim, Type), inconsistent_damage(Part, Type).
red_flag(Claim, transcript_contradiction(Detail, Source)) :- contradiction(Claim, Detail, Source).
% Note: confirms_unusual_repair/3 is never produced by our extraction prompt
% (which emits confirms_unusual_detail/2), so this rule stays dormant in this example.
red_flag(Claim, unusual_repair_justification(Part)) :- confirms_unusual_repair(Claim, Part, Type), inconsistent_damage(Part, Type).
red_flag(Claim, pressure_tactic_combined_with_issue) :- expressed_urgency(Claim, _), red_flag(Claim, Flag), Flag != pressure_tactic_combined_with_issue.
% --- Output ---
#show decision/2.
#show red_flag/2.
"""This declarative style is incredibly powerful. We don’t tell the program how to find the flags; we just describe what a flag is.
Now, we run the solver:
import clingo
# The insurance_fraud_program string (base facts, rules, and LLM-derived facts) was assembled above.
# Setup and run the solver
ctl = clingo.Control(["0"])  # "0" to find all models
ctl.add("base", [], insurance_fraud_program)
ctl.ground([("base", [])])

print("Running Insurance Claim Adjudication and Fraud Analysis...")

def on_model(model):
    # Sort the atoms for clear, consistent output
    sorted_atoms = sorted(model.symbols(shown=True), key=str)
    decision = [atom for atom in sorted_atoms if atom.name == "decision"]
    red_flags = [atom for atom in sorted_atoms if atom.name == "red_flag"]
    print("\n--- Solver Result ---")
    if decision:
        print(f"Final Decision: {decision[0]}")
    else:
        print("No decision could be reached.")
    if red_flags:
        print("\nReasons (Red Flags):")
        for flag in red_flags:
            print(f"- {flag}")
    print("---------------------\n")

ctl.solve(on_model=on_model)

The output is exactly what a business needs: a clear decision backed by auditable reasons.
Running Insurance Claim Adjudication and Fraud Analysis...
--- Solver Result ---
Final Decision: decision(clm789,flag)
Reasons (Red Flags):
- red_flag(clm789,inconsistent_repair(engine_tuneup))
- red_flag(clm789,pressure_tactic_combined_with_issue)
- red_flag(clm789,suspicious_repair_shop)
- red_flag(clm789,transcript_contradiction("incident_time","police_report"))
---------------------
SolveResult(5)

This is true Explainable AI (XAI). The claim was flagged not because of a vague “fraud score,” but because of four specific, documented reasons. A human adjuster can now review this file with a complete understanding of the system’s reasoning.
From PoC to Production – An MLOps Blueprint
Our Colab notebook proves the concept, but running this at scale requires a robust MLOps architecture. A single script is fragile; a production system must be scalable, secure, and maintainable.
Now, we will build a production-grade batch-inference service to flag potential fraud across a given set of claims.
Here is a blueprint for deploying this solution on AWS, designed for high-throughput batch processing and easy maintenance.
[Architecture diagram: Insurance Fraud Detection System, a two-system design for batch processing and rule management.]

- System A, the Batch Inference Pipeline: a manifest upload (body: { "manifest_uri": "s3://…" }) kicks off processing; active rules are read from s3://rules/production/; on completion, a message on the SNS topic fraud-detection-complete feeds the case management system.
- System B, the Rule Management System: rules are created and updated via POST /rules and PUT /rules/{id}, stored as versioned records (sort key: version) with a status such as ACTIVE, and compiled into active_rules.lp.

System integration points:
- S3 buckets: input (manifest files trigger System A), rules (active rules consumed by the SageMaker containers), and output (processing results for downstream systems).
- APIs: the SageMaker asynchronous inference endpoint for batch processing, the Rule Management API (CRUD operations via API Gateway), and an SNS topic for event-driven notifications.
- Decoupling benefits: independent scaling (rules update without redeployment), business agility (analysts iterate on logic safely), and versioning (audit trail and rollback capabilities).

End-to-end data flow: S3 upload → trigger → processing → S3 + SNS → storage.
Architecture Highlights
The Right Tool for the Job: SageMaker Asynchronous Inference
Our workload involves processing potentially thousands of claims in batches. Each claim requires an LLM call, so a single claim can take several seconds to process. This is a classic long-running, large-payload batch job.
A standard SageMaker Real-Time Endpoint, with its 60-second timeout and small payload limits, is the wrong tool. It would lead to a brittle, expensive system plagued by timeouts.
The correct choice is a SageMaker Asynchronous Inference Endpoint. Here’s why it’s a perfect fit:
- Handles Large Payloads: You can submit a manifest file pointing to thousands of transcripts in S3 (up to 1GB).
- Long Processing Times: It allows for up to 15 minutes per invocation, easily accommodating large batches.
- Cost-Effective: It can scale its instances down to zero when not in use, so you don’t pay for idle compute. This is a massive cost saving for intermittent batch workloads.
- Built-in Reliability: It uses SQS queues internally to manage requests, so you don’t lose jobs during traffic spikes.
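To make the interaction concrete, here’s roughly what submitting a batch looks like with boto3 (the endpoint and bucket names below are placeholders, not from an actual deployment):

```python
import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")

# Hypothetical endpoint/bucket names. The call returns immediately;
# SageMaker queues the job internally and writes results to the
# endpoint's configured S3 output path.
response = sagemaker_runtime.invoke_endpoint_async(
    EndpointName="fraud-detection-async",
    InputLocation="s3://claims-input/manifests/batch-001.json",
    ContentType="application/json",
)
print(response["OutputLocation"])  # where the results will land
```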
The Architecture: Service-Oriented and Decoupled
We will build two distinct systems that communicate through well-defined interfaces (S3 and APIs).
System A: The Batch Inference Pipeline
This is our core processing engine.
- Trigger: The process starts when an upstream system drops a manifest JSON file (listing claim IDs and S3 paths to their transcripts) into a designated S3 bucket.
- Invocation: This S3 `PutObject` event triggers a Lambda function that makes a single API call to our SageMaker Asynchronous Endpoint, passing the S3 path of the manifest file.
- SageMaker Endpoint: The endpoint spins up a container to run our application. This container runs the exact same logic as our Colab notebook, but in a hardened environment. It reads the manifest, iterates through each claim, calls the LLM, runs the Clingo solver, and compiles all the results.
- Output & Notification: Once the batch is complete, the container writes a single results file to an output S3 bucket and SageMaker sends a notification to an SNS topic.
- Downstream Processing: A final Lambda, subscribed to the SNS topic, processes the results: loading them into a database, a data warehouse (like Snowflake or Redshift), or creating tasks for human adjusters in a case management system.
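As an illustration, the trigger Lambda from the invocation step might look like this sketch, wiring the same `invoke_endpoint_async` call shown earlier to the standard S3 notification event (the `ENDPOINT_NAME` environment variable is a placeholder):

```python
import os
import boto3

runtime = boto3.client("sagemaker-runtime")

def handler(event, context):
    """Fires on s3:ObjectCreated for manifest files and forwards the
    manifest's S3 URI to the asynchronous SageMaker endpoint."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    response = runtime.invoke_endpoint_async(
        EndpointName=os.environ["ENDPOINT_NAME"],
        InputLocation=f"s3://{bucket}/{key}",
        ContentType="application/json",
    )
    # The InferenceId lets downstream systems correlate results later.
    return {"inference_id": response["InferenceId"]}
```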
System B: The Rule Management System
The business rules are the brains of our operation. They can’t be hardcoded. Business analysts and fraud experts need a way to update them safely.
- Web Interface: We build a simple web app (using React, hosted on S3/CloudFront) that serves as a “Rule IDE.”
- Backend API: The frontend talks to a serverless backend built with API Gateway and Lambda.
- Rule Store (DynamoDB): A DynamoDB table stores every rule, its version, its status (`DRAFT`, `TESTING`, `ACTIVE`, `ARCHIVED`), and an audit history.
- The “CI/CD for Rules” Pipeline: This is the most critical MLOps component. When a user wants to publish new rules to production, they trigger a Step Functions workflow that:
  - Validates: Runs the new rules against a “golden dataset” of test cases to check for logical contradictions or unintended consequences.
  - Compiles: Gathers all rules marked as `ACTIVE` from DynamoDB.
  - Concatenates: Merges them into a single `active_rules.lp` file.
  - Deploys: Uploads this file to a specific, versioned S3 location (e.g., `s3://insurance-rules/production/active_rules.lp`), as sketched in the code below.

This S3 location is the single source of truth that our SageMaker application container loads from when it starts. This decouples rule updates from code deployments, allowing the business to iterate on logic with agility and safety.
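Here’s a condensed sketch of the compile/concatenate/deploy steps. The table, bucket, and attribute names are hypothetical, and scan pagination is omitted for brevity:

```python
import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

def compile_and_deploy_rules():
    """Gather ACTIVE rules, merge them into one ASP file, publish to S3."""
    table = dynamodb.Table("fraud_rules")  # hypothetical table name
    items = table.scan(
        FilterExpression=Attr("status").eq("ACTIVE")
    )["Items"]
    # Deterministic ordering keeps the compiled file diffable and auditable.
    items.sort(key=lambda r: (r["rule_id"], r["version"]))
    program = "\n".join(r["body"] for r in items)
    s3.put_object(
        Bucket="insurance-rules",
        Key="production/active_rules.lp",
        Body=program.encode("utf-8"),
    )
```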
Final Thoughts: Why This Approach is the Future
This hybrid, Neuro-Symbolic architecture represents a step forward from end-to-end deep learning models for problems like fraud detection.
- It’s Explainable: The system provides clear, human-readable reasons for its decisions, satisfying regulators and empowering human experts.
- It’s Agile: The Rule Management System allows the business logic to evolve independently of the ML infrastructure. A new fraud pattern emerges? An analyst can write, test, and deploy a new rule in hours, not weeks.
- It’s Robust: By using the right tools (SageMaker Async) and a decoupled architecture, we build a system that is scalable, cost-effective, and resilient.
This hybrid approach is a powerful pattern to have in your toolkit for tackling the next generation of complex AI challenges.
I hope you’ve enjoyed this guide. If you try the solution yourself, let me know how it works for you.