
Human-in-the-Loop with LangGraph Interrupts

Overview

This implementation demonstrates LangGraph's interrupt-based Human-in-the-Loop pattern - one of the most powerful features of LangGraph for building reliable AI agents that require human oversight.

How It Works

The LangGraph HITL Pattern

Unlike pre-processing approval (which waits before starting), LangGraph's interrupt pattern:

  1. Starts processing immediately - The agent begins workflow execution
  2. Pauses at critical decision points - Uses interrupt_before to halt at specific nodes
  3. Persists state - Saves workflow state in a checkpointer (MemorySaver)
  4. Awaits human input - Returns control to API, workflow is "frozen" mid-execution
  5. Resumes from checkpoint - When human approves, workflow continues from exact interruption point

Workflow Architecture

┌─────────────┐
│   Submit    │
│  Incident   │
└──────┬──────┘
┌──────────────┐
│  Classify    │  ◄─── Step 1: Classify incident
│  (LangGraph) │
└──────┬───────┘
┌────────────────┐
│ Human Review   │  ◄─── Step 2: Determine if approval needed
│  (P1/P2?)      │
└────────┬───────┘
         ├─────► Low Priority (P3/P4) → Auto-approve → Continue
         └─────► High Priority (P1/P2) → 🛑 INTERRUPT
                                   ┌──────────────────┐
                                   │   Checkpoint     │
                                   │  State Saved     │
                                   └────────┬─────────┘
                                   [Workflow Paused]
                                   [API Returns]      
                                   [Human Reviews]
                                    ┌───────▼────────┐
                                    │  Human Decides │
                                    └───────┬────────┘
                           ┌────────────────┼────────────────┐
                           │                                 │
                      ✅ Approve                        ❌ Reject
                           │                                 │
                           ▼                                 ▼
                   ┌────────────────┐             ┌─────────────────┐
                   │ Resume from    │             │  Workflow       │
                   │  Checkpoint    │             │  Terminated     │
                   └───────┬────────┘             └─────────────────┘
                   ┌────────────────┐
                   │    Spatial     │  ◄─── Step 3: Spatial analysis
                   │   Analysis     │
                   └───────┬────────┘
                   ┌────────────────┐
                   │   Guidance     │  ◄─── Step 4: Search guidance
                   │    Search      │
                   └───────┬────────┘
                   ┌────────────────┐
                   │  Notifications │  ◄─── Step 5: Send notifications
                   └───────┬────────┘
                   ┌────────────────┐
                   │   Finalize     │  ◄─── Step 6: Complete
                   └────────────────┘

Key Implementation Details

1. Checkpointer Setup

from langgraph.checkpoint.memory import MemorySaver

class HITLIncidentAgent:
    def __init__(self):
        # Create checkpointer for state persistence
        self.checkpointer = MemorySaver()
        self.workflow = self._create_workflow()

2. Interrupt Configuration

def _create_workflow(self):
    workflow = StateGraph(HITLIncidentState)

    # Add all nodes
    workflow.add_node("classify", self._classify_incident)
    workflow.add_node("human_review", self._request_human_review)
    workflow.add_node("spatial", self._check_spatial_context)
    # ... more nodes

    # Compile with interrupt BEFORE spatial analysis.
    # Note: interrupt_before pauses EVERY run that reaches this node;
    # conditional behavior (such as P3/P4 auto-approve) must come from
    # routing around the node or resuming low-priority runs immediately
    return workflow.compile(
        checkpointer=self.checkpointer,
        interrupt_before=["spatial"]  # 🛑 Pause here
    )

3. State Management

class HITLIncidentState(TypedDict):
    # Standard fields
    incident_id: str
    description: str
    # ... more fields

    # HITL-specific fields
    human_approval_required: bool  # Set during classification
    human_approved: bool | None    # None = awaiting, True/False = decided
    human_feedback: str | None     # Feedback from human
    modified_classification: dict  # Human can modify classification

4. Workflow Invocation

def process_incident(self, incident_id, ...):
    # Create initial state
    initial_state = {...}

    # Run workflow with thread_id for checkpoint tracking
    config = {"configurable": {"thread_id": incident_id}}
    result = self.workflow.invoke(initial_state, config)

    # Check if paused at the interrupt. Inspecting
    # self.workflow.get_state(config).next is a robust alternative:
    # a non-empty tuple means the run is paused before that node
    if result["awaiting_human"]:
        return {"awaiting_human": True, ...}
    else:
        return {"success": True, ...}

5. Resume After Approval

def continue_after_approval(self, incident_id, approved, feedback):
    config = {"configurable": {"thread_id": incident_id}}

    # Record the human decision in the checkpoint. update_state merges
    # this dict into the saved state; there is no need to fetch and
    # mutate the snapshot returned by get_state()
    self.workflow.update_state(config, {
        "human_approved": approved,
        "human_feedback": feedback,
    })

    if not approved:
        return {"success": False, "rejected": True}

    # Resume workflow from interrupt point! 🚀
    result = self.workflow.invoke(None, config)
    return result

Why This Is Powerful

1. State Persistence

  • Workflow state is saved at the exact interruption point
  • No data loss across server restarts, provided a durable checkpointer (e.g., SQLite or Postgres) is used - the MemorySaver shown here keeps checkpoints in process memory and is best suited to demos and tests
  • Can review later - no time pressure

2. Clean Separation of Concerns

  • Agent logic stays pure - no approval code mixed in
  • Human approval is a separate concern
  • Easy to test and maintain

3. Flexible Decision Points

  • Can interrupt at multiple points
  • Can have conditional interrupts
  • Can modify state before resuming

4. Auditability

  • Full history of workflow execution
  • Can see exactly when and why it paused
  • Human decisions are logged in state

5. Scalability

  • Multiple incidents can be paused simultaneously
  • Each has its own checkpoint thread
  • Can prioritize which to review first

API Endpoints

Submit Incident (with HITL)

POST /api/v1/incidents/submit

# High priority (P1/P2) → Pauses at interrupt
# Low priority (P3/P4) → Auto-approves and completes

List Pending Approvals

GET /api/v1/incidents/pending-approval

# Returns incidents paused at LangGraph interrupt

Approve and Resume

POST /api/v1/incidents/{incident_id}/approve
{
  "approved_by": "Sarah Manager"
}

# Resumes LangGraph workflow from checkpoint!

Reject and Terminate

POST /api/v1/incidents/{incident_id}/reject
{
  "approved_by": "Sarah Manager",
  "reason": "Duplicate report"
}

# Terminates workflow - will not continue

Testing the Pattern

  1. Submit high-priority incident:
    curl -X POST http://localhost:8000/api/v1/incidents/submit \
      -H "Content-Type: application/json" \
      -d '{
        "incident_type": "chemical_spill",
        "location": "Industrial Park",
        "description": "Major hazardous chemical leak",
        "urgency": "critical"
      }'
    

Response: Workflow paused at interrupt, awaiting review

  2. Check dashboard: Visit http://localhost:8502 → "Approval Queue" tab

  3. Approve: Click "Approve & Process" → Workflow resumes from checkpoint!

Comparison: Pre-processing vs Interrupt-based HITL

Aspect             | Pre-processing             | LangGraph Interrupts
-------------------|----------------------------|------------------------------
When starts        | After approval             | Immediately
State management   | Manual DB queries          | Automatic checkpointing
Resume capability  | Starts from scratch        | Resumes from exact point
Flexibility        | One gate                   | Multiple interrupt points
Code complexity    | Higher (manual state)      | Lower (LangGraph handles it)
Agent purity       | Mixed with approval logic  | Clean separation
Auditability       | Manual logging             | Built-in state history
Scalability        | Limited                    | Excellent

Advanced Patterns

Multi-level Approval

# Interrupt at multiple decision points
return workflow.compile(
    checkpointer=self.checkpointer,
    interrupt_before=["spatial", "notify"]  # Two gates
)

Conditional Interrupts

def should_interrupt(state):
    return state["severity"] == "critical" and state["impact_area"] > 10

# Add conditional routing instead of fixed interrupt

Human Modification

# Human can modify the classification before resuming; update_state
# merges this dict into the checkpointed state directly
self.workflow.update_state(config, {
    "modified_classification": {
        "severity": "high",  # Downgrade from critical
        "priority": "P2"
    }
})

Conclusion

This implementation showcases the core strength of LangGraph's HITL pattern: seamless interruption and resumption of complex workflows with full state persistence. This is far more capable than simple pre-processing approval and shows why LangGraph is well suited to building production-grade AI agents with human oversight.

The pattern is:

  • ✅ Production-ready - State persists across restarts (with a durable checkpointer)
  • ✅ Scalable - Handle multiple paused workflows
  • ✅ Auditable - Full state history
  • ✅ Flexible - Interrupt anywhere, modify state
  • ✅ Clean - Separation of agent logic and approval

This is the foundation for building reliable, trustworthy AI agents in critical domains like environmental incident management.