Voice of the Customer (VoC) — AI-Powered Call Transcript Analytics Platform

N8N · Gemini 2.5 Flash · ArangoDB · Chart.js · Docker · Traefik

End-to-end Voice of the Customer analytics pipeline processing real insurance call centre transcripts through LLM extraction, with structured insights stored in ArangoDB and served via a live webhook dashboard.

Proof of Concept — Built in Days

Weekend POC demonstrating production-grade LLM extraction at scale

  • 16 — Structured Fields Extracted
  • 500 — Insurance Transcripts Processed
  • 92k+ — Source Dataset Size
  • Live — Webhook Dashboard

Overview

A Voice of the Customer (VoC) analytics platform built as a proof of concept, demonstrating how LLMs can extract structured business intelligence from insurance call centre transcripts at scale. The pipeline processes real PII-redacted call transcripts, analyses them through a carefully engineered Gemini 2.5 Flash extraction prompt, stores structured insights in ArangoDB, and serves a live dashboard through an N8N webhook — all with zero external hosting dependencies.

The Challenge

Insurance call centres generate thousands of hours of customer conversations daily. Buried within these calls are critical business signals: frustrated customers about to churn, recurring billing complaints, agents who need coaching support, and policy issues that drive repeated contact. Manual review doesn't scale, and traditional keyword-based analytics miss the nuance of human conversation — sarcasm, subtle frustration, sentiment shifts during a call, and context-dependent meaning.

Architecture & Technical Design

The pipeline runs as a sequence of N8N workflow stages:

  1. Data Ingestion: JSON transcript files loaded from Google Drive, filtered for quality, split into individual records
  2. Deduplication: Each transcript checked against ArangoDB before LLM processing — prevents redundant API calls and cost waste
  3. LLM Extraction: Gemini 2.5 Flash analyses each transcript with a structured extraction prompt returning 16 fields:
    • Sentiment (positive/negative/neutral/mixed) with calibrated score (-1.0 to +1.0)
    • Sentiment journey — narrative arc of how customer mood evolved during the call
    • Sentiment shifts detection (boolean + description)
    • Topic classification against fixed taxonomy (claims, billing, policy_change, complaint, renewal, quote, roadside, cancellation, coverage_inquiry, payment, new_policy, general_inquiry)
    • Key issues in plain language (case-note quality, not generic labels)
    • Resolution status, customer satisfaction, urgency, call type
    • Agent performance assessment
    • Churn risk prediction (low/medium/high)
    • Follow-up required flag
  4. Validation & Enrichment: Code node unwraps LLM output, merges with original metadata (audio duration, ASR confidence, source), generates ArangoDB document key
  5. Storage: AQL UPSERT into ArangoDB collection via cursor API — prevents duplicates on reprocessing
  6. Dashboard: Separate webhook workflow queries ArangoDB with server-side AQL aggregation, builds HTML with Chart.js, returns complete page via Respond to Webhook node
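
Stage 5's duplicate-safe write can be sketched as a single AQL statement sent through the cursor API. The collection name `voc_insights` and the bind parameter `@doc` are illustrative assumptions, not names taken from the actual workflow:

```aql
// @doc is the enriched record from the validation node, carrying a
// deterministic _key derived from transcript metadata. UPSERT makes
// reruns idempotent: insert on first sight, overwrite on reprocess.
UPSERT { _key: @doc._key }
INSERT @doc
UPDATE @doc
IN voc_insights
```

Because the key is deterministic per transcript, reprocessing the same file updates the existing document instead of creating a duplicate.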

Key Technical Decisions

  • Model Selection: Benchmarked GPT-4.1 Mini vs Gemini 2.5 Flash on identical transcripts. GPT-4.1 Mini consistently flattened subtle frustration to "neutral" (sentiment_score: 0, sentiment_shifts: false). Gemini 2.5 Flash correctly detected sentiment shifts — for example, identifying a customer who said "I don't want to waste my time" as negative (-0.5) with sentiment_shifts: true and a detailed sentiment journey. Selected Flash for production use.
  • Prompt Engineering: The extraction prompt includes calibrated sentiment scoring with explicit examples at each level (-1.0 to +1.0), rules for detecting even subtle sentiment shifts, good/bad examples for key_issues to prevent generic labels, churn risk assessment criteria, edge case handling for IVR recordings and cut-off transcripts, and PII placeholder handling.
  • Deterministic Code Nodes: All scoring, aggregation, and data transformation handled by JavaScript Code nodes — the LLM only does extraction. This is the same philosophy used in a production recruitment matching system.
  • Dashboard Architecture: Webhook-served HTML with server-side AQL aggregation — the query runs inside ArangoDB (grouping, counting, sorting) and returns a single pre-aggregated object. No external hosting, no frontend framework, no CORS issues.
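
The server-side aggregation behind the dashboard might look roughly like the following AQL, returning one pre-aggregated object per request (collection and field names are assumptions for illustration):

```aql
// Grouping, counting, and sorting all happen inside ArangoDB; the
// webhook workflow receives a single object ready for Chart.js.
RETURN {
  byTopic: (
    FOR d IN voc_insights
      COLLECT topic = d.topic WITH COUNT INTO n
      SORT n DESC
      RETURN { topic, n }
  ),
  avgSentiment: AVG(FOR d IN voc_insights RETURN d.sentiment_score)
}
```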

Dataset

Processed the CallCenterEN dataset from Hugging Face — 92,000+ real-world PII-redacted call centre transcripts. Filtered and curated 500 high-quality insurance conversations from auto_insurance_customer_service_inbound, insurance_outbound, automotive_and_healthcare_insurance_inbound, and customer_service_general_inbound sources.

  • Average transcript: 10.6 minutes
  • Average word count: 1,166 words
  • Average ASR confidence: 0.94

Results

  • 16-field structured extraction per transcript with consistent schema compliance
  • Calibrated sentiment detection catching nuanced frustration that competing models missed
  • Deduplication preventing redundant LLM processing on rerun
  • Live dashboard served from webhook with real-time ArangoDB aggregation
  • Full pipeline built from concept to working demo in days

Project Information

Project Type

Personal Proof of Concept

Built as a weekend POC

Timeline

Mar 2026

Role

AI Engineer & Pipeline Architect

Technologies Used

N8N · Gemini 2.5 Flash · ArangoDB · AQL · Docker · Traefik · Chart.js · Webhooks · Google Drive API · JavaScript · Python · Prompt Engineering

Skills Demonstrated

  • LLM Extraction & Prompt Engineering
  • Model Benchmarking & Evaluation
  • Pipeline Architecture (N8N)
  • Graph Database Design (ArangoDB)
  • Dashboard Development (Chart.js)
  • Self-Hosted Infrastructure

Technical Highlights

Model Benchmarking

Benchmarked GPT-4.1 Mini vs Gemini 2.5 Flash on identical transcripts. Flash selected for superior nuanced sentiment detection — correctly identifying subtle frustration that GPT-4.1 Mini flattened to neutral.

Prompt Engineering

Extraction prompt with calibrated sentiment examples at each scale level, good/bad key_issues examples, churn criteria, and edge case handling for IVR recordings and cut-off transcripts.

ArangoDB Storage

AQL UPSERT pattern prevents duplicates on pipeline rerun. Server-side aggregation queries keep dashboard logic inside the database where it belongs.

Live Webhook Dashboard

N8N Respond to Webhook node serves complete Chart.js HTML — no external hosting, no frontend framework, no CORS issues. Real-time aggregation from ArangoDB on every request.
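
A minimal sketch of the HTML-building step, assuming the aggregation query has already returned an object like `{ byTopic: [...] }`. The function name and input shape are illustrative; in the real workflow this logic lives in a Code node feeding the Respond to Webhook node:

```javascript
// Build a self-contained Chart.js page from a pre-aggregated result.
// No frontend framework; the browser only loads Chart.js from a CDN.
function buildDashboardHtml(agg) {
  const labels = JSON.stringify(agg.byTopic.map((t) => t.topic));
  const counts = JSON.stringify(agg.byTopic.map((t) => t.n));
  return `<!doctype html>
<html>
<head><script src="https://cdn.jsdelivr.net/npm/chart.js"></script></head>
<body>
<canvas id="topics"></canvas>
<script>
new Chart(document.getElementById("topics"), {
  type: "bar",
  data: { labels: ${labels}, datasets: [{ label: "Calls", data: ${counts} }] }
});
</script>
</body>
</html>`;
}
```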

Deduplication Pipeline

Pre-LLM check against ArangoDB prevents redundant API calls and cost waste on reprocessing runs. Deterministic document key generation from transcript metadata.

Sentiment Journey Tracking

Captures how customer mood evolves during a call — not just a single score. Tracks sentiment shifts with boolean detection and narrative description of the emotional arc.
