
AI-Powered Candidate Matching & Retrieval System Using N8N and ArangoDB

N8N, ArangoDB, AWS EC2, Graph Database, AQL, GPT-4.1 Mini, Vector Embeddings, Semantic Search

An end-to-end intelligent candidate matching system for recruitment. It combines graph database technology with AI-powered CV extraction and a dual-retrieval search strategy. It was designed specifically for the Australian Defence and Government sector, where precise matching and correct handling of security clearances are critical.

Private Client Project

Australian Defence and Government Recruitment Sector — Nov 2025 to Feb 2026

  • 15+ Structured Fields Extracted per CV
  • 13 Capability Domains Classified
  • 10+ Scoring Dimensions
  • 100% Scoring Accuracy Verified

Project Overview

This system was built for the Australian Defence and Government recruitment sector — a domain with strict requirements around security clearances (Baseline, NV1, NV2, TSPV), government experience verification, and domain-specific skill taxonomies that standard recruitment tools simply cannot handle. The result is a production-grade, end-to-end intelligent candidate matching platform combining graph database technology, AI-powered extraction, and a dual-retrieval search engine.

Problems Solved

  • CV Processing Bottleneck: Manual extraction of candidate data from CVs was time-consuming and inconsistent — recruiters spent hours on data entry instead of relationship-building.
  • Search Limitations: Traditional keyword search missed qualified candidates — a "Network Engineer" search wouldn't surface an "Infrastructure Specialist" who performed identical work.
  • Skill Matching False Positives: Searching for "Java" returned JavaScript developers; "SQL" matched NoSQL specialists — unacceptable in a precision-hiring context.
  • No Unified Scoring: No systematic way to score candidate-to-job fit across multiple dimensions including skills, experience, clearance, location, and sector.
  • Sector-Specific Requirements: Standard recruitment tools don't handle Australian Defence clearance levels, government experience verification, or domain-specific skill taxonomies.

CV Processing Pipeline

  1. CV Upload → Document Parsing
  2. AI Extraction → GPT-4.1 Mini generates structured JSON with 15+ fields (name, contact, skills, experience, clearance, certifications, etc.)
  3. Data Validation → Schema validation catches malformed data before insertion
  4. ArangoDB Graph Insertion → Candidates, skills, organisations, and clearances stored as vertices; relationships stored as edges
  5. Vector Embedding Generation → A dual-purpose profile summary, written for both human readability and semantic search, is generated and embedded
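The sketch below shows the kind of check that step 3 (Data Validation) performs before anything is written to ArangoDB. It is a minimal, hand-rolled validator and not the project's actual schema. The field names (`name`, `email`, `skills`, `clearance`, `experience`) and the helpers `validateCv` and `isValidCv` are hypothetical. The clearance list is taken from the levels named on this page.

```typescript
// Hypothetical shape of the extraction output; these field names are
// illustrative and are not the project's actual 15+ field schema.
interface ExtractedCv {
  name: string;
  email: string | null;
  skills: string[];
  clearance: string;
  experience: { organisation: string; title: string }[];
}

const CLEARANCE_LEVELS: readonly string[] = ["None", "Baseline", "NV1", "NV2", "TSPV"];

// Returns a list of problems; an empty list means the record can be inserted.
function validateCv(data: unknown): string[] {
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return ["record must be a JSON object"];
  }
  const cv = data as Record<string, unknown>;
  const errors: string[] = [];
  const name = cv.name;
  if (typeof name !== "string" || name.trim() === "") errors.push("name must be a non-empty string");
  if (cv.email !== null && typeof cv.email !== "string") errors.push("email must be a string or null");
  const skills = cv.skills;
  if (!Array.isArray(skills) || !skills.every((s) => typeof s === "string")) {
    errors.push("skills must be an array of strings");
  }
  const clearance = cv.clearance;
  if (typeof clearance !== "string" || !CLEARANCE_LEVELS.includes(clearance)) {
    errors.push(`clearance must be one of: ${CLEARANCE_LEVELS.join(", ")}`);
  }
  // Catches, for example, integers where experience objects are expected.
  const isJob = (e: unknown): boolean =>
    typeof e === "object" && e !== null && typeof (e as Record<string, unknown>).organisation === "string";
  const experience = cv.experience;
  if (!Array.isArray(experience) || !experience.every(isJob)) {
    errors.push("experience must be an array of objects with an organisation");
  }
  return errors;
}

// Type guard for downstream code once validation has passed.
function isValidCv(data: unknown): data is ExtractedCv {
  return validateCv(data).length === 0;
}
```

The validator returns every problem rather than stopping at the first one. This way, a single rejected CV can be reported back with all of its issues at once.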

Candidate Search Pipeline

  1. Recruiter Query → Router determines query type and requirements
  2. JD Optimiser Agent → Extracts core vs supporting skills, required clearance, location, seniority from the job description
  3. Deterministic AQL Query Builder (Code Node) → Constructs precise graph traversal query against ArangoDB
  4. In Parallel with Step 3: Semantic vector search against the embedded candidate profile summaries
  5. Merge & Score Code Node → Deterministic multi-dimensional scoring algorithm combines both result sets
  6. Response Formatter (LLM) → Generates readable ranked candidate summaries
  7. QA Validator → Multi-stage validation checks data integrity, grounding, completeness, and accuracy
  8. Response Delivery → Final verified output to recruiter
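Step 3 is described as a deterministic Code node that builds the AQL query. The sketch below is one possible version, and it makes several assumptions. The collection names (`candidates`, `has_skill`), the document fields (`clearance`, `location`, `name`), and the function `buildCandidateQuery` are all hypothetical. The sketch also assumes that a higher clearance satisfies a lower requirement (Baseline < NV1 < NV2 < TSPV). All JD values are passed as bind parameters instead of being concatenated into the query string.

```typescript
// Assumed ordering: a candidate holding a higher level satisfies a lower requirement.
const CLEARANCE_ORDER = ["Baseline", "NV1", "NV2", "TSPV"];

interface JdRequirements {
  coreSkills: string[];
  clearance: string | null;
  location: string | null;
  maxResults: number;
}

// Builds an AQL query plus bind variables. Only the bind variables that the
// query actually references are included.
function buildCandidateQuery(req: JdRequirements): { query: string; bindVars: Record<string, unknown> } {
  const bindVars: Record<string, unknown> = {
    coreSkills: req.coreSkills.map((s) => s.toLowerCase()),
    maxResults: req.maxResults,
  };
  const filters: string[] = [];
  if (req.clearance !== null) {
    const idx = CLEARANCE_ORDER.indexOf(req.clearance);
    // Unknown clearance labels fall back to an exact match.
    bindVars.clearances = idx === -1 ? [req.clearance] : CLEARANCE_ORDER.slice(idx);
    filters.push("FILTER c.clearance IN @clearances");
  }
  if (req.location !== null) {
    bindVars.location = req.location;
    filters.push("FILTER c.location == @location");
  }
  const query = [
    "FOR c IN candidates",
    ...filters.map((f) => "  " + f),
    "  LET matched = (",
    "    FOR s IN 1..1 OUTBOUND c has_skill",
    "      FILTER LOWER(s.name) IN @coreSkills",
    "      RETURN s.name",
    "  )",
    "  FILTER LENGTH(matched) > 0",
    "  SORT LENGTH(matched) DESC",
    "  LIMIT @maxResults",
    "  RETURN { candidate: c, coreMatches: matched }",
  ].join("\n");
  return { query, bindVars };
}
```

The query is built from fixed fragments, so the same JD requirements always produce the same query text. This keeps retrieval reproducible and easy to audit.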

Weighted Multi-Dimensional Scoring (10+ Dimensions)

All scoring is deterministic JavaScript — no LLM involved in score calculation, ensuring reproducible and auditable results:

  • Core Skills: 5 pts per match (cap 20) — must-have JD requirements
  • Supporting Skills: 2 pts per match (cap 10) — nice-to-have qualifications
  • Profile Keywords: 3 pts per keyword (cap 12) — domain alignment signals
  • Role Title Match: 3 pts | Role Depth: cap 3 pts | Location Match: 3 pts
  • Certifications: 3 pts per cert (cap 6)
  • Defence Experience: 2 pts | Government Experience: 2 pts | Seniority: 2 pts
  • Dual-match Bonus: +5 pts for candidates found by both AQL and semantic search
  • Final Formula: (AQL_score / max_score) × 60 + semantic_similarity × 40, plus the 5 pt dual-match bonus where it applies
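The scoring table above translates directly into deterministic code. This sketch follows the listed points and caps. It also makes three assumptions that the page does not state:

- `max_score` is the sum of every dimension's cap (20+10+12+3+3+3+6+2+2+2 = 63).
- `semantic_similarity` lies between 0 and 1.
- The "+ 5" in the final formula is the dual-match bonus.

The interface `MatchSignals` and the functions `aqlScore` and `finalScore` are hypothetical names.

```typescript
interface MatchSignals {
  coreMatches: number;
  supportingMatches: number;
  keywordMatches: number;
  roleTitleMatch: boolean;
  roleDepthPoints: number;
  locationMatch: boolean;
  certMatches: number;
  defenceExperience: boolean;
  governmentExperience: boolean;
  seniorityMatch: boolean;
}

// Points per match, clipped at the dimension's cap.
const capped = (count: number, perMatch: number, cap: number): number => Math.min(count * perMatch, cap);
const flag = (present: boolean, pts: number): number => (present ? pts : 0);

// Assumption: max_score is the sum of all caps listed above.
const MAX_AQL_SCORE = 63;

function aqlScore(s: MatchSignals): number {
  return (
    capped(s.coreMatches, 5, 20) +
    capped(s.supportingMatches, 2, 10) +
    capped(s.keywordMatches, 3, 12) +
    flag(s.roleTitleMatch, 3) +
    Math.min(s.roleDepthPoints, 3) +
    flag(s.locationMatch, 3) +
    capped(s.certMatches, 3, 6) +
    flag(s.defenceExperience, 2) +
    flag(s.governmentExperience, 2) +
    flag(s.seniorityMatch, 2)
  );
}

// semanticSimilarity is assumed to be in [0, 1]; the +5 applies only to dual matches.
function finalScore(s: MatchSignals, semanticSimilarity: number, foundByBoth: boolean): number {
  return (aqlScore(s) / MAX_AQL_SCORE) * 60 + semanticSimilarity * 40 + (foundByBoth ? 5 : 0);
}
```

Given identical inputs, this returns identical scores, with no LLM involved. A recruiter can trace any ranking back to the individual dimension points.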

Key Technical Innovations

Word-Boundary-Aware Skill Matching

Short skill terms (≤5 characters) use regex word-boundary matching. "Java" matches "Java" but NOT "JavaScript"; "SQL" matches "SQL" and "SQL Server" but NOT "NoSQL". This eliminated the false positives that plagued previous keyword-based approaches.
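Here is a minimal sketch of the short-term rule, assuming a case-insensitive match. Instead of `\b`, it uses explicit lookarounds that reject a neighbouring letter or digit. These behave like word boundaries for "Java" and "SQL", and they also work for terms that end in symbols, such as "C++", where `\b` would fail. The helper names `escapeRegExp` and `matchesSkill` are hypothetical. The 5-character threshold comes from the text above.

```typescript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Short terms (5 characters or fewer) must not touch a letter or digit on
// either side; longer terms fall back to a case-insensitive substring test.
function matchesSkill(term: string, text: string): boolean {
  if (term.length <= 5) {
    const pattern = new RegExp(`(?<![A-Za-z0-9])${escapeRegExp(term)}(?![A-Za-z0-9])`, "i");
    return pattern.test(text);
  }
  return text.toLowerCase().includes(term.toLowerCase());
}
```

For example, `matchesSkill("SQL", "NoSQL (MongoDB)")` is false because the "o" before "SQL" blocks the match. `matchesSkill("SQL", "Microsoft SQL Server 2019")` is true.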

Core vs Supporting Skill Architecture

Skills are split into two tiers. Core skills form the minimum match gate — candidates must demonstrate these to rank highly. Supporting skills contribute bonus points but cannot substitute for core requirements, preventing generic profiles from outscoring domain-critical specialists.
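The gate can be expressed as a two-level sort. The sketch below is hypothetical in three respects. The threshold `minCore` is an assumed parameter, since the page does not give the gate's exact size. `rankWithCoreGate` and `ScoredCandidate` are illustrative names. The sketch reuses only the core and supporting point values from the scoring table. Candidates who pass the gate always outrank those who fail it, so supporting skills cannot buy their way past a missing core requirement.

```typescript
interface ScoredCandidate {
  name: string;
  coreMatches: number;
  supportingMatches: number;
}

// Hypothetical gate: at least `minCore` core skills must match before
// supporting skills can influence rank relative to gated candidates.
function rankWithCoreGate(candidates: ScoredCandidate[], minCore: number): ScoredCandidate[] {
  const passes = (c: ScoredCandidate): boolean => c.coreMatches >= minCore;
  // Core: 5 pts each (cap 20); supporting: 2 pts each (cap 10), as in the scoring table.
  const points = (c: ScoredCandidate): number =>
    Math.min(c.coreMatches * 5, 20) + Math.min(c.supportingMatches * 2, 10);
  return [...candidates].sort((a, b) => {
    // Gate first: passing the gate outranks any number of supporting points.
    if (passes(a) !== passes(b)) return passes(a) ? -1 : 1;
    return points(b) - points(a);
  });
}
```

With `minCore = 1`, a specialist with one core match (5 pts) ranks above a generalist with eight supporting matches (10 pts).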

Dual-Purpose Summary Generation

Candidate summaries are generated with a dual purpose: optimised for human readability while also including equivalent job titles, named technologies, clearance level in prose, and seniority signals — making them effective for both semantic vector retrieval and recruiter review.
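In the project, these summaries are generated. The deterministic template below only illustrates which elements a dual-purpose summary carries. Those elements are equivalent job titles, named technologies, clearance level stated in prose, and a seniority signal. The profile fields, the seniority bands in `seniorityLabel`, and the function `buildSummary` are all hypothetical.

```typescript
interface CandidateProfile {
  name: string;
  currentTitle: string;
  equivalentTitles: string[];
  technologies: string[];
  clearance: string | null;
  yearsExperience: number;
}

// Hypothetical seniority bands, used only for this illustration.
function seniorityLabel(years: number): string {
  if (years >= 10) return "senior";
  if (years >= 4) return "mid-level";
  return "junior";
}

// Plain sentences read well for recruiters, while the explicit titles,
// technologies, and clearance wording give the embedding model terms to match.
function buildSummary(p: CandidateProfile): string {
  const clearanceText = p.clearance
    ? `holds a security clearance at ${p.clearance} level`
    : "does not hold a security clearance";
  return [
    `${p.name} is a ${seniorityLabel(p.yearsExperience)} ${p.currentTitle} with ${p.yearsExperience} years of experience.`,
    p.equivalentTitles.length > 0 ? `Comparable roles: ${p.equivalentTitles.join(", ")}.` : "",
    p.technologies.length > 0 ? `Hands-on with ${p.technologies.join(", ")}.` : "",
    `${p.name} ${clearanceText}.`,
  ]
    .filter((sentence) => sentence !== "")
    .join(" ");
}
```

Spelling out "Infrastructure Specialist" as a comparable title for a "Network Engineer" lets a semantic search for either title reach the same profile.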

Multi-Stage Pipeline Validation (4 Stages)

  • Stage 1 — Data Integrity: Catches malformed data (e.g., integers where objects are expected) before it reaches the LLM
  • Stage 2 — Grounding Check: Every claim in the response must be traceable to actual query results
  • Stage 3 — Completeness Check: Tiered by result count — prevents count-only responses when full candidate details are required
  • Stage 4 — Accuracy Check: Detects hallucinated names, fabricated contact details, or incorrectly attributed skills

The validator routes outputs to one of three paths: approve (pass through), revise (regenerate response), or pipeline_error (fix upstream data issue).
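The sketch below shows the three-way routing. It rests on several assumptions:

- The draft response has already been parsed into the names and emails it mentions.
- Stage 1 failures are upstream data problems, so they route to `pipeline_error`.
- Failures in stages 2 to 4 route to `revise`.
- The completeness tier of "describe up to 5 results in full" is hypothetical.

`routeResponse`, `ResultCandidate`, and `DraftResponse` are illustrative names.

```typescript
type Route = "approve" | "revise" | "pipeline_error";

interface ResultCandidate {
  name: string;
  email: string | null;
  skills: string[];
}

// Assumed pre-parsed view of the LLM's draft response.
interface DraftResponse {
  mentionedNames: string[];
  mentionedEmails: string[];
  detailedCount: number;
}

function routeResponse(results: unknown[], draft: DraftResponse): Route {
  // Stage 1, data integrity: e.g. a bare integer where a candidate object is expected.
  const isCandidate = (r: unknown): boolean =>
    typeof r === "object" && r !== null && typeof (r as ResultCandidate).name === "string";
  if (!results.every(isCandidate)) return "pipeline_error";
  const candidates = results as ResultCandidate[];
  const names = new Set(candidates.map((c) => c.name));
  const emails = new Set(candidates.map((c) => c.email).filter((e): e is string => e !== null));

  // Stage 2, grounding: every named candidate must appear in the query results.
  if (!draft.mentionedNames.every((n) => names.has(n))) return "revise";
  // Stage 3, completeness (hypothetical tier): describe up to 5 results in full.
  if (draft.detailedCount < Math.min(candidates.length, 5)) return "revise";
  // Stage 4, accuracy: no fabricated contact details.
  if (!draft.mentionedEmails.every((e) => emails.has(e))) return "revise";
  return "approve";
}
```

Sending data-integrity failures to `pipeline_error` rather than `revise` avoids asking the LLM to regenerate a response from data that is itself broken.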

Capability Domain Classification

Candidates are classified across 13 domain taxonomies tailored to the Australian Defence and Government sector:

  • ServiceNow
  • Cyber Security & Risk
  • Infrastructure & Networks
  • Software Development
  • Project Management
  • Business Analysis
  • Architecture & Design
  • Service & Support Operations
  • Change & Governance
  • Data Analytics & AI
  • Procurement & Commercial
  • Financial Services
  • Administration Services
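One way to keep classification inside the fixed 13-domain taxonomy is to normalise whatever labels the classifier emits against a closed list. This is a sketch under the assumption that labels arrive as free-text strings. `CAPABILITY_DOMAINS` and `normaliseDomains` are hypothetical names. Unknown labels are dropped rather than stored.

```typescript
const CAPABILITY_DOMAINS = [
  "ServiceNow", "Cyber Security & Risk", "Infrastructure & Networks", "Software Development",
  "Project Management", "Business Analysis", "Architecture & Design", "Service & Support Operations",
  "Change & Governance", "Data Analytics & AI", "Procurement & Commercial", "Financial Services",
  "Administration Services",
] as const;

type CapabilityDomain = (typeof CAPABILITY_DOMAINS)[number];

// Case-insensitive lookup from label text to the canonical domain name.
const DOMAIN_BY_LOWER = new Map<string, CapabilityDomain>(
  CAPABILITY_DOMAINS.map((d): [string, CapabilityDomain] => [d.toLowerCase(), d]),
);

// Keep only labels from the fixed taxonomy, de-duplicated, in first-seen order.
function normaliseDomains(labels: string[]): CapabilityDomain[] {
  const out = new Set<CapabilityDomain>();
  for (const label of labels) {
    const match = DOMAIN_BY_LOWER.get(label.trim().toLowerCase());
    if (match !== undefined) out.add(match);
  }
  return Array.from(out);
}
```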

Infrastructure — Defence-in-Depth Security

  • ArangoDB deployed on AWS EC2 (Ubuntu 24.04, t3.medium, 50 GiB gp3 volume)
  • ArangoDB configured to listen on localhost only (127.0.0.1) — never directly internet-exposed
  • Nginx reverse proxy handles all external traffic and terminates SSL/TLS using Let's Encrypt certificates with auto-renewal
  • Dedicated database user with scoped permissions (principle of least privilege)
  • Fail2ban for brute force protection at the OS level
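With ArangoDB bound to localhost, clients reach it only through the Nginx HTTPS endpoint. The sketch below builds, but does not send, a request to ArangoDB's HTTP cursor endpoint (`POST /_db/{database}/_api/cursor`) for the dedicated database user. The base URL, database name, and credentials in the usage note are placeholders. `buildCursorRequest` is a hypothetical helper. `btoa` only handles ASCII credentials.

```typescript
interface CursorRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds a request for ArangoDB's cursor API, routed through the HTTPS proxy.
function buildCursorRequest(
  proxyBaseUrl: string,
  database: string,
  user: string,
  password: string,
  query: string,
  bindVars: Record<string, unknown>,
): CursorRequest {
  // Refuse to send database credentials over plain HTTP.
  if (!proxyBaseUrl.startsWith("https://")) {
    throw new Error("refusing to send credentials over a non-TLS URL");
  }
  return {
    url: `${proxyBaseUrl.replace(/\/+$/, "")}/_db/${encodeURIComponent(database)}/_api/cursor`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // HTTP Basic auth for the scoped, least-privilege user (ASCII credentials only).
        Authorization: `Basic ${btoa(`${user}:${password}`)}`,
      },
      body: JSON.stringify({ query, bindVars }),
    },
  };
}
```

Usage would be along the lines of `const { url, init } = buildCursorRequest("https://db.example.com", "recruitment", "n8n_reader", secret, "RETURN 1", {});` followed by `await fetch(url, init)`.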

Project Information

Project Type

Private Client Project

Australian Defence and Government Recruitment Sector

Timeline

Nov 2025 – Feb 2026

Role

AI/ML Engineer & Solutions Architect

Technologies Used

N8N, ArangoDB, AQL, AWS EC2, GPT-4.1 Mini, Vector Embeddings, Semantic Search, JavaScript, Node.js, Nginx, Let's Encrypt, Ubuntu 24.04, Fail2ban, JSON Schema, Regex

Skills Demonstrated

  • Graph Database Design
  • AI/ML Pipeline Engineering
  • Semantic Search & Vector Embeddings
  • Deterministic Scoring Algorithms
  • Cloud Infrastructure (AWS EC2)
  • Security Architecture
  • Multi-Stage QA Validation
  • N8N Workflow Automation

Technical Highlights

Graph Database Architecture

ArangoDB stores candidates, skills, organisations, and clearances as vertices, with edges representing the relationships between them. This enables multi-hop traversal queries that would need complex, repeated joins in a relational database.
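The sketch below shows how one extracted candidate might be turned into vertex and edge documents before insertion. It assumes hypothetical vertex collections (`candidates`, `skills`, `organisations`) and edge collections (`has_skill`, `worked_at`). ArangoDB edge documents reference vertices through `_from` and `_to` handles of the form `collection/key`. The key-slugging rule in `toKey` is an assumption. A real system would more likely key candidates by a unique ID than by name.

```typescript
interface Vertex { _key: string; name: string; }
interface Edge { _from: string; _to: string; }

// Turn a display name into a string that is safe to use as an ArangoDB _key.
// "+" and "#" are spelled out so that "C++", "C#", and "C" do not collide.
function toKey(name: string): string {
  return name
    .trim()
    .toLowerCase()
    .replace(/\+/g, "plus")
    .replace(/#/g, "sharp")
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "");
}

// Hypothetical mapping from one extracted candidate to graph documents.
function toGraphDocuments(candidate: { name: string; skills: string[]; organisations: string[] }) {
  const candidateKey = toKey(candidate.name);
  const from = `candidates/${candidateKey}`;
  const skills: Vertex[] = candidate.skills.map((s) => ({ _key: toKey(s), name: s }));
  const organisations: Vertex[] = candidate.organisations.map((o) => ({ _key: toKey(o), name: o }));
  // Edges for the has_skill and worked_at edge collections.
  const hasSkill: Edge[] = skills.map((s) => ({ _from: from, _to: `skills/${s._key}` }));
  const workedAt: Edge[] = organisations.map((o) => ({ _from: from, _to: `organisations/${o._key}` }));
  return { candidate: { _key: candidateKey, name: candidate.name }, skills, organisations, hasSkill, workedAt };
}
```

Skills and organisations become shared vertices. A single traversal from a skill can then reach every candidate who lists it, and every organisation those candidates worked for.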

Dual-Retrieval Search

Runs AQL graph queries and semantic vector search in parallel, then merges the results. This captures candidates that a single-method search misses, and candidates found by both methods receive a +5 pt bonus.
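The merge step can be sketched as a union keyed by candidate ID, recording whether both retrievers returned each candidate. The `foundByBoth` flag is what would trigger the +5 bonus. `Hit`, `MergedHit`, `mergeResults`, and `dualRetrieve` are hypothetical names. The two retriever functions passed to `dualRetrieve` stand in for the AQL and vector searches.

```typescript
// AQL hits carry raw AQL points; vector hits carry a similarity score.
interface Hit { id: string; score: number; }
interface MergedHit { id: string; aqlScore: number; similarity: number; foundByBoth: boolean; }

// Union the two result sets by candidate ID, recording which retriever(s) found each one.
function mergeResults(aqlHits: Hit[], vectorHits: Hit[]): MergedHit[] {
  const merged = new Map<string, MergedHit>();
  for (const h of aqlHits) {
    merged.set(h.id, { id: h.id, aqlScore: h.score, similarity: 0, foundByBoth: false });
  }
  for (const h of vectorHits) {
    const existing = merged.get(h.id);
    if (existing) {
      existing.similarity = h.score;
      existing.foundByBoth = true;
    } else {
      merged.set(h.id, { id: h.id, aqlScore: 0, similarity: h.score, foundByBoth: false });
    }
  }
  return Array.from(merged.values());
}

// Run both retrievers concurrently, then merge.
async function dualRetrieve(
  runAql: () => Promise<Hit[]>,
  runVector: () => Promise<Hit[]>,
): Promise<MergedHit[]> {
  const [aqlHits, vectorHits] = await Promise.all([runAql(), runVector()]);
  return mergeResults(aqlHits, vectorHits);
}
```

A candidate found only by the vector search still enters the merged set with an AQL score of 0. This is how dual retrieval surfaces candidates that keyword-driven graph queries miss.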

Deterministic Scoring

All candidate scoring is calculated in deterministic JavaScript Code nodes, not generated by an LLM. This makes results fully reproducible and auditable, with score calculations verified across all JD types.

Word-Boundary Matching

Regex word-boundary patterns on short skill terms eliminate false positives — "Java" never matches "JavaScript", "SQL" never matches "NoSQL" — critical precision for technical role matching.

4-Stage QA Validation

Every LLM response passes four validation stages: data integrity, grounding check, completeness check, and accuracy check. Each response is then automatically routed to approve, revise, or pipeline_error (escalated to fix upstream data).

Defence-in-Depth Security

ArangoDB on localhost-only binding, Nginx reverse proxy with SSL/TLS termination, Let's Encrypt auto-renewal, scoped DB permissions, and Fail2ban provide production-grade security suited to Defence sector requirements.

System Screenshots

CV Entities Extraction Pipeline


AI-powered extraction of structured fields from unstructured CV documents

ArangoDB Graph Visualisation


Graph showing candidate, skill, organisation, and clearance vertices with relationship edges

Microsoft Teams Chatbot Interface


Recruiter-facing chatbot for submitting candidate search queries directly in Teams

Webhook Chatbot


Alternative webhook-based interface for programmatic search query submission
