Maples Group: KYC/AML Automation

Enterprise AI · Financial Services · McGill Capstone

The Story

Maples Group is a $480B+ offshore fund administrator managing investor relationships across 50+ offices in 20+ jurisdictions. Their Montreal office handled Know Your Customer (KYC) and Anti-Money Laundering (AML) compliance manually.

Every new investor triggered a 3-month process: documents reviewed by hand, investor names screened manually against multiple databases, risk assessments written from scratch. The compliance team flagged so many false positives (65% of alerts) that genuine risks got buried.

They asked McGill: Can AI automate this?

The Problem

  • Onboarding took 3 months per investor, a bottleneck for growth
  • Manual screening produced 65% false-positive alerts, causing alert fatigue
  • No continuous monitoring; screening happened only at initial onboarding
  • Analyst hours burned on routine tasks instead of investigation
  • Regulatory risk exposure if compliance patterns weren't detected

The Solution

I led a 6-person team to build an AI-powered investor screening platform. Here is what we delivered:

5-Layer Architecture

Layer 1: Data Storage

Azure Data Lake Gen2 medallion architecture (Bronze/Silver/Gold) + Databricks + Delta Lake. Ingests investor documents, regulatory databases, and historical compliance data. Built for scalability.
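As a rough illustration of the Bronze/Silver/Gold flow, the sketch below uses plain Python lists in place of Delta tables. The record fields (investor_id, name, country) and the cleaning rules are assumptions made for the example, not the schema or logic used in the project.

```python
from collections import Counter

# Bronze: raw records as ingested, kept unmodified (hypothetical fields).
bronze = [
    {"investor_id": "A1", "name": "  Jane Doe ", "country": "ca"},
    {"investor_id": "A1", "name": "Jane Doe", "country": "CA"},   # duplicate
    {"investor_id": "B2", "name": "", "country": "KY"},           # missing name
    {"investor_id": "C3", "name": "Li Wei", "country": "ky"},
]


def to_silver(rows):
    """Clean and deduplicate: trim names, upper-case country, drop invalid rows."""
    seen, silver = set(), []
    for row in rows:
        name = row["name"].strip()
        if not name or row["investor_id"] in seen:
            continue
        seen.add(row["investor_id"])
        silver.append({"investor_id": row["investor_id"],
                       "name": name,
                       "country": row["country"].upper()})
    return silver


def to_gold(rows):
    """Aggregate to an analytics-ready view: investor count per country."""
    return dict(Counter(row["country"] for row in rows))


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

The point of the layering is that each stage only reads from the one before it, so a bad cleaning rule can be fixed and Silver/Gold rebuilt without re-ingesting the raw data.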

Layer 2: Orchestration

11 Azure Data Factory pipelines: 1 event-driven (triggered on document upload), 5 daily, 4 weekly, 1 monthly for regulatory data refresh. Databricks notebooks for complex transformations.
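The trigger mix above can be written down as a small inventory. This is only a sketch in plain Python: the pipeline names are placeholders, and it does not use the Azure Data Factory SDK or reflect how the real pipelines were defined.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    name: str      # hypothetical pipeline name
    trigger: str   # "event", "daily", "weekly", or "monthly"


def build_schedule():
    """Build a placeholder inventory with the trigger mix described above."""
    counts = {"event": 1, "daily": 5, "weekly": 4, "monthly": 1}
    return [Pipeline(f"{trigger}_pipeline_{i + 1}", trigger)
            for trigger, n in counts.items() for i in range(n)]


def due(schedule, trigger):
    """Return the names of pipelines that should run for a given trigger type."""
    return [p.name for p in schedule if p.trigger == trigger]


schedule = build_schedule()
by_trigger = Counter(p.trigger for p in schedule)
print(len(schedule), dict(by_trigger))
print(due(schedule, "event"))
```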

Layer 3: AI & Risk Scoring

XGBoost for fraud prediction, Isolation Forest for anomaly detection, SHAP for explainability. Scores investors 0-100. Auto-routes by tier (high-risk → immediate review, low-risk → batch approval).
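A minimal sketch of the score-and-route step, assuming the model emits a probability that is scaled to 0-100. The cut-offs (70 and 30) and the middle "standard_review" tier are hypothetical; the actual thresholds and tier names are not stated here.

```python
def to_score(fraud_probability: float) -> int:
    """Map a model probability in [0, 1] to a 0-100 risk score."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return round(fraud_probability * 100)


# Hypothetical cut-offs; the project's actual thresholds are not stated.
HIGH_RISK_MIN = 70
LOW_RISK_MAX = 30


def route(score: int) -> str:
    """Pick a review queue from a 0-100 risk score."""
    if score >= HIGH_RISK_MIN:
        return "immediate_review"
    if score <= LOW_RISK_MAX:
        return "batch_approval"
    return "standard_review"  # assumed middle tier


for p in (0.05, 0.5, 0.92):
    s = to_score(p)
    print(s, route(s))
```

Keeping the thresholds as named constants makes them easy to tune with the compliance team without touching the routing logic.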

Layer 4: AI Agents

Three GPT-4 agents: document reader (extracts investor info from PDFs), sanctions screener (matches names across 6 regulatory databases with transliteration support), monitoring alerts (flags changes in investor status). Pinecone vector search for semantic matching.
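To give a sense of what name matching involves, here is a small standard-library sketch of fuzzy matching with accent folding. It is a stand-in, not the project's screener: it does not call GPT-4 or Pinecone, it handles accents and spelling variants but not transliteration across scripts, and the names and the 0.85 threshold are made up for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(ascii_only.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def screen(candidate, watchlist, threshold=0.85):
    """Return (entry, score) for watchlist entries at or above the threshold."""
    scored = ((entry, similarity(candidate, entry)) for entry in watchlist)
    hits = [(entry, round(score, 2)) for entry, score in scored if score >= threshold]
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Made-up names for illustration only.
watchlist = ["José Álvarez", "Mohammed Al-Rashid", "Ivan Petrov"]
print(screen("Jose Alvarez", watchlist))
print(screen("Mohamed Al Rashid", watchlist))
```

The threshold is the lever for the false-positive trade-off: lowering it catches more spelling variants but sends more alerts to analysts.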

Layer 5: UI & Security

React web app + Power BI dashboards for analysts. 11 Azure Functions (Managed Identity). Azure Key Vault for secrets. Azure AD RBAC + MFA for access control.

External Data Sources

Integrated 6 regulatory and financial databases: OpenSanctions, KnowYourCountry, NewsData.io, OpenCorporates, SEC EDGAR, Bluesky/Mastodon for reputation data.

The Impact

  • Onboarding time: 3 months → 10 days
  • False positive rate: 65% → 32.5%
  • Annual savings: $1.39M

Financial Model (3-Year ROI)

  • Analyst time savings: $912K/year (600 onboardings/yr at $95/hr)
  • Alert triage efficiency: $315K/year (6,000 alerts/yr, 50% reduction in review time)
  • Regulatory risk reduction: $163K/year (fewer missed compliance issues)
  • Total investment cost: $1.53M (5-month payback)
  • 3-year net benefit: $3.66M
  • ROI: 287%
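The three annual savings lines add up to the $1.39M headline, which can be checked in a few lines. The hours-per-onboarding figure below is derived from the listed volume and rate; it is not a number stated in the model itself.

```python
# Figures taken from the list above (USD per year).
analyst_time = 912_000
alert_triage = 315_000
regulatory_risk = 163_000

annual_savings = analyst_time + alert_triage + regulatory_risk

# Implied analyst hours saved per onboarding at the stated volume and rate.
onboardings_per_year = 600
hourly_rate = 95
hours_saved_per_onboarding = analyst_time / (onboardings_per_year * hourly_rate)

print(f"Annual savings: ${annual_savings:,}")
print(f"Hours saved per onboarding: {hours_saved_per_onboarding:.0f}")
```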

The system doesn't replace analysts; it amplifies them. Humans still make the final decision. The AI surfaces what matters and eliminates the noise.

The Build

My role: Data Strategist and Team Lead. I owned the entire project lifecycle, from initial client engagement through final delivery. My responsibilities spanned three areas: Leadership, Business Strategy, and Technical Architecture.

Leadership

I managed a 6-person team (data engineers, ML engineers, business analysts) across five months. I set the vision, removed blockers, and ensured we shipped on schedule. The team gave strong feedback about clarity of direction and their ability to make autonomous decisions within their domain. The client noted that our team felt aligned and professional: they knew exactly what we were building and why.

Business Strategy

I built the financial model from scratch. I started with three questions: What does Maples spend on compliance today? How much time do analysts spend on routine work? What is the regulatory cost of missing a risk? From these questions, I modeled three scenarios (conservative, expected, optimistic) and settled on the expected case, which showed $1.39M in annual savings. I then built the 12-month implementation roadmap that the client is now executing. That roadmap isn't theoretical; it's specific about phases, resource needs, and risk mitigation.

Technical Architecture

I co-architected the platform with one senior engineer. I made the key decisions: medallion architecture for data governance, Azure Data Factory for orchestration, XGBoost for scoring, GPT-4 agents for document intelligence. These weren't arbitrary; each choice reflected a trade-off. We chose Azure over AWS because Maples already had Azure skills. We chose medallion because it enforces data quality (Bronze → Silver → Gold). We chose explainability (SHAP) over raw accuracy because regulators care about why the system made a decision, not just that it was right.

Feedback

The project team noted that the vision was clear, the priorities were sensible, and the communication was direct. The client gave strong positive feedback on both the business case (they called it "grounded and realistic") and the technical approach (they said "this is exactly what we need, not over-engineered").

Tech Stack

Azure Data Lake
Databricks
Azure OpenAI
XGBoost
Pinecone
Azure Functions
React
Power BI
Python

Timeline

  • Phase 1 (0-2 months): Build core pipeline + document reader AI. Live with 100 test investors.
  • Phase 2 (2-4 months): Add risk scoring + sanctions screening. Integrate with analyst workflow.
  • Phase 3 (4-5 months): Continuous monitoring + dashboard + hand-off to operations.

Key Decisions

  • Built for the analyst, not against them. The UI shows confidence scores and reasoning so they can trust the system.
  • Prioritized regulatory data accuracy over speed. Better to be slower and right than fast and wrong.
  • Designed for continuous learning. The more alerts analysts triage, the more the model improves.

Learnings

  • Start with the analyst's pain: The client didn't want automation for automation's sake. They wanted their team to focus on investigation, not data entry. Design for the human problem first.
  • Financial models matter: The technical solution was interesting, but the $1.39M savings is what got buy-in. Data projects live or die on ROI.
  • Explainability is critical: With compliance, trust matters more than performance. A 95% accurate model that analysts can't explain fails. We focused on SHAP values and audit trails.
  • Regulatory data is messy: Names change, spellings vary, jurisdictions differ. Transliteration and fuzzy matching are not optional.
  • Team structure is architecture: We organized the team by data layers (storage, orchestration, scoring, agents, UI). It made handoffs clear and accountability clean.

Delivered April 2026 · McGill Capstone · Team of 6