Matching Algorithms

The SanctionsWise API uses a five-layer matching engine to achieve high accuracy while minimizing false positives.


Overview

The matching engine processes each query through five distinct layers, combining scores to produce a final confidence value:

Query Input
│
├─► Layer 1: Exact Match (normalized)
│
├─► Layer 2: Fuzzy Match (Jaro-Winkler + SequenceMatcher)
│
├─► Layer 3: Phonetic Match (Soundex + Double Metaphone)
│
├─► Layer 4: Semantic Match (Bedrock Titan + S3 Vectors)
│
└─► Layer 5: Identifier Match (passport, tax ID, etc.)

↓
Combined Score → Confidence Threshold → Match Decision

Layer 1: Exact Match

What it does: Compares normalized versions of names character-by-character.

Normalization process:

  1. Convert to uppercase
  2. Remove titles (Mr., Dr., Sheikh, etc.)
  3. Remove special characters
  4. Collapse multiple spaces

Example:

Query:    "Dr. Vladimir V. Putin"
Normalized: "VLADIMIR V PUTIN"

Entity: "VLADIMIR V PUTIN"
Result: Exact match (score: 1.0)
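
A minimal normalization sketch follows; the title list and regular expressions are illustrative assumptions, not the exact production rules:

import re

# Illustrative title set; the production list is larger
TITLES = {"MR", "MRS", "MS", "DR", "SHEIKH"}

def normalize_name(name: str) -> str:
    upper = name.upper()                                # 1. uppercase
    tokens = [t.rstrip(".") for t in re.split(r"\s+", upper)]
    tokens = [t for t in tokens if t not in TITLES]     # 2. remove titles
    joined = re.sub(r"[^A-Z0-9 ]", "", " ".join(tokens))  # 3. remove special characters
    return re.sub(r"\s+", " ", joined).strip()          # 4. collapse spaces

print(normalize_name("Dr. Vladimir V. Putin"))  # "VLADIMIR V PUTIN"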

Best for: Known exact name matches, repeat screenings


Layer 2: Fuzzy Match

What it does: Detects near-matches using string similarity algorithms.

Jaro-Winkler Similarity (40% weight)

Measures similarity from the number of matching characters and transpositions between two strings, giving extra weight to a shared prefix.

Example:

Query:  "MARTHA"
Entity: "MARHTA" (transposition of 'H' and 'T')
Score: 0.96 (Jaro 0.944, boosted by the shared "MAR" prefix)
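
For reference, the same pair can be scored with the jellyfish package (an assumption here; any Jaro-Winkler implementation gives comparable numbers):

import jellyfish

# Jaro-Winkler rewards the shared "MAR" prefix on top of the base Jaro score
score = jellyfish.jaro_winkler_similarity("MARTHA", "MARHTA")
print(round(score, 3))  # ~0.961 (plain Jaro is ~0.944)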

SequenceMatcher (30% weight)

Computes a similarity ratio from the longest contiguous matching blocks (the Ratcliff/Obershelp approach used by Python's difflib.SequenceMatcher), which works well for partial name matches.

Example:

Query:  "JOHN SMITH"
Entity: "JOHN ALEXANDER SMITH"
Score: 0.67 (shares the "JOHN" and "SMITH" matching blocks)
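
Assuming the standard-library difflib.SequenceMatcher that the section name suggests, the score can be reproduced directly:

from difflib import SequenceMatcher

# ratio = 2 * (matched characters) / (total characters in both strings)
ratio = SequenceMatcher(None, "JOHN SMITH", "JOHN ALEXANDER SMITH").ratio()
print(round(ratio, 2))  # ~0.67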

Combined Fuzzy Score

fuzzy_score = (0.40 * jaro_winkler) + (0.30 * sequence_ratio) + (0.30 * phonetic)

Best for: Typos, transpositions, partial names, data entry errors


Layer 3: Phonetic Match (30% weight)

What it does: Matches names that sound alike but are spelled differently.

Soundex Algorithm

Classic phonetic algorithm encoding names by sound. Returns a 4-character code: first letter + 3 digits.

How it works:

  1. Keep first letter
  2. Map consonants to digits (B,F,P,V → 1; C,G,J,K,Q,S,X,Z → 2; etc.)
  3. Remove adjacent duplicates
  4. Pad/truncate to 4 characters

Example:

Name   | Soundex
-------|--------
ROBERT | R163
RUPERT | R163
SMITH  | S530
SMYTHE | S530
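
A quick check with the jellyfish package (assumed here purely for illustration):

import jellyfish

print(jellyfish.soundex("ROBERT"))  # R163
print(jellyfish.soundex("RUPERT"))  # R163
print(jellyfish.soundex("SMYTHE"))  # S530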

Double Metaphone Algorithm

Advanced phonetic algorithm returning two codes:

  • Primary code: Most common pronunciation
  • Alternate code: Non-English variants (Germanic, Slavic patterns)

Example:

Name        | Primary | Alternate
------------|---------|----------
SCHMIDT     | XMT     | SMT
TCHAIKOVSKY | XKFSK   | TKFSK
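
The metaphone package (an assumption; other Double Metaphone libraries behave similarly) returns both codes as a tuple:

from metaphone import doublemetaphone

# (primary, alternate) phonetic codes
print(doublemetaphone("SCHMIDT"))  # ('XMT', 'SMT')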

Phonetic Scoring

phonetic_score = (
    0.4 * soundex_match
    + 0.4 * metaphone_primary_match
    + 0.2 * metaphone_alternate_match
)

Real Example:

Query:  "Vladimer Pootin"  (common misspellings)
Entity: "VLADIMIR PUTIN"

Soundex: V435 vs V435 → Match!
Metaphone: FLTMR vs FLTMR → Match!
Phonetic Score: 0.60

Best for: Transliteration variants, accent variations, spelling errors based on pronunciation


Layer 4: Semantic Match (S3 Vectors + Bedrock Titan)

What it does: Uses AI embeddings to find contextually similar names, even with significant textual differences.

How It Works

  1. Embedding Generation: Query name is converted to a 1024-dimensional vector using Amazon Bedrock Titan Text Embeddings v2
  2. Vector Search: S3 Vectors performs approximate nearest neighbor search against indexed entity embeddings
  3. Similarity Scoring: Cosine similarity measures contextual alignment

Architecture:

Query: "Russian President Putin"
│
├─► Bedrock Titan Embed → [0.12, -0.45, 0.78, ...] (1024 dims)
│
└─► S3 Vectors Query
│
├─► "VLADIMIR VLADIMIROVICH PUTIN" → similarity: 0.89
├─► "PUTIN, VLADIMIR VLADIMIROVICH" → similarity: 0.87
└─► "VLADIMIR POTANIN" → similarity: 0.45

Semantic vs Traditional Matching

Query                         | Traditional Match                                 | Semantic Match
------------------------------|---------------------------------------------------|--------------------------------------
"Bank of Russia"              | May miss "CENTRAL BANK OF THE RUSSIAN FEDERATION" | Finds it (similar context)
"Russian oligarch Abramovich" | Requires exact name                               | Finds "ROMAN ARKADYEVICH ABRAMOVICH"
"Kim Jong-un's regime"        | No match                                          | Finds related DPRK entities

Best for: Contextual queries, alias discovery, cross-language matches, descriptive queries


Layer 5: Identifier Match

What it does: Matches on document numbers (passport, tax ID, etc.) with normalization and fuzzy matching.

Supported Identifier Types

Type         | Aliases                               | Example
-------------|---------------------------------------|------------
passport     | passport_number, travel_document      | AB1234567
national_id  | ssn, id_number                        | 123-45-6789
tax_id       | tin, ein, vat_number                  | 98-7654321
registration | company_number, business_registration | 12345678

Matching Modes

Exact Match:

Query:    {"passport": "AB123456"}
Entity: {"passport": "AB-123-456"}
Normalized: Both → "AB123456"
Result: Exact match (confidence: 1.0)

Partial Match: (for long identifiers ≥8 chars)

Query:    {"passport": "AB123456789"}
Entity: {"passport": "AB12345678901234"}
Result: Partial match (confidence: 0.9)
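
A minimal sketch of the normalization and partial-match logic described above (the helper names are assumptions):

import re

def normalize_identifier(value: str) -> str:
    # Strip separators and casing so "AB-123-456" and "ab 123 456" compare equal
    return re.sub(r"[^A-Z0-9]", "", value.upper())

def match_identifier(query: str, entity: str) -> float:
    q, e = normalize_identifier(query), normalize_identifier(entity)
    if q == e:
        return 1.0                              # exact match
    if len(q) >= 8 and (q in e or e in q):
        return 0.9                              # partial match for long identifiers
    return 0.0

print(match_identifier("AB123456", "AB-123-456"))            # 1.0
print(match_identifier("AB123456789", "AB12345678901234"))   # 0.9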

Identifier Bonus

When an identifier matches, the overall confidence score receives a +0.15 bonus (capped at 1.0):

if identifier_matched:
    final_score = min(base_score + 0.15, 1.0)
    match_type = "identifier_confirmed"

Best for: KYC verification, document-based screening, high-confidence matches


Combined Scoring Formula

The final confidence score combines all matching layers:

# Base score from fuzzy matching
base_score = (
    0.40 * jaro_winkler_similarity
    + 0.30 * sequence_matcher_ratio
    + 0.30 * phonetic_similarity
)

# Entity type bonus (if types match)
if query_type == entity_type:
    base_score *= 1.05  # +5%

# Identifier match bonus
if identifier_matched:
    base_score += 0.15  # +0.15

# Semantic enhancement
if semantic_match:
    final_score = (base_score * 0.80) + (semantic_score * 0.20)
else:
    final_score = base_score

Configuration Reference

All matching weights are configurable:

Parameter               | Default | Description
------------------------|---------|----------------------------
MatchingWeightJaro      | 0.40    | Jaro-Winkler weight
MatchingWeightSequence  | 0.30    | SequenceMatcher weight
MatchingWeightPhonetic  | 0.30    | Phonetic similarity weight
MatchingTypeBonus       | 0.05    | Entity type match bonus
MatchingIdentifierBonus | 0.15    | Identifier match bonus
SemanticWeight          | 0.20    | Semantic score weight

Performance Characteristics

Metric               | Value   | Notes
---------------------|---------|----------------------
Single entity (warm) | ~10ms   | With entity cache
Single entity (cold) | ~1000ms | First request
Batch (100 entities) | ~500ms  | Amortized loading
Semantic search      | +200ms  | Bedrock + S3 Vectors
Entities per second  | 100+    | Batch mode

Best Practices

  1. Set appropriate thresholds:

    • 0.95+ for automated pass-through
    • 0.85 for standard screening
    • 0.70 for enhanced due diligence
  2. Use entity types when known:

    {"name": "Acme Corp", "entity_type": "organization"}
  3. Include identifiers for KYC:

    {
      "name": "John Smith",
      "identifiers": {"passport": "AB123456"}
    }
  4. Use batch endpoint for bulk screening

  5. Monitor match types in responses:

    • exact: Direct match
    • fuzzy: Near match
    • fuzzy+semantic: AI-enhanced
    • identifier_confirmed: Document verified
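
As an illustration only, a screening workflow might route results by confidence and match type; the confidence and match_type field names below are assumptions about the response shape, not the documented schema:

def route(result: dict) -> str:
    score, match_type = result["confidence"], result["match_type"]
    if match_type == "identifier_confirmed" or score >= 0.95:
        return "confirmed_match"          # document-verified or near-certain hit
    if score >= 0.85:
        return "analyst_review"           # standard screening threshold
    if score >= 0.70:
        return "enhanced_due_diligence"
    return "clear"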

For API integration details, see the API Reference.