Semantic codebase understanding service — maps application structure, authentication patterns, authorization models, and data flows for intelligent security analysis.

chasingimpact/appmap

AppMapper

Semantic codebase understanding for intelligent security analysis.

AppMapper builds deep, queryable knowledge of application structure — routes, authentication patterns, authorization models, data flows, and ownership semantics. It provides the contextual foundation that vulnerability scanners need to perform targeted, meaningful analysis instead of blind pattern matching.


What It Does

AppMapper indexes a codebase and answers structural questions about it:

Q: "Does this app have user registration?"
A: Yes. Registration is handled in AuthController.java:45 (POST /api/auth/register).
   Stores users in UserRepository with bcrypt password hashing.
   Email verification required via VerificationService.

Q: "List endpoints that don't require authentication"
A: Found 5 unauthenticated endpoints:
   1. POST /api/auth/login
   2. POST /api/auth/register
   3. GET  /api/public/products
   4. GET  /health
   5. GET  /metrics  (RISK: should be protected)

Q: "Show ownership patterns for the Order resource"
A: Order ownership validation:
   - Owner field: Order.userId
   - Validation: OrderService.java:78 checks order.getUserId().equals(currentUser.getId())
   - Missing in: GET /api/orders/{id} — potential IDOR

AppMapper is not a vulnerability scanner. It does not detect SQL injection, XSS, or command injection. It maps what the application is and does, so that specialized tools can scan with full context.


Architecture

┌──────────────────────────────────────────────────────────────┐
│                          AppMapper                           │
│                                                              │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   │
│  │ Parser   │──▶│ Enricher │──▶│ Indexer  │──▶│  Query   │   │
│  │ (AST)    │   │ (Rules)  │   │ (Vector) │   │  Agent   │   │
│  └──────────┘   └──────────┘   └──────────┘   └──────────┘   │
│       │              │              │              │         │
│  Tree-sitter    Tag-based      ChromaDB       LLM-powered    │
│  extraction     enrichment     embeddings     synthesis      │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │                 Threat Modeling Engine                 │  │
│  │  Language profiles · Domain detection · Architecture   │  │
│  │  Story-driven analysis · Knowledge graph RAG           │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  Output: shared_context.json                                 │
└──────────────────────────────────────────────────────────────┘

Pipeline

  1. Parse — Tree-sitter AST extraction of code units (functions, classes, routes)
  2. Enrich — Rule-based tagging with semantic metadata (auth patterns, route handlers, data access)
  3. Index — Vector embeddings stored in ChromaDB for semantic search
  4. Query — Natural language questions resolved via semantic search + LLM synthesis
  5. Export — Structured shared_context.json for downstream tool consumption
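
The first four stages can be illustrated with a toy, single-file sketch. This is not AppMapper's implementation: it substitutes Python's built-in `ast` module for Tree-sitter, a plain dictionary for ChromaDB, and a tag lookup for semantic search plus LLM synthesis. The `CodeUnit` class, the `TAG_RULES` table, and all function names are hypothetical.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class CodeUnit:
    name: str
    line: int
    decorators: list[str]
    tags: set[str] = field(default_factory=set)


# Hypothetical enrichment rules: decorator substring -> semantic tag.
TAG_RULES = {
    "route": "route_handler",
    "login_required": "auth_required",
}


def parse(source: str) -> list[CodeUnit]:
    """Stage 1: extract function-level code units from the syntax tree."""
    units = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            decorators = [ast.unparse(d) for d in node.decorator_list]
            units.append(CodeUnit(node.name, node.lineno, decorators))
    return units


def enrich(units: list[CodeUnit]) -> list[CodeUnit]:
    """Stage 2: attach a tag when a rule matches any decorator on the unit."""
    for unit in units:
        for needle, tag in TAG_RULES.items():
            if any(needle in d for d in unit.decorators):
                unit.tags.add(tag)
    return units


def index(units: list[CodeUnit]) -> dict[str, list[CodeUnit]]:
    """Stage 3: build a tag -> units lookup (stand-in for a vector store)."""
    lookup: dict[str, list[CodeUnit]] = {}
    for unit in units:
        for tag in unit.tags:
            lookup.setdefault(tag, []).append(unit)
    return lookup


def query(lookup: dict[str, list[CodeUnit]], tag: str) -> list[str]:
    """Stage 4: exact tag lookup (stand-in for semantic search + LLM)."""
    return [unit.name for unit in lookup.get(tag, [])]


SAMPLE = '''
@app.route("/login", methods=["POST"])
def login():
    pass

@app.route("/orders")
@login_required
def list_orders():
    pass

def helper():
    pass
'''

lookup = index(enrich(parse(SAMPLE)))
print(query(lookup, "route_handler"))  # handlers carrying a route decorator
print(query(lookup, "auth_required"))  # handlers guarded by login_required
```

Keeping each stage as a function from one plain data structure to the next mirrors the left-to-right flow in the architecture diagram and makes individual stages easy to swap out.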

Core Capabilities

Route & Endpoint Discovery

Extracts HTTP endpoints across frameworks, including Spring Boot, Express, Flask, Django, FastAPI, and Go net/http; the Supported Frameworks section lists the rest. Each route record includes the path, method, handler location, auth requirements, and role restrictions.

Authentication Pattern Analysis

Identifies auth middleware, JWT validation, session management, OAuth integration points, and login/logout flows. Detects whether auth is cookie-based, token-based, or uses framework-specific mechanisms.

Authorization Model Mapping

Maps role definitions, permission hierarchies, access control decorators (@PreAuthorize, @login_required, etc.), and resource-to-role relationships.

Ownership & IDOR Detection Support

Discovers ownership validation patterns — which field links a resource to its owner, how the application verifies resource.owner == currentUser, and where those checks are missing.
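
As a rough illustration of how a downstream tool might use this information, the sketch below flags routes that address a single resource through a path parameter but carry no recorded ownership check. The heuristic and the `idor_candidates` helper are assumptions made for this example, not AppMapper's detection logic; the field names follow the sample in the Output Format section.

```python
import re

# Matches a path parameter such as {id} or {orderId}.
PATH_PARAM = re.compile(r"\{[^}]+\}")


def idor_candidates(routes):
    """Return 'METHOD path' strings for parameterized routes lacking an ownership check."""
    flagged = []
    for route in routes:
        addresses_resource = bool(PATH_PARAM.search(route["path"]))
        has_check = bool(route.get("ownership_check"))
        if addresses_resource and not has_check:
            flagged.append(f'{route["method"]} {route["path"]}')
    return flagged


# Hypothetical route records for the Order resource.
routes = [
    {"path": "/api/orders/{id}", "method": "GET", "ownership_check": None},
    {
        "path": "/api/orders/{id}",
        "method": "PUT",
        "ownership_check": "order.getUserId().equals(currentUser.getId())",
    },
    {"path": "/api/orders", "method": "GET", "ownership_check": None},
]

print(idor_candidates(routes))
```

A flagged route is only a candidate: the check may live in a shared filter or service layer that a per-route record does not capture.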

Universal Threat Modeling

Language-aware, domain-aware, architecture-aware threat generation:

  • 16 universal threat categories (not limited to OWASP web risks)
  • 10 language profiles with memory safety and dangerous pattern data
  • 12 domain profiles (web API, image processing, cryptography, networking, embedded, etc.)
  • Story-driven analysis that understands business context ("What would an attacker want?")
  • Knowledge graph RAG grounded in OWASP Top 10 and CWE data

Data Flow Tracing

Traces how user input flows through the application — from HTTP parameters to database storage, identifying sanitization gaps and taint propagation.
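
The idea of a sanitization gap can be sketched as a graph search: start at each input source, follow data-flow edges, and report any path that reaches a storage sink without passing through a sanitizer. The graph, node names, and `unsanitized_paths` function below are hypothetical and much simpler than a real tracer, which would have to derive the edges from code.

```python
from collections import deque


def unsanitized_paths(edges, sources, sinks, sanitizers):
    """Breadth-first search for source-to-sink paths that skip every sanitizer."""
    findings = []
    for source in sources:
        queue = deque([[source]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                findings.append(path)
                continue
            for nxt in edges.get(node, []):
                # Stop at sanitizers (taint cleared) and avoid revisiting nodes (cycles).
                if nxt in sanitizers or nxt in path:
                    continue
                queue.append(path + [nxt])
    return findings


# Hypothetical data-flow edges: node -> nodes that receive its data.
edges = {
    "request.params.name": ["UserService.rename"],
    "UserService.rename": ["UserRepository.save"],
    "request.params.q": ["escape_sql"],
    "escape_sql": ["SearchRepository.find"],
}

findings = unsanitized_paths(
    edges,
    sources=["request.params.name", "request.params.q"],
    sinks={"UserRepository.save", "SearchRepository.find"},
    sanitizers={"escape_sql"},
)

for path in findings:
    print(" -> ".join(path))
```

Here only the `name` parameter is reported, because the `q` parameter passes through `escape_sql` before it reaches storage.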


Quick Start

Requirements

  • Python 3.10+
  • An Anthropic API key (for LLM-powered queries and threat modeling)

Installation

git clone https://github.com/chasingimpact/appmap.git
cd appmap
pip install -r requirements.txt

Configuration

Create a .env file in the project root:

ANTHROPIC_API_KEY=your-key-here

Running the Server

python run_server.py

The web UI launches at http://localhost:8000.


API Reference

Scan a Repository

curl -X POST http://localhost:8000/api/v2/scan-repo \
  -H "Content-Type: application/json" \
  -d '{"repo_path": "/path/to/repo"}'

Generate Shared Context

curl -X POST http://localhost:8000/api/v2/generate-shared-context \
  -H "Content-Type: application/json" \
  -d '{"repo_path": "/path/to/repo"}'

Export Shared Context to File

curl -X POST http://localhost:8000/api/v2/export-shared-context \
  -H "Content-Type: application/json" \
  -d '{"repo_path": "/path/to/repo"}'

Generate Threat Model

curl -X POST http://localhost:8000/api/v2/threat-model/generate \
  -H "Content-Type: application/json" \
  -d '{"repo_path": "/path/to/repo"}'

Classify Directories

curl -X POST http://localhost:8000/api/v2/classify-directories \
  -H "Content-Type: application/json" \
  -d '{"repo_path": "/path/to/repo"}'
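
The same calls can be made from Python with only the standard library. The sketch below reuses the endpoint paths and request body from the curl examples above; the short action names, the timeout, and the assumption that every endpoint replies with JSON are illustrative guesses, since the response format is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address from the Quick Start section

# Short action names are hypothetical; the paths come from the curl examples.
ENDPOINTS = {
    "scan": "/api/v2/scan-repo",
    "shared_context": "/api/v2/generate-shared-context",
    "export": "/api/v2/export-shared-context",
    "threat_model": "/api/v2/threat-model/generate",
    "classify": "/api/v2/classify-directories",
}


def build_request(action: str, repo_path: str) -> urllib.request.Request:
    """Build a POST request carrying {"repo_path": ...} as a JSON body."""
    body = json.dumps({"repo_path": repo_path}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ENDPOINTS[action],
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(action: str, repo_path: str, timeout: float = 300.0):
    """Send the request and decode the reply, assuming the server returns JSON."""
    request = build_request(action, repo_path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running server):
#   result = call("scan", "/path/to/repo")
```

Separating `build_request` from `call` lets the request be inspected or logged without contacting the server.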

Output Format

AppMapper produces a shared_context.json designed for consumption by downstream security tools:

{
  "scan_id": "...",
  "repo_path": "/path/to/repo",
  "primary_language": "java",
  "frameworks_detected": ["spring-boot"],
  "routes": [
    {
      "path": "/api/users/{id}",
      "method": "GET",
      "handler": "getUser",
      "auth_required": true,
      "roles": ["USER"],
      "ownership_check": "user.id == request.user.id"
    }
  ],
  "ownership_patterns": [
    {
      "resource": "Order",
      "owner_field": "userId",
      "validation_pattern": "order.getUserId().equals(currentUser.getId())"
    }
  ],
  "auth_enforcement": [
    {
      "type": "annotation",
      "name": "@PreAuthorize",
      "location": "controllers/*",
      "protects": ["admin endpoints"]
    }
  ],
  "unprotected_endpoints": ["/api/debug", "/metrics"],
  "is_multi_tenant": true,
  "tenant_isolation": {
    "field": "organizationId",
    "level": "MODERATE"
  }
}
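
A downstream tool could consume this file along the following lines. The sketch parses an abbreviated copy of the sample above and derives a small summary; the `summarize` helper and the shape of its return value are hypothetical, and the code assumes only the fields shown in the sample.

```python
import json

# Abbreviated copy of the sample shared_context.json shown above.
SAMPLE = """
{
  "primary_language": "java",
  "routes": [
    {"path": "/api/users/{id}", "method": "GET", "handler": "getUser",
     "auth_required": true, "roles": ["USER"],
     "ownership_check": "user.id == request.user.id"}
  ],
  "unprotected_endpoints": ["/api/debug", "/metrics"],
  "is_multi_tenant": true,
  "tenant_isolation": {"field": "organizationId", "level": "MODERATE"}
}
"""


def summarize(context: dict) -> dict:
    """Collect the facts a scanner is likely to need before it starts."""
    routes = context.get("routes", [])
    public = set(context.get("unprotected_endpoints", []))
    # Routes explicitly marked as not requiring auth are treated as public too.
    public.update(r["path"] for r in routes if not r.get("auth_required", False))
    roles = {role for r in routes for role in r.get("roles", [])}
    tenant = context.get("tenant_isolation") or {}
    return {
        "language": context.get("primary_language"),
        "public_paths": sorted(public),
        "role_names": sorted(roles),
        "tenant_field": tenant.get("field") if context.get("is_multi_tenant") else None,
    }


summary = summarize(json.loads(SAMPLE))
print(json.dumps(summary, indent=2))
```

In practice the input would come from the exported file, for example via `json.load(open(path))`, rather than an inline string.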

Project Structure

src/
├── appmapper/
│   ├── service.py              # Core service orchestration
│   ├── route_scanner.py        # Multi-framework route extraction
│   ├── auth_discovery.py       # Authentication pattern detection
│   ├── directory_classifier.py # Directory purpose classification
│   ├── shared_context.py       # Structured context export
│   ├── query_agent.py          # LLM-powered natural language queries
│   ├── reachability.py         # Data flow reachability analysis
│   ├── dataflow_tracer.py      # Input-to-storage data flow tracing
│   ├── threat_modeling/        # Universal threat model engine
│   │   ├── models.py           # Threat types and data structures
│   │   ├── languages.py        # Language security profiles
│   │   ├── domains.py          # Domain threat profiles
│   │   ├── architectures.py    # Architecture risk profiles
│   │   ├── component_analyzer.py   # Codebase classification
│   │   ├── threat_enumerator.py    # Threat generation
│   │   ├── semantic_analyzer.py    # Story-driven analysis
│   │   ├── llm_validator.py        # LLM-based validation
│   │   └── knowledge_graph/        # OWASP/CWE RAG retrieval
│   └── ui/
│       ├── app.py              # Flask web application
│       └── templates/
├── parser/                     # Tree-sitter AST parsing
├── enricher/                   # Rule-based semantic enrichment
└── indexer/                    # ChromaDB vector indexing

Supported Frameworks

Language    Frameworks
Java        Spring Boot, Spring MVC, JAX-RS
JavaScript  Express, Fastify, Koa, Hapi
TypeScript  NestJS, Express
Python      Flask, Django, FastAPI
Go          net/http, Gin, Echo, Chi
PHP         Laravel, Symfony
Ruby        Rails, Sinatra
C#          ASP.NET Core

License

Private repository. All rights reserved.
