diff --git a/README.adoc b/README.adoc
index 8b13789..1f6423a 100644
--- a/README.adoc
+++ b/README.adoc
@@ -1 +1,311 @@
+= Conative Gating
+Jonathan D.A. Jewell
+:toc: macro
+:toc-title: Contents
+:toclevels: 3
+:sectnums:
+:icons: font
+:source-highlighter: rouge
+:experimental:
+:repo: https://github.com/hyperpolymath/conative-gating
+SLM-as-Cerebellum for LLM Policy Enforcement
+
+[.lead]
+A biologically inspired system in which a Small Language Model (SLM) acts as an *inhibitory antagonist* to Large Language Models (LLMs), preventing policy violations through mechanisms analogous to the basal ganglia's GO/NO-GO decision system.
+
+toc::[]
+
+== The Problem
+
+LLMs are trained to be helpful, which makes them systematically violate explicit project constraints. When given rules like "NEVER use TypeScript, use ReScript", LLMs:
+
+1. Read and acknowledge the constraint
+2. Generate a compliant-sounding justification
+3. Violate the constraint anyway
+
+This happens because:
+
+* Common languages (TypeScript, Python) dominate training data
+* The "helpfulness drive" overrides explicit instructions
+* LLMs lack true "loss aversion" for policy violations
+
+Documentation-based enforcement fails because LLMs "engage with" documentation rather than *obey* it.
+
+== The Solution
+
+Conative Gating introduces a second model trained with *inverted incentives*:
+
+[cols="1,1,1"]
+|===
+| Component | Role | Analogy
+
+| *LLM*
+| Task execution (helpful, creative)
+| Frontal cortex / Direct pathway ("GO")
+
+| *SLM*
+| Policy enforcement (adversarial, suspicious)
+| Cerebellum / Indirect pathway ("NO-GO")
+
+| *Policy Oracle*
+| Deterministic rule checking
+| Reflex arc (fast, no ML)
+
+| *Consensus Arbiter*
+| Weighted decision making
+| Thalamus (integration)
+|===
+
+=== Key Innovation
+
+Conative Gating uses *consensus protocols with asymmetric weighting*: the SLM's votes count 1.5x the LLM's, creating a natural bias toward inhibition that counters the LLM's tendency toward helpfulness.
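The README does not specify how the arbiter combines the two weighted votes. As one possible reading, the sketch below assumes each model reports a violation vote in [0, 1] and the arbiter takes a weighted mean, with the LLM at weight 1.0 and the SLM at `slm_weight = 1.5`. The function name `combined_violation` is hypothetical. The example shows the same pair of votes moving toward the SLM's more suspicious judgment once the asymmetry is applied.

```rust
/// Hypothetical combination rule: weighted mean of two violation votes in [0, 1].
/// The LLM's vote has weight 1.0; the SLM's vote has weight `slm_weight`.
/// The result stays in [0, 1] as long as both votes do.
fn combined_violation(llm_vote: f64, slm_vote: f64, slm_weight: f64) -> f64 {
    const LLM_WEIGHT: f64 = 1.0;
    (LLM_WEIGHT * llm_vote + slm_weight * slm_vote) / (LLM_WEIGHT + slm_weight)
}

fn main() {
    // The LLM sees little risk in its own proposal; the SLM is suspicious.
    let (llm_vote, slm_vote) = (0.1, 0.65);

    // Equal weights: the score sits halfway between the two votes.
    let symmetric = combined_violation(llm_vote, slm_vote, 1.0);
    // SLM weighted 1.5x: the score is pulled toward the SLM's vote.
    let asymmetric = combined_violation(llm_vote, slm_vote, 1.5);

    println!("symmetric  = {symmetric:.3}"); // 0.375
    println!("asymmetric = {asymmetric:.3}"); // 0.430
}
```

Under this assumed rule, the 1.5x weight does not let the SLM override the LLM outright; it shifts every combined score toward the SLM, which is the inhibitory bias described above.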
+ +== Architecture + +---- + USER REQUEST + | + v + +------------------------+ + | CONTEXT ASSEMBLY | + +------------------------+ + | + +--------------+--------------+ + | | + v v + +-------------+ +---------------+ + | LLM | | SLM | + | (Proposer) | | (Adversarial) | + +------+------+ +-------+-------+ + | | + +-------------+---------------+ + | + v + +------------------------+ + | CONSENSUS ARBITER | + | (Modified PBFT) | + | SLM weight: 1.5x | + +------------------------+ + | + +-------------+-------------+ + | | | + v v v + +-------+ +--------+ +-------+ + | ALLOW | |ESCALATE| | BLOCK | + +-------+ +--------+ +-------+ +---- + +=== Three-Tier Evaluation + +[horizontal] +Policy Oracle (Rust):: Deterministic rule checking - forbidden languages, toolchain rules, security patterns. Fast, no ML needed. + +SLM Evaluator (Rust + llama.cpp):: Detects "spirit violations" - technically compliant but violates intent. Catches verbosity, meta-commentary bloat. + +Consensus Arbiter (Elixir/OTP):: Modified PBFT with asymmetric weighting. Three outcomes: ALLOW, ESCALATE, BLOCK. + +== Installation + +=== From Source + +[source,bash] +---- +git clone https://github.com/hyperpolymath/conative-gating +cd conative-gating +cargo build --release +---- + +=== Usage + +[source,bash] +---- +# Scan a directory for policy violations +conative scan ./my-project + +# Check a single file +conative check --file src/main.ts + +# Check inline content +conative check --content "const x: string = 'hello'" + +# Show current policy +conative policy + +# Initialize local configuration +conative init + +# JSON output for automation +conative scan . 
--format json +---- + +=== Exit Codes + +[cols="1,3"] +|=== +| Code | Meaning + +| 0 | Compliant - all checks passed +| 1 | Hard violation detected (blocked) +| 2 | Soft concern detected (warning) +| 3 | Error during execution +|=== + +== Default Policy (RSR) + +The default policy implements the Rhodium Standard Repository (RSR) language hierarchy: + +=== Tier 1 - Preferred + +* Rust, Elixir, Zig, Ada, Haskell, ReScript + +=== Tier 2 - Acceptable (generates warnings) + +* Nickel, Racket + +=== Forbidden + +* TypeScript, Python*, Go, Java + +[NOTE] +==== +*Python exception: Allowed in `salt/` directories for SaltStack and `training/` for ML training scripts. +==== + +=== Toolchain Rules + +* `npm` requires `deno.json` (no npm without Deno) + +=== Security Patterns + +* Detects hardcoded secrets (passwords, API keys) + +== Configuration + +Initialize local configuration: + +[source,bash] +---- +conative init +---- + +This creates `.conative/policy.ncl` using Nickel for type-safe configuration: + +[source,nickel] +---- +{ + name = "My Project Policy", + languages = { + tier1 = [...], + forbidden = [...], + exceptions = [ + { language = "python", allowed_paths = ["scripts/"], reason = "Build scripts" } + ] + }, + enforcement = { + slm_weight = 1.5, + escalate_threshold = 0.4, + block_threshold = 0.7, + } +} +---- + +== Decision Matrix + +[cols="1,1,1"] +|=== +| LLM Confidence | SLM Violation Score | Result + +| High (>0.8) | Low (<0.3) | *ALLOW* +| High (>0.8) | Med (0.3-0.6) | ESCALATE +| High (>0.8) | High (>0.6) | *BLOCK* +| Med (0.5-0.8) | Any >0.4 | ESCALATE +| Low (<0.5) | Any | ESCALATE +|=== + +== Project Structure + +---- +conative-gating/ + src/ + main.rs # CLI application + oracle/ # Policy Oracle crate (Rust) + slm/ # SLM Evaluator crate (Rust) + config/ + policy.ncl # Default policy (Nickel) + schema.ncl # Policy schema + training/ + compliant/ # Examples that should pass + violations/ # Examples that should fail + edge_cases/ # Spirit violations for SLM + 
docs/ + ARCHITECTURE.md # Full design specification + *.adoc # Integration documentation +---- + +== Integration + +=== Claude Code Hook + +[source,json] +---- +{ + "hooks": { + "pre-commit": "conative scan --strict" + } +} +---- + +=== Pre-commit Hook + +[source,yaml] +---- +repos: + - repo: local + hooks: + - id: conative-gating + name: Conative Policy Check + entry: conative scan + language: system + pass_filenames: false +---- + +=== Programmatic Validation + +[source,bash] +---- +# Validate structured proposals +conative validate proposal.json --strict +---- + +Proposal format: + +[source,json] +---- +{ + "id": "uuid", + "action_type": {"CreateFile": {"path": "src/util.rs"}}, + "content": "file contents here", + "files_affected": ["src/util.rs"], + "llm_confidence": 0.95 +} +---- + +== Related Projects + +* *NeuroPhone* - Neurosymbolic phone AI (integrates Conative Gating) +* *ECHIDNA* - Multi-prover orchestration (SLM as another "prover") +* *RSR Framework* - Rhodium Standard Repository specifications +* *Axiom.jl* - Provable Julia ML (future formal verification) + +== License + +SPDX-License-Identifier: AGPL-3.0-or-later + +Copyright (C) 2025 Jonathan D.A. Jewell + +== References + +* link:docs/ARCHITECTURE.md[Full Architecture Specification] +* link:docs/MAAF_INTEGRATION.adoc[MAAF Integration] +* link:docs/STATE_ECOSYSTEM_SCHEMA.adoc[STATE/ECOSYSTEM Schema] diff --git a/ROADMAP.adoc b/ROADMAP.adoc new file mode 100644 index 0000000..07b8c2a --- /dev/null +++ b/ROADMAP.adoc @@ -0,0 +1,306 @@ += Conative Gating Roadmap +:toc: +:toc-title: Phases +:toclevels: 2 +:icons: font + +Development roadmap for the SLM-as-Cerebellum policy enforcement system. 
+ +== Current Status + +[.lead] +*Phase 1 Complete* - Policy Oracle implemented with full CLI + +[cols="1,3,1"] +|=== +| Component | Status | Notes + +| Policy Oracle (Rust) +| [green]#*COMPLETE*# +| Deterministic rule checking working + +| CLI Tool +| [green]#*COMPLETE*# +| scan, check, validate, init, completions + +| Nickel Configuration +| [green]#*COMPLETE*# +| Type-safe policy schema + +| Training Data Structure +| [green]#*COMPLETE*# +| compliant/violations/edge_cases + +| SLM Evaluator +| [yellow]#*PLACEHOLDER*# +| Interface defined, needs llama.cpp + +| Consensus Arbiter +| [red]#*NOT STARTED*# +| Elixir/OTP implementation pending + +| LLM Integration +| [red]#*NOT STARTED*# +| Depends on deployment context +|=== + +== Phase 1: Policy Oracle [green]#COMPLETE# + +=== Deliverables + +[%interactive] +* [x] Core data types (Proposal, PolicyVerdict, ViolationType) +* [x] Language tier system (Tier 1/Tier 2/Forbidden) +* [x] Exception rules (Python in salt/, training/) +* [x] Toolchain rules (npm requires deno.json) +* [x] Forbidden pattern detection (hardcoded secrets) +* [x] Directory scanning with intelligent filtering +* [x] CLI with multiple output formats (text, json, compact) +* [x] Shell completion generation (bash, zsh, fish) +* [x] Man page generation +* [x] Nickel policy configuration +* [x] Unit tests for core functionality + +=== CLI Commands + +[source] +---- +conative scan # Scan directory tree +conative check --file # Check single file +conative check --content # Check inline content +conative policy # Display policy +conative validate # Validate JSON proposal +conative init # Initialize .conative/ +conative completions # Generate completions +conative man # Generate man page +---- + +== Phase 2: SLM Evaluator [yellow]#IN PROGRESS# + +=== Goals + +Implement neural "spirit violation" detection using a fine-tuned Small Language Model. 
+ +=== Tasks + +[%interactive] +* [x] Define SlmEvaluation struct and interface +* [ ] Integrate llama.cpp Rust bindings +* [ ] Load GGUF model files +* [ ] Implement prompt engineering for policy detection +* [ ] Add confidence scoring +* [ ] Benchmark latency on target hardware +* [ ] Create test suite for spirit violations + +=== Model Selection + +[cols="1,1,2"] +|=== +| Model | Parameters | Use Case + +| Phi-3-mini +| 3.8B +| Primary candidate - good balance of speed/quality + +| Gemma-2B +| 2B +| Faster, lower quality - mobile/edge + +| Phi-3-small +| 7B +| Higher quality - desktop/server +|=== + +=== Spirit Violations to Detect + +* README bloat with meta-framework commentary +* Verbosity smell (over-explanation) +* Technically compliant but intent-violating code +* "Helpful" additions that weren't requested +* Suspicious pattern deviations + +== Phase 3: Training Pipeline + +=== Goals + +Build dataset and fine-tune SLM for adversarial policy detection. + +=== Tasks + +[%interactive] +* [ ] Expand training data (target: 1000+ examples) +* [ ] Balance dataset (violations vs compliant) +* [ ] Implement QLoRA fine-tuning pipeline +* [ ] Create validation holdout set +* [ ] Implement weighted loss function: + ** violation_detected: 2.0 + ** violation_missed: 3.0 + ** false_positive: 0.5 +* [ ] Evaluate precision/recall tradeoffs +* [ ] Export to GGUF format + +=== Training Data Structure + +---- +training/ + compliant/ # Should pass - Tier 1 languages, proper toolchain + violations/ # Hard violations - forbidden languages, secrets + edge_cases/ # Spirit violations - needs SLM judgment +---- + +== Phase 4: Consensus Arbiter + +=== Goals + +Implement modified PBFT consensus with asymmetric weighting in Elixir/OTP. 
+ +=== Tasks + +[%interactive] +* [ ] Set up Elixir/OTP project structure +* [ ] Implement GenServer for arbiter +* [ ] Define consensus protocol messages +* [ ] Implement asymmetric weighting (SLM = 1.5x) +* [ ] Add escalation logic +* [ ] Implement audit logging +* [ ] Create supervision tree +* [ ] Add Rustler NIFs for Oracle/SLM calls +* [ ] Write property-based tests + +=== Decision Thresholds + +[cols="1,1,1"] +|=== +| Threshold | Default | Purpose + +| slm_weight | 1.5 | SLM vote multiplier +| escalate_threshold | 0.4 | When to ask human +| block_threshold | 0.7 | When to reject outright +|=== + +== Phase 5: Integration + +=== Claude Code Integration + +[%interactive] +* [ ] Define hook interface +* [ ] Implement pre-commit hook +* [ ] Create real-time evaluation mode +* [ ] Add Claude Code config schema +* [ ] Write integration tests + +=== NeuroPhone Integration + +[%interactive] +* [ ] Port to mobile-compatible format +* [ ] Optimize for Oppo Reno 13 / edge devices +* [ ] Integrate with reservoir computing layer +* [ ] Add feedback loop for continuous learning + +=== ECHIDNA Integration + +[%interactive] +* [ ] Define SLM as "prover" interface +* [ ] Implement proof certificate generation +* [ ] Add to multi-prover orchestration + +== Phase 6: Experimentation + +=== Research Questions + +[cols="1,2"] +|=== +| Question | Approach + +| Optimal SLM size +| Benchmark Phi-3-mini vs Gemma-2B vs Phi-3-small + +| Training data volume +| Learning curves with 100/500/1000 examples + +| Asymmetry calibration +| Vary weight 1.0-2.0, measure precision/recall + +| Spirit detection accuracy +| Human evaluation of SLM judgments + +| Latency budget +| Profile end-to-end decision time + +| Adversarial robustness +| Red team: can LLM fool SLM? + +| Cross-project generalization +| Test SLM trained on one policy against others +|=== + +=== Experiments + +1. 
*Violation Detection Accuracy* + - 500 proposals (250 violations, 250 compliant) + - Baseline: GPT-4 with policy in context + - Test: Fine-tuned Phi-3-mini + +2. *Latency Impact* + - Measure with/without SLM gating + - Target: <500ms for interactive use + +3. *Weight Optimization* + - Pareto frontier: false positives vs missed violations + +== Future Directions + +=== Chapel Orchestrator (Optional) + +Alternative to Elixir for parallel model evaluation: + +[source,chapel] +---- +cobegin { + var llmResult = evaluateLLM(proposal); + var slmResult = evaluateSLM(proposal, policy); + var oracleResult = evaluateOracle(proposal, policy); +} +---- + +=== Axiom.jl Integration + +Provable policy checking when Axiom.jl is ready: + +[source,julia] +---- +@axiom PolicyCompliance begin + @ensure !contains_forbidden_language(proposal, policy) + @prove compliant(proposal) ∨ violation_reported(proposal) +end +---- + +=== Continuous Learning + +Feedback loop for ongoing improvement: + +* Record decision outcomes +* Retrain SLM on misclassifications +* Adapt thresholds based on project-specific data + +== Version History + +[cols="1,1,2"] +|=== +| Version | Date | Changes + +| 0.1.0 +| 2025-12 +| Initial release - Policy Oracle + CLI +|=== + +== Contributing + +See CONTRIBUTING.md for development setup and guidelines. + +Priority areas for contribution: + +1. Training data examples (especially edge cases) +2. SLM fine-tuning experiments +3. Elixir/OTP Consensus Arbiter +4. Integration hooks for other editors/tools