docs(networking): add network architecture overview #422
IvanHunters wants to merge 1 commit into main
Add comprehensive documentation covering the Cozystack networking stack: MetalLB load balancing (L2 and BGP modes), Cilium eBPF as a kube-proxy replacement, Kube-OVN centralized IPAM, and tenant isolation with identity-based eBPF policies. All diagrams use Mermaid.

Signed-off-by: ohotnikov.ivan <ohotnikov.ivan@e-queo.net>
Summary of Changes

This pull request introduces new documentation that provides a detailed overview of the Cozystack cluster's network architecture. It clarifies how components such as MetalLB, Cilium eBPF, and Kube-OVN work together to handle external load balancing, internal pod networking, and tenant isolation. The document aims to improve understanding of the system's networking capabilities and security enforcement mechanisms.
Walkthrough

A new comprehensive documentation file is added describing Cozystack's network architecture, covering multi-layered networking with MetalLB for external load balancing, Cilium eBPF for service load balancing and network policy enforcement, Kube-OVN for pod networking, and Hubble for observability.
Estimated code review effort: 2 (Simple), ~15 minutes
Pre-merge checks: 4 passed
Code Review
This pull request adds a comprehensive and well-structured documentation page for Cozystack's network architecture. The document is clear, detailed, and makes excellent use of Mermaid diagrams to explain complex concepts like MetalLB modes, Cilium's eBPF-based processing, and tenant isolation. The explanations are accurate and easy to follow. I have a couple of minor suggestions to improve the clarity of two diagrams, but overall this is a great addition to the documentation.
    flowchart LR
        A["Pod A"] --> CHECK{"eBPF<br/>Policy Check"}
        CHECK -->|"Cross-tenant"| DENY["DENY"]
        CHECK -->|"Same tenant"| ALLOW["ALLOW → Pod A'"]
In the "Tenant Isolation" summary diagram, the label ALLOW → Pod A' could be clearer. The A' notation is ambiguous and might be confused with a different state of Pod A. To improve clarity, consider changing it to explicitly state that traffic is allowed to another pod within the same tenant.
Suggested change:

     flowchart LR
         A["Pod A"] --> CHECK{"eBPF<br/>Policy Check"}
         CHECK -->|"Cross-tenant"| DENY["DENY"]
    -    CHECK -->|"Same tenant"| ALLOW["ALLOW → Pod A'"]
    +    CHECK -->|"Same tenant"| ALLOW["ALLOW → Pod in same tenant"]
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@content/en/docs/v1/networking/architecture.md`:
- Around line 277-292: The sentence "All of this happens in kernel space in
approximately 100 nanoseconds." is an unsupported precise latency claim; update
the text in the "Policy Enforcement in Kernel" section to either remove the
numeric value or qualify it and add a citation: e.g., replace with a softened
statement such as "All of this happens in kernel space and is typically
performed in sub-microsecond time on modern hardware" or "…in approximately 100
nanoseconds (hardware- and version-dependent; see [benchmark/source])" and
include a reference to the benchmark or paper if you keep the number; locate the
exact sentence in that section to edit.
🧹 Nitpick comments (1)
content/en/docs/v1/networking/architecture.md (1)
294-316: Avoid absolute security guarantees; qualify the statements.

Phrases like “No userspace bypass” / “no race conditions” / “cannot be bypassed” read as unconditional guarantees. Consider qualifying them (e.g., “by design” or “under correct configuration”) to avoid over-promising.
✏️ Suggested wording
-| **No userspace bypass** | All network traffic must pass through eBPF hooks |
-| **Atomic updates** | Policy changes are atomic — no race conditions |
+| **No userspace bypass (by design)** | All network traffic is expected to pass through eBPF hooks under correct configuration |
+| **Atomic updates** | Policy updates are applied atomically to reduce race windows |

-    EBPF["eBPF Programs<br/>• Attached to network interfaces<br/>• Run in privileged kernel context<br/>• Verified by kernel<br/>• Cannot be bypassed by userspace<br/>• Atomic policy updates"]
+    EBPF["eBPF Programs<br/>• Attached to network interfaces<br/>• Run in privileged kernel context<br/>• Verified by kernel<br/>• Not intended to be bypassed by userspace (with correct configuration)<br/>• Atomic policy updates"]
### Policy Enforcement in Kernel

When a packet is sent between pods, Cilium enforces policies entirely within kernel space:

```mermaid
flowchart TD
    PKT["Packet: 10.244.0.10 → 10.244.1.20"]
    STEP1["1. Lookup source identity:<br/>10.244.0.10 → ID 12345 (tenant-a)"]
    STEP2["2. Lookup destination identity:<br/>10.244.1.20 → ID 67890 (tenant-b)"]
    STEP3["3. Check policy map:<br/>(12345, 67890, TCP, 80) → DENY"]
    DROP["4. DROP packet"]
    PKT --> STEP1 --> STEP2 --> STEP3 --> DROP
```

All of this happens in kernel space in approximately 100 nanoseconds.
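For readers following the review, the lookup sequence in the quoted diagram can be modelled as two dictionary lookups. This is only a toy sketch in Python, not Cilium's actual eBPF code or map layout: the names `IDENTITY_BY_IP`, `POLICY_MAP`, `PolicyKey`, and `verdict` are hypothetical, the second tenant-a pod (`10.244.0.11`) and its ALLOW entry are invented for illustration, and the default-deny fallback is an assumption of the sketch.

```python
from typing import NamedTuple


class PolicyKey(NamedTuple):
    """Hypothetical key shape mirroring the tuple shown in the diagram."""
    src_identity: int
    dst_identity: int
    protocol: str
    port: int


# Step 1 and 2 data: IP -> identity. The first two entries come from the
# diagram; 10.244.0.11 is an invented second pod in tenant-a.
IDENTITY_BY_IP = {
    "10.244.0.10": 12345,  # tenant-a
    "10.244.0.11": 12345,  # tenant-a (hypothetical)
    "10.244.1.20": 67890,  # tenant-b
}

# Step 3 data: (src identity, dst identity, protocol, port) -> verdict.
POLICY_MAP = {
    PolicyKey(12345, 67890, "TCP", 80): "DENY",   # cross-tenant, from the diagram
    PolicyKey(12345, 12345, "TCP", 80): "ALLOW",  # same tenant (hypothetical)
}


def verdict(src_ip: str, dst_ip: str, protocol: str, port: int) -> str:
    """Return "ALLOW" or "DENY" for a packet, following the diagram's steps."""
    src_id = IDENTITY_BY_IP.get(src_ip)  # 1. lookup source identity
    dst_id = IDENTITY_BY_IP.get(dst_ip)  # 2. lookup destination identity
    if src_id is None or dst_id is None:
        return "DENY"  # assumption: unknown endpoints are denied
    # 3. check the policy map; assumption: a missing entry means deny
    return POLICY_MAP.get(PolicyKey(src_id, dst_id, protocol, port), "DENY")


if __name__ == "__main__":
    # 4. the packet from the diagram is dropped
    print(verdict("10.244.0.10", "10.244.1.20", "TCP", 80))
```

The sketch says nothing about latency; it only makes the order of lookups explicit.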
Soften or source the “~100 nanoseconds” performance claim.
This is a very specific latency figure and is likely hardware/version dependent. Consider removing the number, qualifying it, or citing a benchmark if you have one.
✏️ Suggested wording
-All of this happens in kernel space in approximately 100 nanoseconds.
+All of this happens in kernel space with very low per-packet overhead (exact latency depends on hardware, kernel, and policy complexity).