-
Notifications
You must be signed in to change notification settings - Fork 675
[Feature] add golang router #5756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
longsiyi seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
|
Thanks for your contribution! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a high-performance Golang router implementation for FastDeploy, providing flexible request routing with support for both disaggregated (splitwise) and mixed deployment modes.
Key changes:
- Implements a complete Golang-based HTTP router with middleware support (CORS, logging, recovery, metrics)
- Adds multiple scheduling strategies (random, round_robin, power_of_two, process_tokens, request_num) for load balancing
- Provides health monitoring and automatic worker registration/deregistration
- Includes comprehensive unit tests with good coverage
Reviewed changes
Copilot reviewed 44 out of 45 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| cmd/main.go | Main entry point that initializes config, logger, manager, scheduler, and starts the HTTP server |
| internal/config/config.go | Configuration loading with YAML support and default value handling |
| internal/gateway/completions.go | Core routing logic for chat completions, handles P/D coordination and request forwarding |
| internal/manager/*.go | Worker registration, health checks, and instance management |
| internal/scheduler/handler/*.go | Multiple scheduling algorithms for worker selection |
| internal/middleware/*.go | HTTP middleware for CORS, logging, recovery, and metrics collection |
| internal/router/router.go | Gin router setup with API routes and middleware configuration |
| pkg/logger/logger.go | Custom logging implementation with level-based filtering |
| pkg/metrics/metrics.go | Prometheus metrics collection for monitoring |
| run.sh, build.sh, Makefile | Build and deployment scripts |
| README.md, README_CN.md | Documentation in English and Chinese |
| } | ||
|
|
||
| if err := scanner.Err(); err != nil { | ||
| fmt.Println("scanner error ", err) |
Copilot
AI
Dec 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fmt.Println call for error handling is inconsistent with the rest of the codebase which uses the logger package. This error should be logged using logger.Error for consistency and proper log management.
| fmt.Println("scanner error ", err) | |
| logger.Error("scanner error ", err) |
| @@ -0,0 +1,108 @@ | |||
| # fd-router | |||
Copilot
AI
Dec 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR title should follow the format specified in the checklist. According to the PR description checklist, it should be "[Feature] Add golang router" instead of "[Feature] add golang router" (capitalize "Add").
| func Init(logLevel, output string) { | ||
| level = logLevel | ||
|
|
||
| flags := log.LstdFlags | log.Lshortfile | ||
|
|
||
| if output == "file" { | ||
| // Check if logs directory exists | ||
| if _, err := os.Stat("logs"); os.IsNotExist(err) { | ||
| if err := os.MkdirAll("logs", 0755); err != nil { | ||
| log.Fatalln("Failed to create logs directory:", err) | ||
| } | ||
| } | ||
| file, err := os.OpenFile("logs/router.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0666) | ||
| if err != nil { | ||
| log.Fatalln("Failed to open log file:", err) | ||
| } | ||
| infoLogger = log.New(file, "[INFO] ", flags) | ||
| errorLogger = log.New(file, "[ERROR] ", flags) | ||
| warnLogger = log.New(file, "[WARN] ", flags) | ||
| debugLogger = log.New(file, "[DEBUG] ", flags) | ||
| } else { | ||
| infoLogger = log.New(os.Stdout, "[INFO] ", flags) | ||
| errorLogger = log.New(os.Stderr, "[ERROR] ", flags) | ||
| warnLogger = log.New(os.Stdout, "[WARN] ", flags) | ||
| debugLogger = log.New(os.Stdout, "[DEBUG] ", flags) | ||
| } | ||
| } |
Copilot
AI
Dec 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logger initialization uses global package-level variables that are not safe for concurrent access during initialization. If multiple goroutines call Init concurrently, it could lead to race conditions. Consider using sync.Once to ensure Init is called only once, or document that Init must be called before any concurrent operations.
| c.Writer.Header().Set("Access-Control-Allow-Origin", "*") | ||
| c.Writer.Header().Set("Access-Control-Allow-Credentials", "true") |
Copilot
AI
Dec 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CORS middleware allows all origins with "" which is a security risk in production. When Access-Control-Allow-Credentials is set to "true", the origin should not be "" according to CORS specifications. This configuration will be rejected by modern browsers. Either remove credentials support or restrict allowed origins to specific domains.
| } | ||
| } | ||
|
|
||
| client := &http.Client{} |
Copilot
AI
Dec 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The HTTP client is created without any timeout configuration. This means requests can hang indefinitely if the backend service doesn't respond. Set appropriate timeouts using client.Timeout to prevent resource exhaustion and improve reliability.
| source := rand.NewSource(time.Now().UnixNano()) | ||
| r := rand.New(source) | ||
| randomNum := r.Intn(len(workers)) | ||
| return workers[randomNum], nil |
Copilot
AI
Dec 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating a new random source with rand.NewSource(time.Now().UnixNano()) in each call is inefficient and can produce predictable results when called in rapid succession with the same nanosecond timestamp. Consider using a package-level properly seeded random generator or the global rand functions which are already seeded.
| source := rand.NewSource(time.Now().UnixNano()) | ||
| r := rand.New(source) | ||
| length := len(workers) | ||
| randomNum1 := r.Intn(length) | ||
| randomNum2 := r.Intn(length) |
Copilot
AI
Dec 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same performance issue exists here: creating a new random source on every call is inefficient. Additionally, creating a new random source with the same timestamp in PowerOfTwoSelectWorker (lines 19-20) can cause both random numbers to be identical or very similar when called multiple times rapidly, reducing the effectiveness of the power-of-two selection algorithm.
| file, err := os.OpenFile("logs/router.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0666) | ||
| if err != nil { | ||
| log.Fatalln("Failed to open log file:", err) | ||
| } |
Copilot
AI
Dec 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a potential resource leak if the logger file fails to open. The function calls log.Fatalln which will exit the program, but if the file was successfully created, it won't be properly closed. Consider using defer or handling the error more gracefully to allow proper cleanup.
|
|
||
| func SelectWorker(ctx context.Context, message string) (string, error) { | ||
| workers := WorkerMapToList(ctx, "mixed") | ||
| selecteWorkerURL, err := scheduler_handler.SelectWorker(ctx, workers, message, "mixed") |
Copilot
AI
Dec 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in variable name: selecteWorkerURL should be selectedWorkerURL (missing 'd').
| selectePrefillWorkerURL, err := scheduler_handler.SelectWorker(ctx, prefillWorkers, message, "prefill") | ||
| if err != nil { | ||
| return "", "", err | ||
| } | ||
| selecteDecodeWorkerURL, err := scheduler_handler.SelectWorker(ctx, decodeWorkers, message, "decode") |
Copilot
AI
Dec 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in variable name: selectePrefillWorkerURL and selecteDecodeWorkerURL should be selectedPrefillWorkerURL and selectedDecodeWorkerURL (missing 'd').
| @@ -0,0 +1,111 @@ | |||
| # fd-router | |||
|
|
|||
| 一个高性能的 Go 语言路由框架,提供灵活的请求路由、中间件支持和健康检查功能。 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
可以说明下正在开发迭代中
| 1. 复制配置文件模板并修改: | ||
|
|
||
| ```bash | ||
| cp config/config.example.yaml config/config.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config目录在examples下面吧, 这层级目录下没有
| @@ -0,0 +1,21 @@ | |||
| instances: | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个没用?
| # Test splitwise deployment | ||
| # There are two methods for splitwise deployment: | ||
| # v0: using splitwise_scheduler or dp_scheduler | ||
| # v1: using local_scheduler + router |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
可以区分下 v2是使用go版本router
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
整体意见
如果 golang_router 计划作为 对外发布组件,当前 fastdeploy/golang_router/README_CN.md 的编译、运行和版本说明尚 不满足 FastDeploy 的对外发版规范。在补齐以下关键内容前,不建议对外发布或对用户宣称可用。
1. 缺失官方发版形态定义
README 当前默认用户通过:
go run cmd/main.go
./build.sh
来使用组件,但并未说明 FastDeploy 官方支持的交付形态。
需要明确至少以下之一,并写入 README:
- 编译产物的最终形态是什么
- 是否提供 官方二进制发布包
- 是否提供 官方 Docker 镜像
- 是否仅作为 FastDeploy Docker 镜像内置组件 使用
- 是否存在 版本化产物下载入口
2. 编译环境与可复现性不足
当前 README 中对 Go 侧编译仅要求:
Go 1.21 或更高版本
但作为 FastDeploy 对外发布组件,需要满足可复现构建的基本要求,目前尚不清楚:
- 是否有推荐或锁定的 Go 版本区间
- 构建是否依赖特定系统环境(glibc / OS)
- 是否可在 FastDeploy 官方 Docker 环境中构建
3. 与 FastDeploy 主体(Python / GPU 运行时)关系未定义
当前 README 将 golang_router 作为一个相对独立的 Go 服务介绍,但未说明:
- 是否 依赖 FastDeploy Python Wheel
- 是否与 FastDeploy GPU 推理进程存在绑定关系
- 通信方式(HTTP / RPC / gRPC)
对外发布组件需要明确其 可独立运行的前置条件。
4. 对外配置项未区分稳定性等级
README 将全部配置项作为对外接口暴露,但未区分:
- 稳定可长期支持的配置项
- 实验性或内部使用的调度 / 策略配置
- 未来可能发生不兼容变更的参数
5. 版本与兼容性策略缺失
README 中未说明:
- golang_router 是否拥有独立版本号
- 是否与 FastDeploy 主版本号强绑定
- 是否兼容变更的升级策略
总结建议
当前 README 更接近 内部使用或独立项目说明,尚不满足 FastDeploy 官方对外发布组件的文档完整性要求。
建议在对外发布前补充:
- 官方交付与使用方式
- 构建与运行的标准路径
- 与 FastDeploy 主体的依赖关系
- 清晰的版本与兼容性策略
在上述信息明确前,不建议以 FastDeploy 官方组件身份对外发布。
Motivation
提供golang版本高性能router
Modifications
Checklist
[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.