Skip to content

Conversation

@Smilencelsy
Copy link

Motivation

提供golang版本高性能router

Modifications

  • 提供golang开源代码
  • 提供编译、运行脚本以及说明

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


longsiyi seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@paddle-bot
Copy link

paddle-bot bot commented Dec 24, 2025

Thanks for your contribution!

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a high-performance Golang router implementation for FastDeploy, providing flexible request routing with support for both disaggregated (splitwise) and mixed deployment modes.

Key changes:

  • Implements a complete Golang-based HTTP router with middleware support (CORS, logging, recovery, metrics)
  • Adds multiple scheduling strategies (random, round_robin, power_of_two, process_tokens, request_num) for load balancing
  • Provides health monitoring and automatic worker registration/deregistration
  • Includes comprehensive unit tests with good coverage

Reviewed changes

Copilot reviewed 44 out of 45 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
cmd/main.go Main entry point that initializes config, logger, manager, scheduler, and starts the HTTP server
internal/config/config.go Configuration loading with YAML support and default value handling
internal/gateway/completions.go Core routing logic for chat completions, handles P/D coordination and request forwarding
internal/manager/*.go Worker registration, health checks, and instance management
internal/scheduler/handler/*.go Multiple scheduling algorithms for worker selection
internal/middleware/*.go HTTP middleware for CORS, logging, recovery, and metrics collection
internal/router/router.go Gin router setup with API routes and middleware configuration
pkg/logger/logger.go Custom logging implementation with level-based filtering
pkg/metrics/metrics.go Prometheus metrics collection for monitoring
run.sh, build.sh, Makefile Build and deployment scripts
README.md, README_CN.md Documentation in English and Chinese

}

if err := scanner.Err(); err != nil {
fmt.Println("scanner error ", err)
Copy link

Copilot AI Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fmt.Println call for error handling is inconsistent with the rest of the codebase which uses the logger package. This error should be logged using logger.Error for consistency and proper log management.

Suggested change
fmt.Println("scanner error ", err)
logger.Error("scanner error ", err)

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,108 @@
# fd-router
Copy link

Copilot AI Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title should follow the format specified in the checklist. According to the PR description checklist, it should be "[Feature] Add golang router" instead of "[Feature] add golang router" (capitalize "Add").

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +17 to +43
func Init(logLevel, output string) {
level = logLevel

flags := log.LstdFlags | log.Lshortfile

if output == "file" {
// Check if logs directory exists
if _, err := os.Stat("logs"); os.IsNotExist(err) {
if err := os.MkdirAll("logs", 0755); err != nil {
log.Fatalln("Failed to create logs directory:", err)
}
}
file, err := os.OpenFile("logs/router.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0666)
if err != nil {
log.Fatalln("Failed to open log file:", err)
}
infoLogger = log.New(file, "[INFO] ", flags)
errorLogger = log.New(file, "[ERROR] ", flags)
warnLogger = log.New(file, "[WARN] ", flags)
debugLogger = log.New(file, "[DEBUG] ", flags)
} else {
infoLogger = log.New(os.Stdout, "[INFO] ", flags)
errorLogger = log.New(os.Stderr, "[ERROR] ", flags)
warnLogger = log.New(os.Stdout, "[WARN] ", flags)
debugLogger = log.New(os.Stdout, "[DEBUG] ", flags)
}
}
Copy link

Copilot AI Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logger initialization uses global package-level variables that are not safe for concurrent access during initialization. If multiple goroutines call Init concurrently, it could lead to race conditions. Consider using sync.Once to ensure Init is called only once, or document that Init must be called before any concurrent operations.

Copilot uses AI. Check for mistakes.
Comment on lines +38 to +39
c.Writer.Header().Set("Access-Control-Allow-Origin", "*")
c.Writer.Header().Set("Access-Control-Allow-Credentials", "true")
Copy link

Copilot AI Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CORS middleware allows all origins with "" which is a security risk in production. When Access-Control-Allow-Credentials is set to "true", the origin should not be "" according to CORS specifications. This configuration will be rejected by modern browsers. Either remove credentials support or restrict allowed origins to specific domains.

Copilot uses AI. Check for mistakes.
}
}

client := &http.Client{}
Copy link

Copilot AI Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HTTP client is created without any timeout configuration. This means requests can hang indefinitely if the backend service doesn't respond. Set appropriate timeouts using client.Timeout to prevent resource exhaustion and improve reliability.

Copilot uses AI. Check for mistakes.
Comment on lines +14 to +17
source := rand.NewSource(time.Now().UnixNano())
r := rand.New(source)
randomNum := r.Intn(len(workers))
return workers[randomNum], nil
Copy link

Copilot AI Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a new random source with rand.NewSource(time.Now().UnixNano()) in each call is inefficient and can produce predictable results when called in rapid succession with the same nanosecond timestamp. Consider using a package-level properly seeded random generator or the global rand functions which are already seeded.

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +23
source := rand.NewSource(time.Now().UnixNano())
r := rand.New(source)
length := len(workers)
randomNum1 := r.Intn(length)
randomNum2 := r.Intn(length)
Copy link

Copilot AI Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same performance issue exists here: creating a new random source on every call is inefficient. Additionally, creating a new random source with the same timestamp in PowerOfTwoSelectWorker (lines 19-20) can cause both random numbers to be identical or very similar when called multiple times rapidly, reducing the effectiveness of the power-of-two selection algorithm.

Copilot uses AI. Check for mistakes.
Comment on lines +29 to +32
file, err := os.OpenFile("logs/router.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0666)
if err != nil {
log.Fatalln("Failed to open log file:", err)
}
Copy link

Copilot AI Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a potential resource leak if the logger file fails to open. The function calls log.Fatalln which will exit the program, but if the file was successfully created, it won't be properly closed. Consider using defer or handling the error more gracefully to allow proper cleanup.

Copilot uses AI. Check for mistakes.

func SelectWorker(ctx context.Context, message string) (string, error) {
workers := WorkerMapToList(ctx, "mixed")
selecteWorkerURL, err := scheduler_handler.SelectWorker(ctx, workers, message, "mixed")
Copy link

Copilot AI Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in variable name: selecteWorkerURL should be selectedWorkerURL (missing 'd').

Copilot uses AI. Check for mistakes.
Comment on lines +128 to +132
selectePrefillWorkerURL, err := scheduler_handler.SelectWorker(ctx, prefillWorkers, message, "prefill")
if err != nil {
return "", "", err
}
selecteDecodeWorkerURL, err := scheduler_handler.SelectWorker(ctx, decodeWorkers, message, "decode")
Copy link

Copilot AI Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in variable name: selectePrefillWorkerURL and selecteDecodeWorkerURL should be selectedPrefillWorkerURL and selectedDecodeWorkerURL (missing 'd').

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,111 @@
# fd-router

一个高性能的 Go 语言路由框架,提供灵活的请求路由、中间件支持和健康检查功能。
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以说明下正在开发迭代中

1. 复制配置文件模板并修改:

```bash
cp config/config.example.yaml config/config.yaml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config目录在examples下面吧, 这层级目录下没有

@@ -0,0 +1,21 @@
instances:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个没用?

# Test splitwise deployment
# There are two methods for splitwise deployment:
# v0: using splitwise_scheduler or dp_scheduler
# v1: using local_scheduler + router
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以区分下 v2是使用go版本router

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

整体意见

如果 golang_router 计划作为 对外发布组件,当前 fastdeploy/golang_router/README_CN.md 的编译、运行和版本说明尚 不满足 FastDeploy 的对外发版规范。在补齐以下关键内容前,不建议对外发布或对用户宣称可用。


1. 缺失官方发版形态定义

README 当前默认用户通过:

go run cmd/main.go
./build.sh

来使用组件,但并未说明 ​FastDeploy 官方支持的交付形态​。

需要明确至少以下之一,并写入 README:

  • 编译产物的最终形态是什么
  • 是否提供 官方二进制发布包
  • 是否提供 官方 Docker 镜像
  • 是否仅作为 FastDeploy Docker 镜像内置组件 使用
  • 是否存在 版本化产物下载入口

2. 编译环境与可复现性不足

当前 README 中对 Go 侧编译仅要求:

Go 1.21 或更高版本

但作为 FastDeploy 对外发布组件,需要满足可复现构建的基本要求,目前尚不清楚:

  • 是否有推荐或锁定的 Go 版本区间
  • 构建是否依赖特定系统环境(glibc / OS)
  • 是否可在 FastDeploy 官方 Docker 环境中构建

3. 与 FastDeploy 主体(Python / GPU 运行时)关系未定义

当前 README 将 golang_router 作为一个相对独立的 Go 服务介绍,但未说明:

  • 是否 依赖 FastDeploy Python Wheel
  • 是否与 FastDeploy GPU 推理进程存在绑定关系
  • 通信方式(HTTP / RPC / gRPC)

对外发布组件需要明确其 可独立运行的前置条件


4. 对外配置项未区分稳定性等级

README 将全部配置项作为对外接口暴露,但未区分:

  • 稳定可长期支持的配置项
  • 实验性或内部使用的调度 / 策略配置
  • 未来可能发生不兼容变更的参数

5. 版本与兼容性策略缺失

README 中未说明:

  • golang_router 是否拥有独立版本号
  • 是否与 FastDeploy 主版本号强绑定
  • 是否兼容变更的升级策略

总结建议

当前 README 更接近 内部使用或独立项目说明,尚不满足 FastDeploy 官方对外发布组件的文档完整性要求。

建议在对外发布前补充:

  1. 官方交付与使用方式
  2. 构建与运行的标准路径
  3. 与 FastDeploy 主体的依赖关系
  4. 清晰的版本与兼容性策略

在上述信息明确前,不建议以 FastDeploy 官方组件身份对外发布。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor External developers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants