Description
Device information
16 × Iluvatar BI-V150 GPUs
+-----------------------------------------------------------------------------+
| IX-ML: 4.3.8 Driver Version: 4.3.0 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------|
| GPU Name | Bus-Id | Clock-SM Clock-Mem |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Iluvatar BI-V150 | 00000000:45:00.0 | 1600MHz 1600MHz |
| N/A 36C P0 N/A / N/A | 12734MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 1 Iluvatar BI-V150 | 00000000:48:00.0 | 1600MHz 1600MHz |
| N/A 34C P0 117W / 350W | 12862MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 2 Iluvatar BI-V150 | 00000000:4E:00.0 | 1600MHz 1600MHz |
| N/A 31C P0 N/A / N/A | 12606MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 3 Iluvatar BI-V150 | 00000000:51:00.0 | 1600MHz 1600MHz |
| N/A 34C P0 114W / 350W | 12990MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 4 Iluvatar BI-V150 | 00000000:5B:00.0 | 1600MHz 1600MHz |
| N/A 32C P0 N/A / N/A | 12738MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 5 Iluvatar BI-V150 | 00000000:5E:00.0 | 1600MHz 1600MHz |
| N/A 34C P0 114W / 350W | 12862MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 6 Iluvatar BI-V150 | 00000000:66:00.0 | 1600MHz 1600MHz |
| N/A 33C P0 N/A / N/A | 12606MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 7 Iluvatar BI-V150 | 00000000:69:00.0 | 1600MHz 1600MHz |
| N/A 33C P0 112W / 350W | 12990MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 8 Iluvatar BI-V150 | 00000000:73:00.0 | 1600MHz 1600MHz |
| N/A 32C P0 N/A / N/A | 12738MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 9 Iluvatar BI-V150 | 00000000:76:00.0 | 1600MHz 1600MHz |
| N/A 32C P0 115W / 350W | 12862MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 10 Iluvatar BI-V150 | 00000000:81:00.0 | 1600MHz 1600MHz |
| N/A 36C P0 N/A / N/A | 12606MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 11 Iluvatar BI-V150 | 00000000:84:00.0 | 1600MHz 1600MHz |
| N/A 36C P0 113W / 350W | 12990MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 12 Iluvatar BI-V150 | 00000000:8C:00.0 | 1600MHz 1600MHz |
| N/A 37C P0 N/A / N/A | 12738MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 13 Iluvatar BI-V150 | 00000000:8F:00.0 | 1600MHz 1600MHz |
| N/A 37C P0 116W / 350W | 12862MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 14 Iluvatar BI-V150 | 00000000:95:00.0 | 1600MHz 1600MHz |
| N/A 35C P0 N/A / N/A | 12606MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 15 Iluvatar BI-V150 | 00000000:98:00.0 | 1600MHz 1600MHz |
| N/A 36C P0 116W / 350W | 12734MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Process name Usage(MiB) |
|=============================================================================|
| 0 3314328 /usr/local/bin/python -u /usr/local/lib... 12658 |
| 1 3314329 /usr/local/bin/python -u /usr/local/lib... 12786 |
| 2 3314330 /usr/local/bin/python -u /usr/local/lib... 12530 |
| 3 3314331 /usr/local/bin/python -u /usr/local/lib... 12914 |
| 4 3314332 /usr/local/bin/python -u /usr/local/lib... 12658 |
| 5 3314333 /usr/local/bin/python -u /usr/local/lib... 12786 |
| 6 3314334 /usr/local/bin/python -u /usr/local/lib... 12530 |
| 7 3314337 /usr/local/bin/python -u /usr/local/lib... 12914 |
| 8 3314340 /usr/local/bin/python -u /usr/local/lib... 12658 |
| 9 3314343 /usr/local/bin/python -u /usr/local/lib... 12786 |
| 10 3314346 /usr/local/bin/python -u /usr/local/lib... 12530 |
| 11 3314349 /usr/local/bin/python -u /usr/local/lib... 12914 |
| 12 3314352 /usr/local/bin/python -u /usr/local/lib... 12658 |
| 13 3314355 /usr/local/bin/python -u /usr/local/lib... 12786 |
| 14 3314358 /usr/local/bin/python -u /usr/local/lib... 12530 |
| 15 3314363 /usr/local/bin/python -u /usr/local/lib... 12658 |
+-----------------------------------------------------------------------------+
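The memory column in the table above can be summarized with a short script. This is a minimal sketch that assumes the table text has been captured into a string; the function name `memory_usage` and the `sample` variable are hypothetical, and the two sample rows are copied from the output above.

```python
import re

# Matches the "used / total" memory column, e.g. "12734MiB / 32768MiB".
MEM_RE = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def memory_usage(table_text):
    """Return one (used_mib, total_mib, fraction) tuple per GPU row."""
    rows = []
    for line in table_text.splitlines():
        match = MEM_RE.search(line)
        if match:
            used, total = int(match.group(1)), int(match.group(2))
            rows.append((used, total, used / total))
    return rows


# Two rows copied from the table above (GPU 0 and GPU 1).
sample = """\
| N/A 36C P0 N/A / N/A | 12734MiB / 32768MiB | 6% Default |
| N/A 34C P0 117W / 350W | 12862MiB / 32768MiB | 6% Default |
"""

for used, total, frac in memory_usage(sample):
    print(f"{used} / {total} MiB ({frac:.1%})")
```

Across all 16 cards the reported usage lies between 12606 MiB and 12990 MiB of 32768 MiB, so each card is roughly 38% to 40% full at the time of the snapshot.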
Problem
The model was deployed with the following commands:
export PADDLE_XCCL_BACKEND=iluvatar_gpu
export INFERENCE_MSG_QUEUE_ID=232132
export LD_PRELOAD=/usr/local/corex/lib64/libcuda.so.1
export FD_SAMPLING_CLASS=rejection
export FD_DEBUG=1
export ENABLE_V1_KVCACHE_SCHEDULER=1
python -m fastdeploy.entrypoints.openai.api_server \
  --model ZhipuAI/GLM-4.5-Air \
  --tensor-parallel-size 16 \
  --port 8185 \
  --block-size 16 \
  --quantization wfp8afp8 \
  --swap-space 50
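When bisecting a launch problem like this one, it can help to vary one parameter at a time. The following is a minimal sketch of a launcher that mirrors the commands above; the helper name `build_launch` and its keyword arguments are hypothetical, and only the environment variables and flags already shown in this report are used.

```python
import os

# Environment variables copied from the export lines above.
ENV_OVERRIDES = {
    "PADDLE_XCCL_BACKEND": "iluvatar_gpu",
    "INFERENCE_MSG_QUEUE_ID": "232132",
    "LD_PRELOAD": "/usr/local/corex/lib64/libcuda.so.1",
    "FD_SAMPLING_CLASS": "rejection",
    "FD_DEBUG": "1",
    "ENABLE_V1_KVCACHE_SCHEDULER": "1",
}


def build_launch(model="ZhipuAI/GLM-4.5-Air", tensor_parallel_size=16,
                 port=8185, block_size=16, quantization="wfp8afp8",
                 swap_space=50):
    """Return (env, argv) for the api_server command used in this report."""
    env = dict(os.environ, **ENV_OVERRIDES)
    argv = [
        "python", "-m", "fastdeploy.entrypoints.openai.api_server",
        "--model", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--port", str(port),
        "--block-size", str(block_size),
        "--quantization", quantization,
        "--swap-space", str(swap_space),
    ]
    return env, argv
```

The returned pair can be passed to `subprocess.run(argv, env=env, check=True)`. Whether any particular combination of values avoids the crash is not known from this report.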
The model appears to deploy successfully 👇
/usr/local/lib/python3.10/site-packages/paddle/utils/cpp_extension/extension_utils.py:718: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
warnings.warn(warning_message)
WARNING 2025-12-11 17:05:12,778 3314133 prometheus_multiprocess_setup.py[line:41] Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_bace7b92-2e12-4d3e-9265-883cd8eef466 was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2025-12-11 17:05:12,778] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_bace7b92-2e12-4d3e-9265-883cd8eef466 was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
WARNING 2025-12-11 17:05:12,951 3314133 prometheus_multiprocess_setup.py[line:41] Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_bace7b92-2e12-4d3e-9265-883cd8eef466 was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2025-12-11 17:05:12,951] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_bace7b92-2e12-4d3e-9265-883cd8eef466 was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2025-12-11 17:05:12,981] [ DEBUG] utils.py:35 - No plugins for group fastdeploy.reasoning_parser_plugins found.
[2025-12-11 17:05:13,139] [ DEBUG] utils.py:35 - No plugins for group fastdeploy.token_processor_plugins found.
INFO 2025-12-11 17:05:13,466 3314133 api_server.py[line:80] Number of api-server workers: 1.
/usr/local/corex-4.3.8/lib64/python3/dist-packages/torch/cuda/init.py:58: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
Downloading Model from https://www.modelscope.cn to directory: /data/projects/modelscope/ZhipuAI/GLM-4.5-Air
2025-12-11 17:05:16,016 - modelscope - INFO - Target directory already exists, skipping creation.
/usr/local/lib/python3.10/site-packages/fastdeploy/engine/args_utils.py:65: ResourceWarning: unclosed file <_io.TextIOWrapper name='/data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json' mode='r' encoding='utf-8'>
model_config = json.load(open(config_path, "r", encoding="utf-8"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[2025-12-11 17:05:16,027] [ INFO] - Using download source: huggingface
[2025-12-11 17:05:16,027] [ INFO] - Loading configuration file /data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json
[2025-12-11 17:05:16,027] [ WARNING] - You are using a model of type glm4_moe to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
/usr/local/lib/python3.10/site-packages/_distutils_hack/init.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
warnings.warn(
/usr/local/lib/python3.10/site-packages/paddle/jit/sot/opcode_translator/skip_files.py:105: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
import distutils
/usr/local/lib/python3.10/site-packages/fastdeploy/logger/logger.py:190: ResourceWarning: unclosed file <_io.BufferedWriter name='log/cudagraph_piecewise_backend.log.2025-12-11'>
for handler in logger.handlers[:]:
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[2025-12-11 17:05:16,215] [ WARNING] - import noaux_tc Failed!
/usr/local/lib/python3.10/site-packages/fastdeploy/config.py:340: ResourceWarning: unclosed file <_io.TextIOWrapper name='/data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json' mode='r' encoding='utf-8'>
self.model_config = json.load(open(config_path, "r", encoding="utf-8"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
:241: DeprecationWarning: builtin type SwigPyPacked has no module attribute
:241: DeprecationWarning: builtin type SwigPyObject has no module attribute
[2025-12-11 17:05:17,166] [ INFO] - Using download source: huggingface
[2025-12-11 17:05:17,788] [ INFO] - Using download source: huggingface
INFO 2025-12-11 17:05:18,790 3314133 engine.py[line:146] Waiting for worker processes to be ready...
Loading Weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:51<00:00, 1.94it/s]
Loading Layers: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:07<00:00, 14.27it/s]
INFO 2025-12-11 17:06:24,861 3314133 engine.py[line:197] Worker processes are launched with 68.54115271568298 seconds.
INFO 2025-12-11 17:06:24,861 3314133 api_server.py[line:729] Launching metrics service at http://0.0.0.0:8185/metrics
INFO 2025-12-11 17:06:24,861 3314133 api_server.py[line:730] Launching chat completion service at http://0.0.0.0:8185/v1/chat/completions
INFO 2025-12-11 17:06:24,861 3314133 api_server.py[line:731] Launching completion service at http://0.0.0.0:8185/v1/completions
[2025-12-11 17:06:25 +0800] [3314133] [INFO] Starting gunicorn 23.0.0
[2025-12-11 17:06:25 +0800] [3314133] [INFO] Listening at: http://0.0.0.0:8185 (3314133)
[2025-12-11 17:06:25 +0800] [3314133] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2025-12-11 17:06:25 +0800] [4089232] [INFO] Booting worker with pid: 4089232
/usr/local/lib/python3.10/site-packages/websockets/legacy/init.py:6: DeprecationWarning: websockets.legacy is deprecated; see https://websockets.readthedocs.io/en/stable/howto/upgrade.html for upgrade instructions
warnings.warn( # deprecated in 14.0 - 2024-11-09
/usr/local/lib/python3.10/site-packages/uvicorn/protocols/websockets/websockets_impl.py:14: DeprecationWarning: websockets.server.WebSocketServerProtocol is deprecated
from websockets.server import WebSocketServerProtocol
[2025-12-11 17:06:25 +0800] [4089232] [INFO] Started server process [4089232]
[2025-12-11 17:06:25 +0800] [4089232] [INFO] Waiting for application startup.
/usr/local/lib/python3.10/site-packages/fastdeploy/engine/args_utils.py:65: ResourceWarning: unclosed file <_io.TextIOWrapper name='/data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json' mode='r' encoding='utf-8'>
model_config = json.load(open(config_path, "r", encoding="utf-8"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[2025-12-11 17:06:25,926] [ INFO] - Using download source: huggingface
[2025-12-11 17:06:25,926] [ INFO] - Loading configuration file /data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json
[2025-12-11 17:06:25,926] [ WARNING] - You are using a model of type glm4_moe to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
/usr/local/lib/python3.10/site-packages/fastdeploy/config.py:340: ResourceWarning: unclosed file <_io.TextIOWrapper name='/data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json' mode='r' encoding='utf-8'>
self.model_config = json.load(open(config_path, "r", encoding="utf-8"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[2025-12-11 17:06:25,954] [ INFO] - Using download source: huggingface
[2025-12-11 17:06:25,955] [ INFO] - Using download source: huggingface
[2025-12-11 17:06:26 +0800] [4089232] [INFO] Application startup complete.
Sending a request then fails:
curl -X POST "http://0.0.0.0:8185/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "什么是集成电路?"}
    ]
  }'
Result: curl: (52) Empty reply from server
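The same request can be reproduced from Python with only the standard library, which makes the failure mode easier to capture in a script. This is a sketch, not part of FastDeploy: the helper names `build_chat_request` and `send` are hypothetical, and the endpoint path and JSON body are taken from the curl command above.

```python
import http.client
import json
import urllib.error
import urllib.request


def build_chat_request(base_url, prompt):
    """Build the POST request that the curl command above sends."""
    body = json.dumps(
        {"messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Return (ok, text): the response body, or a description of the failure."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return True, resp.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}: {exc.reason}"
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # Covers a refused connection and a connection closed without a reply.
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `send(build_chat_request("http://0.0.0.0:8185", "什么是集成电路?"))` issues the same request as the curl command. If the server closes the connection without answering, as curl reports here, the second `except` branch is the one expected to fire.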
The server log shows:
/usr/local/lib/python3.10/site-packages/fastdeploy/entrypoints/openai/protocol.py:692: DeprecationWarning: max_tokens is deprecated in favor of the max_completion_tokens field
req_dict["max_tokens"] = self.max_completion_tokens or self.max_tokens
/usr/local/lib/python3.10/site-packages/fastdeploy/entrypoints/openai/protocol.py:708: PydanticDeprecatedSince20: The dict method is deprecated; use model_dump instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
for key, value in self.dict().items():
[2025-12-11 17:18:13 +0800] [3314133] [ERROR] Worker (pid:3314315) exited with code 1
[2025-12-11 17:18:13 +0800] [3314133] [ERROR] Worker (pid:3314315) exited with code 1.
ERROR 2025-12-11 17:18:15,407 3314133 api_server.py[line:704] Worker process has died in the background (code=0). API server is forced to stop.
[2025-12-11 17:18:15 +0800] [3314133] [INFO] Handling signal: int
[2025-12-11 17:18:15 +0800] [3314133] [INFO] Shutting down: Master
ERROR 2025-12-11 17:18:15,412 3314133 engine.py[line:435] Error extracting sub services: [Errno 3] No such process, Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/fastdeploy/engine/engine.py", line 432, in _exit_sub_services
pgid = os.getpgid(self.worker_proc.pid)
ProcessLookupError: [Errno 3] No such process
[2025-12-11 17:18:15 +0800] [3314133] [ERROR] Worker (pid:4089232) was sent SIGKILL! Perhaps out of memory?
sys:1: DeprecationWarning: builtin type swigvarlink has no module attribute
sys:1: ResourceWarning: unclosed file <_io.BufferedReader name=80>
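Most of the output above is deprecation and resource warnings; the lines that matter are the worker exit at 17:18:13 and what follows. A small filter can pull those out, and the same filter can be run over the `log/` directory that appears in the ResourceWarning path earlier in this report (`log/cudagraph_piecewise_backend.log.2025-12-11`), where the worker's own traceback may have been written. This is a sketch: the function names are hypothetical, the patterns are chosen from the lines in this report plus `Traceback`, and nothing is assumed about which files `log/` contains.

```python
import re
from pathlib import Path

# Patterns taken from the failing lines in the log above, plus "Traceback".
ERROR_RE = re.compile(r"\bERROR\b|Traceback|exited with code|SIGKILL")


def find_error_lines(text):
    """Return (line_number, line) pairs for lines that look like failures."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if ERROR_RE.search(line)
    ]


def scan_log_dir(log_dir="log"):
    """Apply find_error_lines to every regular file directly under log_dir."""
    hits = {}
    for path in sorted(Path(log_dir).glob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            found = find_error_lines(text)
            if found:
                hits[path.name] = found
    return hits
```

Running `scan_log_dir("log")` from the directory where the server was started returns a dictionary keyed by file name, which narrows down which log file holds the worker's error.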
Help needed
Do Iluvatar GPUs not yet support this model, or are additional environment variables or launch parameters required? Thanks!