I am trying to get Ollama working with tools. I have them both running in docker-compose. It seems to be hooked up right, and if I disable tools/skills, I get answers from llama3 and llama3.2,
but tools/skills seem to confuse it.
Also, skills are fine with Hugging Face, using Kimi K2. It took a few steps, but this worked:
as a test, write a haiku about testing. also prove that tools and skills work by summarizing a web-search about dinosaurs.
Looking here for better models, I found that `qwen3-vl:latest` has tool support, but it just seems to hang and eventually throws a timeout error in the logs (maybe due to low VRAM?). I also pulled `qwen3-vl:4b` and got the same thing (hang/timeout).
Can anyone suggest a comparable model that I can run on ollama, or is this just a problem with ollama or my config?
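For what it's worth, a minimal way to check whether a given model emits tool calls at all is to hit Ollama's `/api/chat` endpoint directly with a `tools` array, bypassing the front-end entirely. A sketch, where the `web_search` tool is a made-up stand-in for whatever your skills actually expose:

```python
import json
import urllib.request

# Minimal tool-calling smoke test against Ollama's REST API.
# The web_search tool below is a hypothetical stand-in for a real skill.
payload = {
    "model": "qwen2.5:1.5b",
    "messages": [
        {"role": "user", "content": "Summarize a web search about dinosaurs."}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web and return result snippets.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search terms"}
                },
                "required": ["query"],
            },
        },
    }],
    "stream": False,
}

def check_tool_support(base_url="http://localhost:11434"):
    """POST the payload and return any tool calls the model produced."""
    req = urllib.request.Request(
        base_url + "/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.loads(resp.read())
    return body.get("message", {}).get("tool_calls", [])

# Only call check_tool_support() when Ollama is actually reachable.
```

If the returned `tool_calls` list is empty (or the model answers in plain text instead), the model itself is the likely problem rather than the wiring.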
I am not working with a great card:
```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.119.02             Driver Version: 580.119.02      CUDA Version: 13.0    |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     On  |   00000000:01:00.0  On |                  N/A |
|  0%   47C    P5             17W /  160W |    1782MiB /   8188MiB |      21%     Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
```
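Regarding the "maybe due to low VRAM?" guess: some back-of-the-envelope arithmetic (all figures here are my own rough assumptions, not measurements) suggests a 4b vision model can come uncomfortably close to 8 GiB once quantized weights, runtime overhead, and the claimed 2-3x vision multiplier are stacked up:

```python
# Back-of-the-envelope VRAM estimate. All numbers are rough assumptions
# for illustration, not measured figures:
#   ~0.6 bytes/param for 4-bit quantized weights (incl. scales),
#   a flat overhead for KV cache + CUDA context,
#   and a 2-3x multiplier for vision models (the claim quoted below).
def est_vram_gib(params_b, bytes_per_param=0.6, vision_factor=1.0,
                 overhead_gib=1.5):
    return params_b * 1e9 * bytes_per_param / 2**30 * vision_factor + overhead_gib

for name, params, vf in [("qwen2.5:1.5b", 1.5, 1.0),
                         ("qwen2.5:3b", 3.0, 1.0),
                         ("qwen3-vl:4b", 4.0, 2.5),
                         ("mistral:7b", 7.0, 1.0)]:
    print(f"{name:13s} ~{est_vram_gib(params, vision_factor=vf):.1f} GiB "
          f"vs 8.0 GiB card")
```

With ~1.7 GiB of the card already in use by the desktop per the nvidia-smi output above, an estimate in that range would leave very little headroom, which at least is consistent with a hang/timeout.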
Asking a similar question of Kimi K2 gave me this answer, but I wonder whether it is really up to date.
Summary
Your issues stem from two main problems:
- Llama3 models have poor tool implementation - they generate tool calls but don't understand responses
- Qwen3-VL is a vision model - these need 2-3x more VRAM than text-only models
Immediate Solutions:
Best models to try (in order):
- `qwen2.5:1.5b` - Excellent tool support, minimal VRAM
- `qwen2.5:3b` - Better performance, still VRAM-friendly
- `qwen2:1.5b` - Proven stable tool support
- `mistral:7b` - Good tools, needs more VRAM

Configuration fixes:
Test this immediately:
Qwen2.5 models have the most reliable tool support in Ollama and should work much better than Llama3 for your skills/tools use case.
I tried it, and it did seem to work:
Sure, here's a Haiku on the theme of Testing:
Test in code so clean,
No flaws emerge by design—
A clear outcome
Now showing Skills work:
create-plan: A concise plan is made.

```json
{"name": "create-plan"}
```

read_file: The path to a text file that includes a haiku test:

```json
{"path": "/tmp/haiku_test.txt"}
```

Please note, the above JSON objects are just examples. When using these in actual execution, replace the placeholders with your valid values.
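When a model does cooperate, those calls come back under `message.tool_calls` in the `/api/chat` response, and the caller is expected to run them and feed the results back. A minimal dispatch sketch, using a hand-written sample response and hypothetical handlers for the two skills above:

```python
# Dispatching tool calls from an Ollama /api/chat response. The
# response dict is a hand-written sample shaped like Ollama's output
# (arguments come back as a JSON object, not a string); both handlers
# are hypothetical stand-ins for real skills.
sample_response = {
    "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {"function": {"name": "create-plan", "arguments": {}}},
            {"function": {"name": "read_file",
                          "arguments": {"path": "/tmp/haiku_test.txt"}}},
        ],
    }
}

HANDLERS = {
    "create-plan": lambda **kw: "plan created",
    "read_file": lambda path: f"read {path}",
}

def dispatch(response):
    # Run each requested tool; in a real loop you would append each
    # result as a {"role": "tool", ...} message and call the model again.
    results = []
    for call in response["message"].get("tool_calls", []):
        fn = call["function"]
        results.append(HANDLERS[fn["name"]](**fn["arguments"]))
    return results

print(dispatch(sample_response))  # → ['plan created', 'read /tmp/haiku_test.txt']
```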