Inference for jac-vision is currently tied to unsloth, which limits the models we can use.
We can continue to use unsloth as the fine-tuning framework of choice.
However, I think we should integrate jac-vision with a more general-purpose framework for model inference.
Options that I am aware of are:
- vLLM
- Ollama
- Hugging Face Transformers
After a quick comparison, I am leaning towards vLLM: it is more performant and scalable than Ollama, and it takes care of the model-specific implementation differences that we would otherwise have to handle ourselves in a direct integration with Transformers.
I'd like someone on the team to do a quick survey of the model inference framework landscape and confirm that vLLM is the right choice here.
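To make the decoupling concrete, here is a minimal sketch of how inference could sit behind a small interface so the backend can be swapped without touching the rest of jac-vision. All names here (`InferenceBackend`, `EchoBackend`, `VLLMBackend`, `get_backend`) are hypothetical and not part of the current codebase. The vLLM wrapper assumes vLLM's offline `LLM` / `SamplingParams` API and handles text prompts only; image inputs would need additional handling that is not shown.

```python
from typing import Callable, Dict, Protocol


class InferenceBackend(Protocol):
    """Hypothetical interface the rest of jac-vision would depend on."""

    def generate(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in backend for tests; returns the prompt with a prefix."""

    def generate(self, prompt: str) -> str:
        return "echo: " + prompt


class VLLMBackend:
    """Sketch of a vLLM-backed implementation (assumes vLLM is installed)."""

    def __init__(self, model: str, max_tokens: int = 128) -> None:
        # Imported lazily so this module still loads without vLLM installed.
        from vllm import LLM, SamplingParams

        self._llm = LLM(model=model)
        self._params = SamplingParams(max_tokens=max_tokens)

    def generate(self, prompt: str) -> str:
        # vLLM takes a batch of prompts; we send one and read its first completion.
        outputs = self._llm.generate([prompt], self._params)
        return outputs[0].outputs[0].text


# Registry mapping a backend name to the class that constructs it.
BACKENDS: Dict[str, Callable[..., InferenceBackend]] = {
    "echo": EchoBackend,
    "vllm": VLLMBackend,
}


def get_backend(name: str, **kwargs) -> InferenceBackend:
    """Build a backend by name, forwarding any constructor arguments."""
    if name not in BACKENDS:
        raise ValueError(f"unknown inference backend: {name!r}")
    return BACKENDS[name](**kwargs)
```

With this shape, callers would use something like `get_backend("vllm", model=...)` for inference while fine-tuning stays on unsloth, and an Ollama or Transformers backend could be added to the registry later if the research points elsewhere.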