From 14472dc33ba601a5c687e78a08df4239a2fa851b Mon Sep 17 00:00:00 2001 From: "Roberto A. Foglietta" Date: Sun, 11 Jan 2026 14:07:00 +0100 Subject: [PATCH 1/8] Add Ubuntu quick start section to README Added quick start instructions for Ubuntu installation. Signed-off-by: Roberto A. Foglietta --- README.md | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/README.md b/README.md index 798c0e951..7de650489 100644 --- a/README.md +++ b/README.md @@ -157,6 +157,25 @@ This project is based on the [llama.cpp](https://github.com/ggerganov/llama.cpp) ## Installation +### Ubuntu quick start + +``` +sudo apt install ccache clang libomp-dev + +git clone --recursive https://github.com/microsoft/BitNet.git +cd BitNet/ + +mkdir -p models/BitNet-b1.58-2B-4T/ +link="https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf/resolve/main/ggml-model-i2_s.gguf" +wget -c "$link" -O models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf + +python3 setup_env.py -md models/BitNet-b1.58-2B-4T/ -q i2_s + +sysprompt="You are a helpful assistant" +python3 run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "$sysprompt" -cnv --temp 0.3 + +``` + ### Requirements - python>=3.9 - cmake>=3.22 From 961b02d32d7306d5de5dbd55625f729bce0265ef Mon Sep 17 00:00:00 2001 From: "Roberto A. Foglietta" Date: Sun, 11 Jan 2026 14:10:30 +0100 Subject: [PATCH 2/8] Update model download link and inference command Keep the text inside an 80-column code window Signed-off-by: Roberto A.
Foglietta --- README.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 7de650489..fefb28730 100644 --- a/README.md +++ b/README.md @@ -166,13 +166,15 @@ git clone --recursive https://github.com/microsoft/BitNet.git cd BitNet/ mkdir -p models/BitNet-b1.58-2B-4T/ -link="https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf/resolve/main/ggml-model-i2_s.gguf" +url="https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf" +link="$url/resolve/main/ggml-model-i2_s.gguf" wget -c "$link" -O models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf python3 setup_env.py -md models/BitNet-b1.58-2B-4T/ -q i2_s sysprompt="You are a helpful assistant" -python3 run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "$sysprompt" -cnv --temp 0.3 +model="models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf" +python3 run_inference.py -m $model -p "$sysprompt" -cnv --temp 0.3 ``` From 3ee53d1a5802151a8bc1ba216a473ff0adb0f923 Mon Sep 17 00:00:00 2001 From: "Roberto A. Foglietta" Date: Sun, 11 Jan 2026 14:19:15 +0100 Subject: [PATCH 3/8] Update model download and setup instructions in README Consistent use of variables for bash code flexibility Signed-off-by: Roberto A.
Foglietta --- README.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index fefb28730..bf091ece4 100644 --- a/README.md +++ b/README.md @@ -165,16 +165,17 @@ sudo apt install ccache clang libomp-dev git clone --recursive https://github.com/microsoft/BitNet.git cd BitNet/ -mkdir -p models/BitNet-b1.58-2B-4T/ +gguf="ggml-model-i2_s.gguf" +mdir="models/BitNet-b1.58-2B-4T/" url="https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf" -link="$url/resolve/main/ggml-model-i2_s.gguf" -wget -c "$link" -O models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf +link="$url/resolve/main/$gguf" +modprm="$mdir/$gguf" +mkdir -p $mdir && wget -c "$link" -O $modprm -python3 setup_env.py -md models/BitNet-b1.58-2B-4T/ -q i2_s +python3 setup_env.py -md $mdir -q i2_s sysprompt="You are a helpful assistant" -model="models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf" -python3 run_inference.py -m $model -p "$sysprompt" -cnv --temp 0.3 +python3 run_inference.py -m $modprm -p "$sysprompt" -cnv --temp 0.3 ``` From fe9bf30171cc87e0c395986545f1ca3f1d1ad6df Mon Sep 17 00:00:00 2001 From: "Roberto A. Foglietta" Date: Sun, 11 Jan 2026 17:05:48 +0100 Subject: [PATCH 4/8] Update Ubuntu quick start instructions in README Added swapoff command and updated model directory path in Ubuntu quick start instructions. Included additional options for running inference with multiple threads. Signed-off-by: Roberto A. 
Foglietta --- README.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index bf091ece4..e510e2b54 100644 --- a/README.md +++ b/README.md @@ -161,21 +161,27 @@ This project is based on the [llama.cpp](https://github.com/ggerganov/llama.cpp) ``` sudo apt install ccache clang libomp-dev +sudo swapoff -a git clone --recursive https://github.com/microsoft/BitNet.git cd BitNet/ gguf="ggml-model-i2_s.gguf" -mdir="models/BitNet-b1.58-2B-4T/" +mdir="models/BitNet-b1.58-2B-4T" url="https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf" link="$url/resolve/main/$gguf" modprm="$mdir/$gguf" -mkdir -p $mdir && wget -c "$link" -O $modprm +mkdir -p $mdir && wget -c "$link" -O $modprm +pip install -r requirements.txt python3 setup_env.py -md $mdir -q i2_s +cmake --build build --config Release sysprompt="You are a helpful assistant" -python3 run_inference.py -m $modprm -p "$sysprompt" -cnv --temp 0.3 +python3 run_inference.py -m $modprm -p "$sysprompt" -cnv --temp 0.3 -t $(nproc) +# Alternative with a file prompt +pretkns="--override-kv tokenizer.ggml.pre=str:llama3" +llama-cli -m $modprm -f ${file_prompt} -cnv --temp 0.3 -t $(nproc) $pretkns ``` From c8e95a5a446ed7c8fcc269dbec47def72e4c9538 Mon Sep 17 00:00:00 2001 From: "Roberto A. Foglietta" Date: Sun, 11 Jan 2026 19:19:09 +0100 Subject: [PATCH 5/8] Enhance README with file prompt usage details Updated instructions for running inference with file prompts and specific parameters. Signed-off-by: Roberto A. 
Foglietta --- README.md | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index e510e2b54..6703abbe6 100644 --- a/README.md +++ b/README.md @@ -177,11 +177,16 @@ pip install -r requirements.txt python3 setup_env.py -md $mdir -q i2_s cmake --build build --config Release +export PATH="$PATH:$PWD/build/bin/" sysprompt="You are a helpful assistant" python3 run_inference.py -m $modprm -p "$sysprompt" -cnv --temp 0.3 -t $(nproc) + +# Alternative with a file prompt and specific parameters +tempr="--temp 0.3 --dynatemp-range 0.1" +file_prompt=${file_prompt:-/dev/null -p '$sysprompt'} pretkns="--override-kv tokenizer.ggml.pre=str:llama3" +intcnv="-i --multiline-input -cnv -c 4096 -b 2048" llama-cli -m $modprm -f ${file_prompt} -t $(nproc) $pretkns $tempr $intcnv ``` From 655761974cc26af82e5613e1585cf9d824a239b1 Mon Sep 17 00:00:00 2001 From: "Roberto A. Foglietta" Date: Sun, 11 Jan 2026 20:33:05 +0100 Subject: [PATCH 6/8] Upgrade pip and install requirements with filtering Upgrade pip before installing requirements and filter output. Signed-off-by: Roberto A. Foglietta --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 6703abbe6..ce9f1ad10 100644 --- a/README.md +++ b/README.md @@ -173,7 +173,8 @@ link="$url/resolve/main/$gguf" modprm="$mdir/$gguf" mkdir -p $mdir && wget -c "$link" -O $modprm -pip install -r requirements.txt +{ python3 -m pip install --upgrade pip; pip install -r requirements.txt; }\ + | grep -ve "^Requirement already satisfied:" python3 setup_env.py -md $mdir -q i2_s cmake --build build --config Release From 93d32254598756ad7e3742983603e53c94146310 Mon Sep 17 00:00:00 2001 From: "Roberto A.
Foglietta" Date: Sun, 11 Jan 2026 20:51:49 +0100 Subject: [PATCH 7/8] Modify inference command parameters in README Updated parameters for inference command in README. Signed-off-by: Roberto A. Foglietta --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index ce9f1ad10..6085f2c35 100644 --- a/README.md +++ b/README.md @@ -183,10 +183,10 @@ sysprompt="You are a helpful assistant" python3 run_inference.py -m $modprm -p "$sysprompt" -cnv --temp 0.3 -t $(nproc) # Alternative with a file prompt and specific parameters -tempr="--temp 0.3 --dynatemp-range 0.1" +tempr="--temp 0.3 --dynatemp-range 0.1 --no-warmup" file_prompt=${file_prompt:-/dev/null -p '$sysprompt'} pretkns="--override-kv tokenizer.ggml.pre=str:llama3" -intcnv="-i --multiline-input -cnv -c 4096 -b 2048" +intcnv="-i --multiline-input -cnv -c 8192 -b 4096" llama-cli -m $modprm -f ${file_prompt} -t $(nproc) $pretkns $tempr $intcnv ``` From 71e90cf28b3ed186383855b4e2d5f571b4cd29ec Mon Sep 17 00:00:00 2001 From: "Roberto A. Foglietta" Date: Mon, 12 Jan 2026 00:25:58 +0100 Subject: [PATCH 8/8] Update tokenizer and input parameters in README Added more parameters to the alternative llama-cli start command Signed-off-by: Roberto A.
Foglietta --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 6085f2c35..ee9babde6 100644 --- a/README.md +++ b/README.md @@ -185,8 +185,8 @@ python3 run_inference.py -m $modprm -p "$sysprompt" -cnv --temp 0.3 -t $(nproc) # Alternative with a file prompt and specific parameters tempr="--temp 0.3 --dynatemp-range 0.1 --no-warmup" file_prompt=${file_prompt:-/dev/null -p '$sysprompt'} -pretkns="--override-kv tokenizer.ggml.pre=str:llama3" -intcnv="-i --multiline-input -cnv -c 8192 -b 4096" +pretkns="--override-kv tokenizer.ggml.pre=str:llama3 --mlock" +intcnv="-i --multiline-input -cnv -c 8192 -b 4096 -co --keep -1 -n -1" llama-cli -m $modprm -f ${file_prompt} -t $(nproc) $pretkns $tempr $intcnv ```
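The variable wiring that patches 3 through 8 converge on can be summarized outside the series. The sketch below is not part of the patches themselves; it only shows how the final quick start derives both the download URL and the local model path from a single `$gguf` file name, so the composed values can be checked before running `wget`:

```shell
# Consolidated sketch of the path variables built up across patches 3-8
# (illustration only, not an additional patch in the series).
gguf="ggml-model-i2_s.gguf"
mdir="models/BitNet-b1.58-2B-4T"
url="https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf"
link="$url/resolve/main/$gguf"   # remote download URL
modprm="$mdir/$gguf"             # local path passed to -m / wget -O
echo "$link"
echo "$modprm"
```

Because both paths derive from the same `$gguf` assignment, switching to a different quantized model file means editing one line rather than three commands.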