Hindsight model errors #44
```
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
0.00.748.647 I device_info:
0.00.748.827 I - SYCL0 : Intel(R) UHD Graphics 750 (28851 MiB, 28851 MiB free)
0.00.748.846 I - CPU : 11th Gen Intel(R) Core(TM) i9-11900 @ 2.50GHz (31171 MiB, 31171 MiB free)
0.00.748.959 I system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
0.00.748.971 I srv llama_server: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
0.00.749.152 I srv init: running without SSL
0.00.749.219 I srv init: using 15 threads for HTTP server
0.00.749.534 I srv start: binding port with default address family
0.00.750.756 I srv llama_server: loading model
0.00.750.767 I srv load_model: loading model '/root/.cache/huggingface/hub/models--bartowski--google_gemma-4-E2B-it-GGUF/snapshots/b5e99bd964eaacc27ba484bb2eb3e9f6160b9143/google_gemma-4-E2B-it-Q4_K_M.gguf'
0.01.200.774 I srv load_model: [mtmd] estimated worst-case memory usage of mmproj is 1200.06 MiB
0.01.200.793 I common_init_result: fitting params to device memory ...
0.01.200.793 I common_init_result: (for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on)
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
0.01.979.826 W load: control-looking token: 212 '</s>' was not control-type; this is probably a bug in the model. its type will be overridden
0.01.980.182 W load: control-looking token: 50 '<|tool_response>' was not control-type; this is probably a bug in the model. its type will be overridden
0.01.998.024 W load: special_eog_ids contains '<|tool_response>', removing '</s>' token from EOG list
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
0.02.589.936 W llama_context: n_ctx_seq (16384) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
0.02.610.187 I common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
0.03.861.327 W init_audio: audio input is in experimental stage and may have reduced quality:
https://github.com/ggml-org/llama.cpp/discussions/13759
0.03.861.382 I srv load_model: loaded multimodal model, '/root/.cache/huggingface/hub/models--bartowski--google_gemma-4-E2B-it-GGUF/snapshots/b5e99bd964eaacc27ba484bb2eb3e9f6160b9143/mmproj-google_gemma-4-E2B-it-bf16.gguf'
0.03.861.392 I srv load_model: initializing slots, n_slots = 4
0.04.216.377 W common_speculative_init: no implementations specified for speculative decoding
0.04.216.387 I slot load_model: id 0 | task -1 | new slot, n_ctx = 16384
0.04.216.392 I slot load_model: id 1 | task -1 | new slot, n_ctx = 16384
0.04.216.393 I slot load_model: id 2 | task -1 | new slot, n_ctx = 16384
0.04.216.393 I slot load_model: id 3 | task -1 | new slot, n_ctx = 16384
0.04.216.463 I srv load_model: prompt cache is enabled, size limit: 8192 MiB
0.04.216.464 I srv load_model: use `--cache-ram 0` to disable the prompt cache
0.04.216.464 I srv load_model: for more info see https://github.com/ggml-org/llama.cpp/pull/16391
0.04.216.465 I srv load_model: context checkpoints enabled, max = 32, min spacing = 256
0.04.216.481 I srv init: idle slots will be saved to prompt cache and cleared upon starting a new task
0.04.222.252 I init: chat template, example_format: '<|turn>system
>
You are a helpful assistant<turn|>
<|turn>user
Hello<turn|>
<|turn>model
Hi there<turn|>
<|turn>user
How are you?<turn|>
<|turn>model
'
0.04.222.833 I srv init: init: chat template, thinking = 1
0.04.222.852 I srv llama_server: model loaded
0.04.222.855 I srv llama_server: server is listening on http://0.0.0.0:8080
0.04.222.858 I srv update_slots: all slots are idle
```
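The repeated `get_memory_info` warning in the log above says that free-memory reporting needs `ZES_ENABLE_SYSMAN=1` in the server's environment. A minimal sketch of setting it before launch follows. The helper name `sysman_env` is hypothetical, and the commented launch command (binary name and model path) is a placeholder, not taken from this deployment. Whether this affects the errors in the issue title is not established by the log.

```python
import os


def sysman_env(base=None):
    """Return a copy of the environment with ZES_ENABLE_SYSMAN=1 set.

    `base` defaults to the current process environment; the input mapping
    is copied, never modified in place.
    """
    env = dict(os.environ if base is None else base)
    env["ZES_ENABLE_SYSMAN"] = "1"
    return env


# Hypothetical launch (placeholders, adjust to the actual setup):
# import subprocess
# subprocess.run(["llama-server", "-m", "/path/to/model.gguf"], env=sysman_env())
```

If the server runs in a container, the same variable would instead go into the container's environment configuration.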
Reference
apb/infrastructure#44