5. Compiling and Using llama.cpp - AIGC


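The Python script used to prepare a model for quantization (convert.py) is in the llama.cpp directory. Quantize a model only when hardware resources are limited.
The quantize binary is in build/bin.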

https://huggingface.co/meta-llama/Llama-2-7b-hf
Model conversion
Convert the 7B model to ggml FP16 format. By default this writes the ggml model ggml-model-f16.bin into the model directory.
python convert.py models/llama-2-7b-hf/ 
Newer versions generate ggml-model-f16.gguf by default.
Model quantization
Quantize the FP16 model further to 4 bits (using the q4_0 method).
./quantize ./models/llama-2-7b-hf/ggml-model-f16.bin ./models/llama-2-7b-hf/ggml-model-q4_0.bin q4_0
The main binary is in build/bin.
./main -ngl 30 -m ./models/llama-2-7b-hf/ggml-model-q4_0.bin --color -f  ./prompts/chat-with-vicuna-v0.txt -ins -c 2048 --temp 0.2 -n 4096 --repeat_penalty 1.0
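The three steps above (convert, quantize, run) can be chained in one script. The following is a minimal sketch, not part of the original workflow: the variable names MODEL_DIR, BIN_DIR and DRY_RUN and the helper functions are assumptions introduced for illustration, and BIN_DIR assumes the binaries are still in build/bin. By default it only prints the commands so they can be checked before anything is executed.

```shell
#!/bin/bash
# Sketch: convert -> quantize -> chat, reusing the commands shown above.
# MODEL_DIR, BIN_DIR and DRY_RUN are hypothetical names for this example.
set -euo pipefail

MODEL_DIR="${MODEL_DIR:-./models/llama-2-7b-hf}"
BIN_DIR="${BIN_DIR:-./build/bin}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print a command, and execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

pipeline() {
    # Newer versions write ggml-model-f16.gguf instead; adjust f16 if so.
    local f16="$MODEL_DIR/ggml-model-f16.bin"
    local q4="$MODEL_DIR/ggml-model-q4_0.bin"

    run python convert.py "$MODEL_DIR/"
    run "$BIN_DIR/quantize" "$f16" "$q4" q4_0
    run "$BIN_DIR/main" -ngl 30 -m "$q4" --color \
        -f ./prompts/chat-with-vicuna-v0.txt -ins -c 2048 \
        --temp 0.2 -n 4096 --repeat_penalty 1.0
}

pipeline
```

Run it once as is to review the printed commands, then with DRY_RUN=0 to execute them.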
Linly models
Convert and quantize these yourself, following the steps above.
Test script
#!/bin/bash
# llama inference
#./main -ngl 30 -m ./models/7B/ggml-model-alpaca-7b-q4_0.gguf --color  -f  ./prompts/chat-with-vicuna-v0.txt -ins -c 2048 --temp 0.2 -n 4096 --repeat_penalty 1.3
# linly base model
#./main -ngl 30 -m ./models/7B/linly-ggml-model-q4_0.bin --color  -f  ./prompts/chat-with-vicuna-v0.txt -ins -c 2048 --temp 0.2 -n 4096 --repeat_penalty 1.0
# linly chatflow model
./main -ngl 30 -m ./models/chatflow_7b/linly-chatflow-7b-q4_0.bin --color  -f  ./prompts/chat-with-vicuna-v0.txt -ins -c 2048 --temp 0.2 -n 4096 --repeat_penalty 1.0
# whisper llama
#./whisper/talk-llama -l zh -mw ./models/ggml-small_q4_0.bin -ml ./models/7B/ggml-model-alpaca-7b-q4_0.gguf -p "lfrobot" -t 8 -c 0 -vth 0.6 -fth 100 -pe
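The test script switches models by commenting lines in and out. As an alternative, the model can be picked by name. This is only a sketch: the short names alpaca, linly and chatflow and the two helper functions are hypothetical labels introduced here, while the paths and penalty values are copied from the script above.

```shell
#!/bin/bash
# Sketch: pick one of the models from the test script by name.
# The names alpaca / linly / chatflow are hypothetical labels for this example.
set -euo pipefail

model_path() {
    case "$1" in
        alpaca)   echo "./models/7B/ggml-model-alpaca-7b-q4_0.gguf" ;;
        linly)    echo "./models/7B/linly-ggml-model-q4_0.bin" ;;
        chatflow) echo "./models/chatflow_7b/linly-chatflow-7b-q4_0.bin" ;;
        *)        echo "unknown model: $1" >&2; return 1 ;;
    esac
}

# The alpaca line in the test script uses 1.3; the linly lines use 1.0.
repeat_penalty() {
    case "$1" in
        alpaca) echo "1.3" ;;
        *)      echo "1.0" ;;
    esac
}

name="${1:-chatflow}"
model="$(model_path "$name")"
penalty="$(repeat_penalty "$name")"

# Print the command; drop the leading `echo` to actually run it.
echo ./main -ngl 30 -m "$model" --color \
    -f ./prompts/chat-with-vicuna-v0.txt -ins -c 2048 \
    --temp 0.2 -n 4096 --repeat_penalty "$penalty"
```

Saved under a file name of your choice, the script takes the model name as its first argument and falls back to chatflow, the model the test script currently leaves uncommented.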
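Parameter descriptions
The more important parameters:
-ins    Start a ChatGPT-like interactive chat mode
-f      Specify the prompt template; for alpaca models, load the instruction template prompts/alpaca.txt
-c      Context length; larger values let the model draw on a longer conversation history (default: 512)
-n      Maximum length of the generated reply (default: 128)
--repeat_penalty Strength of the penalty applied to repeated text in the generated reply
--temp  Temperature; lower values make replies less random, higher values more random
--top_p, --top_k  Sampling parameters used during decoding
-b      Batch size (default: 512)
-t      Number of threads (default: 8); can be increased as appropriate
-ngl    Number of model layers offloaded to the GPU
-m      Path to the model file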
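To keep the meaning of each flag next to its value, the arguments can be collected in a bash array with one commented flag per line. This is a sketch under assumptions: MODEL and MAIN_BIN are hypothetical variable names, and the values are the ones used in the commands earlier in this article (with -b and -t set to the defaults listed above), not tuned recommendations.

```shell
#!/bin/bash
# Sketch: annotated argument list for main.
# MODEL and MAIN_BIN are hypothetical variable names for this example.
MODEL="${MODEL:-./models/llama-2-7b-hf/ggml-model-q4_0.bin}"
MAIN_BIN="${MAIN_BIN:-./main}"

args=(
    -m "$MODEL"                            # model file
    -ngl 30                                # layers offloaded to the GPU
    -ins                                   # interactive chat mode
    -f ./prompts/chat-with-vicuna-v0.txt   # prompt template
    -c 2048                                # context length
    -n 4096                                # maximum reply length
    -b 512                                 # batch size
    -t 8                                   # threads
    --temp 0.2                             # low temperature, less random replies
    --repeat_penalty 1.0                   # penalty on repeated text
    --color                                # colored output
)

# Print the full command; replace `echo` with `exec` to start the chat.
echo "$MAIN_BIN" "${args[@]}"
```

Quoting "${args[@]}" keeps each element a separate argument, so a model path containing spaces would still be passed intact.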
	          
Updated 2024-06-06