TinyLlama
Running LLMs fast on a Mac
TinyLlama runs very fast on an M3 Max (CPU: 4 efficiency + 12 performance cores; GPU: 40 cores), with these generation speeds:
- Q4_0: 207 tokens/s
- Q5_K_M: 197 tokens/s
- FP16: 119 tokens/s
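The three rates can be converted into average per-token latency and a speedup relative to FP16 with simple arithmetic. The sketch below uses only the numbers reported in this note; the helper name `ms_per_token` is illustrative.

```python
# Generation speeds reported above for TinyLlama on an M3 Max (tokens/s).
rates = {"Q4_0": 207, "Q5_K_M": 197, "FP16": 119}


def ms_per_token(tokens_per_s: float) -> float:
    """Convert throughput (tokens/s) into average per-token latency (ms)."""
    return 1000.0 / tokens_per_s


baseline = rates["FP16"]
for name, tps in rates.items():
    # Latency per token and speedup relative to the FP16 baseline.
    print(f"{name:7s} {tps:4d} tok/s  {ms_per_token(tps):5.2f} ms/token  "
          f"{tps / baseline:.2f}x vs FP16")
```

Running it shows about 4.83 ms/token for Q4_0 versus 8.40 ms/token for FP16. In other words, the Q4_0 build generates roughly 1.74x faster than FP16 on this machine.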
https://ollama.ai/library/tinyllama?continueFlag=ee66df50d8b2c452419ecff089efadc7
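One way to reproduce a tokens/s figure with Ollama is to read the timing fields it returns. A non-streaming `POST /api/generate` request (by default at `http://localhost:11434`) returns a final JSON object that includes `eval_count`, the number of generated tokens, and `eval_duration`, the time spent generating them in nanoseconds. `ollama run --verbose` prints a similar "eval rate" line. The sketch below assumes that response shape. The sample body is hand-written, hypothetical data, not a measurement, and the helper name `eval_rate` is illustrative.

```python
import json


def eval_rate(response: dict) -> float:
    """Generation speed in tokens/s from an Ollama /api/generate response.

    Assumes the response carries eval_count (tokens generated) and
    eval_duration (generation time in nanoseconds).
    """
    return response["eval_count"] * 1e9 / response["eval_duration"]


# Hypothetical response body, trimmed to the fields used above
# (illustrative numbers only, not real benchmark output).
sample = json.loads(
    '{"model": "tinyllama", "done": true, '
    '"eval_count": 100, "eval_duration": 500000000}'
)
print(f"{eval_rate(sample):.1f} tokens/s")
```

With a real server, you would pass the parsed body of a `"stream": false` request for the `tinyllama` model to `eval_rate` instead of the sample.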
https://github.com/jzhang38/TinyLlama
https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.6




