TinyLlama
Running LLMs at high speed on a Mac
TinyLlama runs ultra fast on an M3 Max (4 efficiency + 12 performance CPU cores, 40 GPU cores), with the following generation speeds:
- Q4_0: 207 tokens/s
- Q5_K_M: 197 tokens/s
- FP16: 119 tokens/s
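Put in relative terms, using only the figures above, the quantized builds are roughly 1.7x faster than FP16. A small sketch of that arithmetic:

```python
# Throughput figures reported above (tokens/s, TinyLlama on M3 Max)
throughput = {"Q4_0": 207, "Q5_K_M": 197, "FP16": 119}

# Speedup of each format relative to the unquantized FP16 baseline
baseline = throughput["FP16"]
speedup = {name: tps / baseline for name, tps in throughput.items()}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x vs FP16")
```

This prints 1.74x for Q4_0, 1.66x for Q5_K_M and 1.00x for FP16.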
https://ollama.ai/library/tinyllama?continueFlag=ee66df50d8b2c452419ecff089efadc7
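To reproduce a tokens/s figure like the ones above, one option is to query a local Ollama server and divide the generated-token count by the generation time. The sketch below assumes the server is running at its default address (`http://localhost:11434`) and uses the `/api/generate` endpoint with streaming disabled; the non-streamed response reports `eval_count` (tokens generated) and `eval_duration` (in nanoseconds). The helper names `tokens_per_second` and `benchmark` are hypothetical, and the prompt is arbitrary.

```python
import json
import urllib.request

# Default local Ollama endpoint (assumption: server running on this machine)
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Hypothetical helper: generation throughput from an /api/generate response.

    eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    """
    return response["eval_count"] / response["eval_duration"] * 1e9


def benchmark(model: str = "tinyllama", prompt: str = "Why is the sky blue?") -> float:
    """Hypothetical helper: run one non-streamed generation and return tokens/s."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return tokens_per_second(json.load(resp))
```

After `ollama pull tinyllama`, calling `benchmark("tinyllama")` returns the measured rate; to compare quantizations, pass the tag of the specific variant listed on the library page. Alternatively, `ollama run tinyllama --verbose` prints an "eval rate" in tokens/s after each reply. Results will vary with prompt length, output length and hardware.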