
Pricing

Base Model Pricing

Use of base models on Anyscale Endpoints is billed per million tokens ($/M tokens). For more information on pricing and billing, including what counts as a token, see the FAQs.

| Model | Price ($/M tokens) |
| --- | --- |
| Mistral-7B-Instruct-v0.1 | 0.15 |
| Llama-2-7b-chat-hf | 0.15 |
| Llama-3-8b-chat-hf | 0.15 |
| gemma-7b-it | 0.15 |
| NeuralHermes-2.5-Mistral-7B | 0.15 |
| Llama-2-13b-chat-hf | 0.25 |
| Mixtral-8x7B-Instruct-v0.1 | 0.50 |
| Mixtral-8x22B-Instruct-v0.1 | 0.90 |
| Llama-2-70b-chat-hf | 1.0 |
| Llama-3-70b-chat-hf | 1.0 |
| CodeLlama-70b-Instruct-hf | 1.0 |
| thenlper-gte-large | 0.05 |
| BAAI/bge-large-en-v1.5 | 0.05 |
Tip: Every new user receives $10 of free credit to get started.
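The per-token billing above comes down to one multiplication, which the following minimal sketch illustrates. The prices are copied from the base model table; the function names and the dictionary are hypothetical helpers written for this page, not part of any Anyscale SDK, and the sketch assumes every billed token is charged at the listed rate.

```python
# Prices copied from the base model table, in USD per million tokens.
# Only a few models are listed here for brevity.
BASE_PRICE_PER_M_TOKENS = {
    "Mistral-7B-Instruct-v0.1": 0.15,
    "Llama-2-13b-chat-hf": 0.25,
    "Mixtral-8x7B-Instruct-v0.1": 0.50,
    "Llama-2-70b-chat-hf": 1.00,
    "thenlper-gte-large": 0.05,
}


def base_model_cost(model: str, tokens: int) -> float:
    """Estimate the USD cost of `tokens` billed tokens on a base model."""
    return BASE_PRICE_PER_M_TOKENS[model] * tokens / 1_000_000


def tokens_covered_by_credit(model: str, credit_usd: float = 10.0) -> int:
    """Estimate how many tokens a credit balance pays for (rounded down)."""
    return int(credit_usd / BASE_PRICE_PER_M_TOKENS[model] * 1_000_000)


# 2.5M tokens on Llama-2-70b-chat-hf at $1.00/M tokens.
print(base_model_cost("Llama-2-70b-chat-hf", 2_500_000))
# Tokens covered by the $10 starting credit on Mixtral-8x7B-Instruct-v0.1.
print(tokens_covered_by_credit("Mixtral-8x7B-Instruct-v0.1"))
```

Under these assumptions, the $10 starting credit covers about 10M tokens on a 70B model and about 20M tokens on Mixtral-8x7B-Instruct-v0.1.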

Fine Tuning Pricing

Fine-tuning is billed at a fixed cost of $5 per run plus a per-token charge in $/million tokens. For example, a fine-tuning job on Llama-2-13b-chat-hf with 10M tokens would cost $5 + ($2 × 10) = $25.

| Model | Fixed Cost/Run ($) | Price ($/M tokens) |
| --- | --- | --- |
| Llama-2-13b-chat-hf | 5 | 2 |
| Llama-2-70b-chat-hf | 5 | 4 |
| mistralai/Mistral-7B-Instruct-v0.1 | 5 | 1 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 5 | 4 |
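The fine-tuning formula (fixed cost per run plus a per-token charge) can be sketched as follows. The rates are copied from the fine-tuning table; `fine_tuning_cost` is a hypothetical helper, and `training_tokens` is assumed to be the token count the platform bills for the run, however it counts them.

```python
# Fixed cost per fine-tuning run, in USD (same for every model in the table).
FIXED_COST_PER_RUN = 5.0

# Per-token fine-tuning rates from the table, in USD per million tokens.
FINE_TUNING_PRICE_PER_M_TOKENS = {
    "Llama-2-13b-chat-hf": 2.0,
    "Llama-2-70b-chat-hf": 4.0,
    "mistralai/Mistral-7B-Instruct-v0.1": 1.0,
    "mistralai/Mixtral-8x7B-Instruct-v0.1": 4.0,
}


def fine_tuning_cost(model: str, training_tokens: int) -> float:
    """Estimate the USD cost of one fine-tuning run."""
    per_token_charge = (
        FINE_TUNING_PRICE_PER_M_TOKENS[model] * training_tokens / 1_000_000
    )
    return FIXED_COST_PER_RUN + per_token_charge


# The worked example from the text: Llama-2-13b-chat-hf with 10M tokens.
print(fine_tuning_cost("Llama-2-13b-chat-hf", 10_000_000))  # 25.0
```

Because the fixed cost applies per run, splitting the same data across several runs adds $5 for each additional run.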

Querying the fine-tuned models is billed on a $/million-tokens basis.

| Model | Price ($/M tokens) |
| --- | --- |
| Llama-2-7b-chat-hf* | 0.25 |
| Llama-2-13b-chat-hf | 0.50 |
| Llama-2-70b-chat-hf | 2.00 |
| mistralai/Mistral-7B-Instruct-v0.1 | 0.25 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 1.00 |
* meta-llama/Llama-2-7b-chat-hf is moving to the legacy models list. Please use mistralai/Mistral-7B-Instruct-v0.1 for new fine-tuning jobs. You can still query existing fine-tuned models based on meta-llama/Llama-2-7b-chat-hf, but only temporarily, and we recommend migrating to mistralai/Mistral-7B-Instruct-v0.1.
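To budget for a fine-tuned model end to end, the one-time training cost and the ongoing query cost can be added together. The sketch below is illustrative only: `lifecycle_cost` is a hypothetical helper, the rates are copied from the two fine-tuning tables, and Llama-2-7b-chat-hf is left out because it is moving to the legacy list for new fine-tuning jobs.

```python
FIXED_COST_PER_RUN = 5.0  # USD per fine-tuning run

# model: (fine-tuning price, fine-tuned query price), both in USD per million tokens
FINE_TUNED_RATES = {
    "Llama-2-13b-chat-hf": (2.0, 0.50),
    "Llama-2-70b-chat-hf": (4.0, 2.00),
    "mistralai/Mistral-7B-Instruct-v0.1": (1.0, 0.25),
    "mistralai/Mixtral-8x7B-Instruct-v0.1": (4.0, 1.00),
}


def lifecycle_cost(model: str, training_tokens: int, query_tokens: int) -> float:
    """Estimate USD cost of one fine-tuning run plus later queries."""
    train_rate, query_rate = FINE_TUNED_RATES[model]
    training = FIXED_COST_PER_RUN + train_rate * training_tokens / 1_000_000
    querying = query_rate * query_tokens / 1_000_000
    return training + querying


# One run on 10M training tokens, then 4M tokens of queries.
print(lifecycle_cost("mistralai/Mistral-7B-Instruct-v0.1", 10_000_000, 4_000_000))  # 16.0
```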