Pricing

You only pay for what you use on Replicate, billed by the second. When you aren’t running anything, your usage scales to zero and you pay nothing.

Hardware	Price	GPU	CPU	GPU RAM	RAM
CPU cpu	$0.000100/sec $0.36/hr	-	4x	-	8GB
Nvidia A100 (80GB) GPU gpu-a100-large	$0.001400/sec $5.04/hr	1x	10x	80GB	144GB
2x Nvidia A100 (80GB) GPU gpu-a100-large-2x	$0.002800/sec $10.08/hr	2x	20x	160GB	288GB
4x Nvidia A100 (80GB) GPU gpu-a100-large-4x	$0.005600/sec $20.16/hr	4x	40x	320GB	576GB
8x Nvidia A100 (80GB) GPU gpu-a100-large-8x	$0.011200/sec $40.32/hr	8x	80x	640GB	960GB
Nvidia A40 (Large) GPU gpu-a40-large	$0.000725/sec $2.61/hr	1x	10x	48GB	72GB
2x Nvidia A40 (Large) GPU gpu-a40-large-2x	$0.001450/sec $5.22/hr	2x	20x	96GB	144GB
4x Nvidia A40 (Large) GPU gpu-a40-large-4x	$0.002900/sec $10.44/hr	4x	40x	192GB	288GB
8x Nvidia A40 (Large) GPU gpu-a40-large-8x	$0.005800/sec $20.88/hr	8x	48x	384GB	680GB
Nvidia A40 GPU gpu-a40-small	$0.000575/sec $2.07/hr	1x	4x	48GB	16GB
Nvidia L40S GPU gpu-l40s	$0.000975/sec $3.51/hr	1x	10x	48GB	65GB
2x Nvidia L40S GPU gpu-l40s-2x	$0.001950/sec $7.02/hr	2x	20x	96GB	144GB
4x Nvidia L40S GPU gpu-l40s-4x	$0.003900/sec $14.04/hr	4x	40x	192GB	288GB
8x Nvidia L40S GPU gpu-l40s-8x	$0.007800/sec $28.08/hr	8x	80x	384GB	576GB
Nvidia T4 GPU gpu-t4	$0.000225/sec $0.81/hr	1x	4x	16GB	16GB
Additional hardware
Nvidia H100 GPU gpu-h100	$0.001525/sec $5.49/hr	Flux fine-tunes run on H100s; additional H100 capacity is reserved for committed spend contracts.
2x Nvidia H100 GPU gpu-h100-2x	$0.003050/sec $10.98/hr	Flux fine-tunes run on H100s; additional H100 capacity is reserved for committed spend contracts.
4x Nvidia H100 GPU gpu-h100-4x	$0.006100/sec $21.96/hr	Flux fine-tunes run on H100s; additional H100 capacity is reserved for committed spend contracts.
8x Nvidia H100 GPU gpu-h100-8x	$0.012200/sec $43.92/hr	Flux fine-tunes run on H100s; additional H100 capacity is reserved for committed spend contracts.
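
The hourly figures in the table follow directly from the per-second prices (one hour is 3,600 seconds). The sketch below checks that relationship for a few rows. The dictionary and function names are illustrative only and are not part of any Replicate API; the prices are copied from the table above.

```python
from decimal import Decimal

# Per-second prices in USD, copied from the pricing table above.
# Keys are the hardware identifiers shown in the "Hardware" column.
PRICE_PER_SECOND = {
    "cpu": Decimal("0.000100"),
    "gpu-t4": Decimal("0.000225"),
    "gpu-a40-large": Decimal("0.000725"),
    "gpu-a100-large": Decimal("0.001400"),
}

SECONDS_PER_HOUR = 3600


def hourly_price(hardware: str) -> Decimal:
    """Return the hourly price implied by the per-second price."""
    return PRICE_PER_SECOND[hardware] * SECONDS_PER_HOUR


if __name__ == "__main__":
    for name in PRICE_PER_SECOND:
        print(f"{name}: ${hourly_price(name):.2f}/hr")
```

Decimal is used here instead of float so that small per-second prices multiply without rounding noise.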

If you’re new to Replicate you can try featured models for free, but eventually you’ll need to enter a credit card.

Public models

Thousands of open-source machine learning models have been contributed by our community and more are added every day. When running or training one of these models, you only pay for the time it takes to process your request.

Each model runs on different hardware and takes a different amount of time to run. You’ll find estimates for how much they cost under "Run time and cost" on the model’s page. For example, for stability-ai/sdxl:

This model costs approximately $0.012 to run on Replicate, but this varies depending on your inputs.

Predictions run on Nvidia A40 (Large) GPU hardware, which costs $0.000725 per second. Predictions typically complete within 17 seconds.
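
The estimate above is simply the per-second hardware price multiplied by the typical run time. As a minimal sketch (the function name is hypothetical, and actual run times vary with your inputs):

```python
def estimate_prediction_cost(price_per_second: float, run_seconds: float) -> float:
    """Estimate the cost of one prediction in USD.

    price_per_second: hardware price from the pricing table.
    run_seconds: how long the prediction takes to run.
    """
    return price_per_second * run_seconds


if __name__ == "__main__":
    # Nvidia A40 (Large) at $0.000725/sec, typical run time of 17 seconds.
    cost = estimate_prediction_cost(0.000725, 17)
    print(f"Estimated cost: ${cost:.4f}")
```

With the figures quoted for stability-ai/sdxl, this gives 0.000725 × 17 = 0.012325, which matches the approximate $0.012 shown on the model page.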

Image models

Video models

Replicate hosts a selection of video models. These are priced either per video or per second of video generated by the model.

Language models

Replicate hosts a selection of language models, including Llama 3 and Mistral, which are priced per token.

A language model processes text by breaking it into tokens, or pieces of words. Once a prediction has finished, Replicate uses the Llama tokenizer to count the tokens in its text inputs and outputs.

Learn more about how language model pricing works.

Private models

You aren’t limited to the public models on Replicate: you can deploy your own custom models using Cog, our open-source tool for packaging machine learning models.

Unlike public models, most private models (with the exception of fast booting models) run on dedicated hardware so you don't have to share a queue with anyone else. This means you pay for all the time instances of the model are online: the time they spend setting up; the time they spend idle, waiting for requests; and the time they spend active, processing your requests. If you get a ton of traffic, we automatically scale up and down to handle the demand.

For fast booting models you'll only be billed for the time the model is active and processing your requests, so you won't pay for idle time like with other private models. Fast booting versions of models are labeled as such in the model's version list.
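
The difference between the two billing modes can be sketched as follows. This is an illustration of the rules described above, not Replicate's billing implementation: the class and function names are hypothetical, the timings in the example are made up, and it assumes both modes run on the same hardware at the same per-second price.

```python
from dataclasses import dataclass


@dataclass
class InstanceTime:
    """Seconds a model instance spent in each state (illustrative values)."""
    setup_seconds: float
    idle_seconds: float
    active_seconds: float


def billable_seconds(t: InstanceTime, fast_booting: bool) -> float:
    """Return the seconds that count toward the bill.

    Fast booting models: only active time is billed.
    Other private models: setup, idle, and active time are all billed.
    """
    if fast_booting:
        return t.active_seconds
    return t.setup_seconds + t.idle_seconds + t.active_seconds


def private_model_cost(t: InstanceTime, price_per_second: float,
                       fast_booting: bool) -> float:
    """Cost in USD for the given instance time and hardware price."""
    return billable_seconds(t, fast_booting) * price_per_second


if __name__ == "__main__":
    # Hypothetical usage: 2 min setup, 10 min idle, 5 min active.
    usage = InstanceTime(setup_seconds=120, idle_seconds=600, active_seconds=300)
    a40_large = 0.000725  # $/sec, from the pricing table

    dedicated = private_model_cost(usage, a40_large, fast_booting=False)
    fast = private_model_cost(usage, a40_large, fast_booting=True)
    print(f"Dedicated hardware: ${dedicated:.4f}")
    print(f"Fast booting:       ${fast:.4f}")
```

In this made-up scenario the dedicated instance is billed for 1,020 seconds, while the fast booting version is billed only for the 300 active seconds.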

As with public models, if you would like more control over how a private model is run, you can use a deployment.

Learn more

For a deeper dive, check out how billing works on Replicate.