Replicate

At a glance

Set these per environment. See Connect an integration.

Variable	Required	Description
`REPLICATE_API_TOKEN`	Yes	Replicate API token (starts with `r8_`).

Create a Replicate account: Go to https://replicate.com and sign up (GitHub sign-in supported). New accounts get a small amount of free usage before billing is required.
Create an API token: Open https://replicate.com/account/api-tokens, give the token a descriptive name (e.g. ‘Veryfront Integration’), and create it.
Store the token: Copy the token and add it to your `.env` file as `REPLICATE_API_TOKEN=r8_…`
Verify access: Run the List Models tool to confirm the token works. A 401 means the token is wrong or revoked.
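The verification step can also be run outside the integration. The following is a minimal sketch, assuming the token is checked directly against Replicate's public HTTP API (`GET https://api.replicate.com/v1/models` with a bearer token). The helper names `build_models_request` and `token_is_valid` are hypothetical and are not part of the integration.

```python
import os
import urllib.error
import urllib.request

# Replicate's public HTTP API base URL (assumption: the check is made
# directly against the API rather than through the List Models tool).
API_BASE = "https://api.replicate.com/v1"


def build_models_request(token: str) -> urllib.request.Request:
    """Build an authenticated GET request for the public model list."""
    if not token.startswith("r8_"):
        raise ValueError("Replicate API tokens are expected to start with 'r8_'")
    return urllib.request.Request(
        f"{API_BASE}/models",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def token_is_valid(token: str) -> bool:
    """Return True if the API accepts the token, False on a 401."""
    try:
        with urllib.request.urlopen(build_models_request(token), timeout=30) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 401:  # wrong or revoked token
            return False
        raise  # any other HTTP error is not a token problem; surface it


if __name__ == "__main__":
    token = os.environ.get("REPLICATE_API_TOKEN", "")
    if not token:
        print("REPLICATE_API_TOKEN is not set")
    elif token_is_valid(token):
        print("token accepted")
    else:
        print("401: token is wrong or revoked")
```

Only a 401 is treated as a bad token; other failures (network errors, rate limits) are re-raised so they are not mistaken for an invalid credential.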

Predictions are billed per second of compute; costs vary widely by model and hardware.
Create Prediction sends the `Prefer: wait=60` header so that the prediction returns synchronously when possible. Long-running models still return with status ‘starting’ or ‘processing’; poll with Get Prediction until the prediction finishes.
When creating a prediction, use the version ID from the latest_version.id field of the Get Model response.
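The create-then-poll flow described in these notes might look roughly like the sketch below. It assumes the tools map onto Replicate's public HTTP API (`POST /v1/predictions` with a `version` and `input` body), and the terminal statuses `succeeded`, `failed`, and `canceled` come from that API rather than from this page. All function names are hypothetical. `get_prediction` is passed in as a callable so that the polling logic can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.replicate.com/v1"
# Statuses after which a prediction no longer changes (per Replicate's HTTP API).
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def latest_version_id(model: dict) -> str:
    """Extract latest_version.id from a Get Model response."""
    return model["latest_version"]["id"]


def build_create_request(
    token: str, version: str, inputs: dict, wait_seconds: int = 60
) -> urllib.request.Request:
    """Build the POST request that creates a prediction.

    The Prefer header asks the API to hold the connection open for up to
    wait_seconds so that fast models return their output in one call.
    """
    body = json.dumps({"version": version, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/predictions",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": f"wait={wait_seconds}",
        },
        method="POST",
    )


def wait_for_prediction(
    prediction: dict,
    get_prediction: Callable[[str], dict],
    interval: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the prediction reaches a terminal status.

    prediction is the response from creating the prediction; if it is
    already terminal (the synchronous case), it is returned unchanged.
    """
    polls = 0
    while prediction["status"] not in TERMINAL_STATUSES:
        if polls >= max_polls:
            raise TimeoutError(f"prediction {prediction['id']} did not finish")
        sleep(interval)
        prediction = get_prediction(prediction["id"])
        polls += 1
    return prediction
```

Because billing is per second of compute, a bounded `max_polls` paired with an explicit cancel on timeout is one way to avoid leaving a long-running prediction unattended.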

Tool	Access	Description
List Models	Read	List public models available on Replicate
Get Model	Read	Get details about a model, including its latest version ID
Create Prediction	Write	Run a model by creating a prediction from a version ID and input object
Get Prediction	Read	Get the status and output of a prediction
Cancel Prediction	Write	Cancel a running prediction

Find a Replicate model for a task I describe and show me its latest version ID and inputs.
Run a Replicate model with inputs I provide and report the output when it finishes.