At a glance
- Availability: Experimental (how to enable).
- Auth: API key.
- Connection: The key comes from REPLICATE_API_TOKEN.
- Docs: https://replicate.com/docs/reference/http
Credentials
Set these per environment. See Connect an integration.
| Variable | Required | Description |
|---|---|---|
| REPLICATE_API_TOKEN | Yes | Replicate API token (starts with r8_). Docs. |
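
As a rough illustration of how the credential might be consumed, the sketch below reads REPLICATE_API_TOKEN from the environment and builds request headers. It assumes the bearer-token scheme described in the Replicate HTTP reference; the function name `replicate_headers` is hypothetical and is not part of the integration.

```python
import os


def replicate_headers(env=None):
    """Build HTTP headers for the Replicate API from REPLICATE_API_TOKEN.

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("REPLICATE_API_TOKEN is not set")
    if not token.startswith("r8_"):
        # The table above says tokens start with r8_; any other prefix
        # usually means the wrong value was pasted.
        raise ValueError("REPLICATE_API_TOKEN does not start with 'r8_'")
    return {
        # Assumption: bearer-token auth, as in the Replicate HTTP reference.
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Failing early on a missing or malformed token gives a clearer error than a 401 from the first API call.
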
Setup
- Create a Replicate account: Go to https://replicate.com and sign up (GitHub sign-in supported). New accounts get a small amount of free usage before billing is required.
- Create an API token: Open https://replicate.com/account/api-tokens, give the token a descriptive name (e.g. ‘Veryfront Integration’), and create it.
- Store the token: Copy the token and add it to your .env file as REPLICATE_API_TOKEN=r8_…
- Verify access: Run the List Models tool to confirm the token works. A 401 means the token is wrong or revoked.
- Predictions are billed per second of compute, and costs vary widely by model and hardware.
- Create Prediction sends Prefer: wait=60 so that the call returns synchronously when possible. Long-running models still return a status of ‘starting’ or ‘processing’; poll with Get Prediction until the prediction finishes.
- When creating predictions, use the version ID from the latest_version.id field of the Get Model response.
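
The verify and create-then-poll steps above can be sketched directly against the Replicate HTTP API. This is a minimal sketch, not the integration's own implementation: it assumes the base URL https://api.replicate.com/v1 and bearer auth from the linked reference, and the names `http_json`, `token_is_valid`, and `run_prediction`, as well as the polling interval and limit, are illustrative choices.

```python
import json
import time
import urllib.error
import urllib.request

API = "https://api.replicate.com/v1"  # assumed base URL, per the linked docs
TERMINAL = {"succeeded", "failed", "canceled"}


def http_json(method, url, token, body=None, extra_headers=None):
    """Send one JSON request and decode the JSON reply."""
    headers = {"Authorization": f"Bearer {token}",
               "Content-Type": "application/json"}
    headers.update(extra_headers or {})
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:  # raises HTTPError on 4xx/5xx
        return json.load(resp)


def token_is_valid(token, send=http_json):
    """Return False on a 401 from the models listing, True if the call succeeds."""
    try:
        send("GET", f"{API}/models", token)
        return True
    except urllib.error.HTTPError as err:
        if err.code == 401:  # wrong or revoked token
            return False
        raise


def run_prediction(token, version_id, inputs, send=http_json,
                   sleep=time.sleep, interval=2.0, max_polls=150):
    """Create a prediction, then poll until it reaches a terminal status."""
    pred = send("POST", f"{API}/predictions", token,
                body={"version": version_id, "input": inputs},
                extra_headers={"Prefer": "wait=60"})  # ask for a synchronous reply
    polls = 0
    while pred["status"] not in TERMINAL:  # still 'starting' or 'processing'
        if polls >= max_polls:
            raise TimeoutError(f"prediction {pred['id']} did not finish")
        sleep(interval)
        pred = send("GET", f"{API}/predictions/{pred['id']}", token)
        polls += 1
    return pred
```

Passing `send` and `sleep` as parameters keeps the polling loop testable without network access. Because billing is per second of compute, a caller that gives up on a prediction may also want to cancel it rather than simply stop polling.
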
Tools
| Tool | Access | Description |
|---|---|---|
| List Models | Read | List public models available on Replicate |
| Get Model | Read | Get details about a model, including its latest version ID |
| Create Prediction | Write | Run a model by creating a prediction from a version ID and input object |
| Get Prediction | Read | Get the status and output of a prediction |
| Cancel Prediction | Write | Cancel a running prediction |
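
For orientation, each tool plausibly corresponds to one endpoint of the Replicate HTTP API. The mapping below is an assumption based on the public HTTP reference, not a statement of how the tools are implemented; `TOOL_ENDPOINTS`, `endpoint_for`, and `latest_version_id` are hypothetical names used only for this sketch.

```python
API = "https://api.replicate.com/v1"  # assumed base URL, per the linked docs

# Assumed mapping from tool name to (HTTP method, path template).
TOOL_ENDPOINTS = {
    "List Models": ("GET", "/models"),
    "Get Model": ("GET", "/models/{owner}/{name}"),
    "Create Prediction": ("POST", "/predictions"),
    "Get Prediction": ("GET", "/predictions/{prediction_id}"),
    "Cancel Prediction": ("POST", "/predictions/{prediction_id}/cancel"),
}


def endpoint_for(tool, **params):
    """Return (method, url) for a tool, filling path parameters."""
    method, template = TOOL_ENDPOINTS[tool]
    return method, API + template.format(**params)


def latest_version_id(model):
    """Extract the version ID that Create Prediction needs from a Get Model response."""
    version = model.get("latest_version")
    if not version:
        raise ValueError("model has no published version")
    return version["id"]
```

A typical flow would call Get Model, pass its response to `latest_version_id`, and hand the result to Create Prediction together with the input object.
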
Example prompts
- Find a Replicate model for a task I describe and show me its latest version ID and inputs.
- Run a Replicate model with inputs I provide and report the output when it finishes.