Commit 4485f4e
Author: Hartorn

feat(giskard-llm): add LLMClient, Azure providers, typed ToolCall, message validation

Introduce LLMClient as the core entry point with named provider aliases, os.environ/ resolution, and lazy instance caching. Replace the global provider cache and if/elif routing with a registry pattern. Add Azure OpenAI and Azure AI Foundry providers. Type tool_calls with Pydantic models, add ChatMessage TypedDict, and per-provider message validation.

- Fix Google provider system message bug (was reading from params)
- Add gemini/ as alias for google/ prefix (backward compat)
- Bare model names default to openai (e.g. "gpt-4o")
- 67 functional tests parametrized across all 5 providers
- Per-provider CI matrix with conditional env var injection
- Comprehensive provider docstrings per 04-provider-docs.mdc

Made-with: Cursor
1 parent 668345b commit 4485f4e

30 files changed: 1921 additions, 252 deletions
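The commit message describes typing `tool_calls` with Pydantic models and adding a `ChatMessage` TypedDict. A hedged sketch of what those shapes could look like, in the OpenAI wire style; field and class names beyond `ToolCall` and `ChatMessage` are assumptions, not the actual giskard-llm definitions:

```python
from typing import Literal, TypedDict

from pydantic import BaseModel


class FunctionCall(BaseModel):
    name: str
    arguments: str  # JSON-encoded arguments, as in the OpenAI wire format


class ToolCall(BaseModel):
    id: str
    type: Literal["function"] = "function"
    function: FunctionCall


class ChatMessage(TypedDict, total=False):
    role: Literal["system", "user", "assistant", "tool"]
    content: str
    tool_calls: list[ToolCall]  # present on assistant messages that call tools
    tool_call_id: str  # present on tool-result messages


msg: ChatMessage = {"role": "user", "content": "Hello!"}
call = ToolCall(
    id="call_0",
    function=FunctionCall(name="get_weather", arguments='{"city": "Paris"}'),
)
```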

.github/workflows/integration-tests.yml

71 additions, 8 deletions

```diff
@@ -50,19 +50,17 @@ jobs:
       echo "::error::External contributors require a maintainer to add the 'safe for build' label."
       exit 1
 
-  test-functional:
+  test-agents-functional:
     needs: authorize
     runs-on: ubuntu-latest
     timeout-minutes: 30
     strategy:
       fail-fast: false
       matrix:
         python-version: ["3.12", "3.13", "3.14"]
-        package: [giskard-core, giskard-agents, giskard-checks]
         provider: [google]
-    name: test-functional / ${{ matrix.package }} / ${{ matrix.provider }} / ${{ matrix.python-version }}
+    name: agents / ${{ matrix.provider }} / ${{ matrix.python-version }}
     env:
-      PACKAGE: ${{ matrix.package }}
       PROVIDER: ${{ matrix.provider }}
     steps:
       - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
@@ -77,7 +75,72 @@ jobs:
         run: uv pip install "giskard-llm[$PROVIDER]"
       - name: Run functional tests
         env:
-          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
-          TEST_MODEL: "gemini/gemini-2.0-flash"
-          TEST_EMBEDDING_MODEL: "gemini/gemini-embedding-001"
-        run: make test-functional PACKAGE=$PACKAGE PROVIDER=$PROVIDER
+          GOOGLE_API_KEY: ${{ matrix.provider == 'google' && secrets.GEMINI_API_KEY || '' }}
+          TEST_MODEL: "google/gemini-2.0-flash"
+          TEST_EMBEDDING_MODEL: "google/gemini-embedding-001"
+        run: make test-functional PACKAGE=giskard-agents PROVIDER=$PROVIDER
+
+  test-llm-functional:
+    needs: authorize
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.12"]
+        provider: [openai, google, anthropic, azure, azure_ai]
+    name: llm / ${{ matrix.provider }} / ${{ matrix.python-version }}
+    env:
+      PROVIDER: ${{ matrix.provider }}
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+        with:
+          ref: ${{ github.event.pull_request.head.sha || github.ref }}
+      - uses: astral-sh/setup-uv@37802adc94f370d6bfd71619e3f0bf239e1f3b78 # v7
+        with:
+          enable-cache: true
+          python-version: ${{ matrix.python-version }}
+      - run: make install
+      - name: Install provider SDK
+        run: uv pip install "giskard-llm[$PROVIDER]"
+      - name: Run functional tests
+        env:
+          OPENAI_API_KEY: ${{ matrix.provider == 'openai' && secrets.OPENAI_API_KEY || '' }}
+          GOOGLE_API_KEY: ${{ matrix.provider == 'google' && secrets.GEMINI_API_KEY || '' }}
+          ANTHROPIC_API_KEY: ${{ matrix.provider == 'anthropic' && secrets.ANTHROPIC_API_KEY || '' }}
+          AZURE_API_KEY: ${{ matrix.provider == 'azure' && secrets.AZURE_API_KEY || '' }}
+          AZURE_API_BASE: ${{ matrix.provider == 'azure' && secrets.AZURE_API_BASE || '' }}
+          AZURE_API_VERSION: ${{ matrix.provider == 'azure' && secrets.AZURE_API_VERSION || '' }}
+          AZURE_AI_API_KEY: ${{ matrix.provider == 'azure_ai' && secrets.AZURE_AI_API_KEY || '' }}
+          AZURE_AI_ENDPOINT: ${{ matrix.provider == 'azure_ai' && secrets.AZURE_AI_ENDPOINT || '' }}
+        run: make test-functional PACKAGE=giskard-llm PROVIDER=$PROVIDER
+
+  test-checks-functional:
+    needs: authorize
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.12", "3.13", "3.14"]
+        provider: [google]
+    name: checks / ${{ matrix.provider }} / ${{ matrix.python-version }}
+    env:
+      PROVIDER: ${{ matrix.provider }}
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+        with:
+          ref: ${{ github.event.pull_request.head.sha || github.ref }}
+      - uses: astral-sh/setup-uv@37802adc94f370d6bfd71619e3f0bf239e1f3b78 # v7
+        with:
+          enable-cache: true
+          python-version: ${{ matrix.python-version }}
+      - run: make install
+      - name: Install provider SDK
+        run: uv pip install "giskard-llm[$PROVIDER]"
+      - name: Run functional tests
+        env:
+          GOOGLE_API_KEY: ${{ matrix.provider == 'google' && secrets.GEMINI_API_KEY || '' }}
+          TEST_MODEL: "google/gemini-2.0-flash"
+          TEST_EMBEDDING_MODEL: "google/gemini-embedding-001"
+        run: make test-functional PACKAGE=giskard-checks PROVIDER=$PROVIDER
```

libs/giskard-agents/src/giskard/agents/embeddings/litellm_embedding_model.py

1 addition, 1 deletion

```diff
@@ -9,7 +9,7 @@
 class LitellmEmbeddingModel(BaseEmbeddingModel):
     """An embedding model backed by giskard-llm."""
 
-    model: str = Field(default="gemini/gemini-embedding-001")
+    model: str = Field(default="google/gemini-embedding-001")
 
     async def _embed(
         self, texts: list[str], params: EmbeddingParams | None = None
```

libs/giskard-agents/src/giskard/agents/generators/litellm_generator.py

1 addition, 1 deletion

```diff
@@ -24,7 +24,7 @@ class LiteLLMGenerator(BaseGenerator):
     """A generator for creating chat completion pipelines."""
 
     model: str = Field(
-        description="The model identifier to use (e.g. 'gemini/gemini-2.0-flash')"
+        description="The model identifier to use (e.g. 'google/gemini-2.0-flash')"
     )
     retry_policy: RetryPolicy | None = Field(default_factory=RetryPolicy)
```
libs/giskard-agents/tests/conftest.py

2 additions, 11 deletions

```diff
@@ -5,7 +5,6 @@
 from giskard.agents.embeddings import EmbeddingModel
 from giskard.agents.embeddings.base import EmbeddingParams
 from giskard.agents.generators import Generator
-from giskard.llm.routing import _provider_cache
 
 _PROVIDER_PACKAGES = {
     "openai": "openai",
@@ -38,24 +37,16 @@ def pytest_collection_modifyitems(items: list[pytest.Item]) -> None:
     )
 
 
-@pytest.fixture(autouse=True)
-def _clear_provider_cache():
-    """Prevent stale async clients across event-loop boundaries."""
-    _provider_cache.clear()
-    yield
-    _provider_cache.clear()
-
-
 @pytest.fixture
 async def generator():
     """Fixture providing a configured generator for tests."""
-    return Generator(model=os.getenv("TEST_MODEL", "gemini/gemini-2.0-flash"))
+    return Generator(model=os.getenv("TEST_MODEL", "google/gemini-2.0-flash"))
 
 
 @pytest.fixture
 def embedding_model():
     """Fixture providing a configured embedding model for tests."""
     return EmbeddingModel(
-        model=os.getenv("TEST_EMBEDDING_MODEL", "gemini/gemini-embedding-001"),
+        model=os.getenv("TEST_EMBEDDING_MODEL", "google/gemini-embedding-001"),
         params=EmbeddingParams(dimensions=1536),
     )
```
New rule file (name not shown in this capture): 34 additions

---
description: Architecture and design principles for giskard-llm. Always read before modifying providers, routing, or error handling.
alwaysApply: true
---

# giskard-llm Architecture

## Overview

`giskard-llm` is a lightweight routing layer that dispatches `"provider/model"` strings to native LLM SDKs. It replaces litellm with direct SDK calls while presenting a unified response shape to consumers.

## Architecture

The public API (`acompletion`, `aembedding`) routes `"provider/model"` strings through a lazy-loading registry to provider implementations. Providers are self-contained modules under `providers/`, each subclassing a shared ABC and responsible for: role mapping, message validation, tool conversion, error mapping, and response normalization.

Shared contracts live at the package root: Pydantic v2 response types (OpenAI-shaped), a unified error hierarchy with `status_code` for consumer retry logic, and a `should_retry` helper.

## Core Principles

1. **Lazy imports.** Importing `giskard.llm` must never require any provider SDK. SDKs are imported inside provider modules only.
2. **Strict message validation.** Providers validate messages before calling the SDK and raise `BadRequestError` with a clear message. Invalid input must not silently pass through to opaque SDK errors. Opt-in relaxation (e.g., `merge_system=True`) is explicit.
3. **Unified error boundary.** Raw SDK exceptions never escape a provider. Every provider maps its SDK errors to the `errors.py` hierarchy.
4. **Response normalization.** All providers convert native responses to the shared `types.py` models. Consumers never see provider-specific shapes.
5. **Provider config via env vars or `**params`.** The public API has no provider-specific kwargs. Configuration flows through environment variables or pass-through params.
6. **Provider behavior is self-documented.** Each provider class must have a comprehensive docstring covering: env vars, role mapping, error mapping, supported features, and provider-specific kwargs. This is the source of truth for provider behavior.

## Adding a New Provider

1. Create `providers/<name>.py` subclassing `BaseProvider`.
2. Register it in the provider registry in `routing.py`.
3. Add the SDK as an optional dependency extra in `pyproject.toml`.
4. Implement role mapping, message validation, tool conversion, error mapping, and response conversion.
5. Write the class docstring following the provider documentation template (see `04-provider-docs.mdc`).
6. Add unit tests (mocked SDK) and functional test scenarios.
New rule file (name not shown in this capture): 20 additions

---
description: Testing conventions for giskard-llm. Read before writing or modifying tests.
globs: "**/test*.py,**/conftest.py"
---

# giskard-llm Testing Conventions

## Test Structure

- **Unit tests** (`tests/`): Mocked SDK calls, no API keys needed. Cover routing, message conversion, error mapping, response conversion, validation.
- **Functional tests** (`tests/functional/`): Real API calls. Cover end-to-end scenarios across all providers.

## Provider Marks and Auto-Skip

Every functional test is marked with its provider (`@pytest.mark.google`, `@pytest.mark.openai`, etc.). Auto-skip logic in `conftest.py` skips tests whose provider SDK is not installed, so unit test runs never fail due to missing optional dependencies.

## Scenario Design

- **Assert on structure, not content.** Tests must pass with even the weakest model. Assert non-empty responses, correct roles, correct types, parseable JSON — never assert on specific wording.
- **Meaningful inputs and outputs.** Test scenarios should exercise real behavior: system prompts that produce verifiable effects, tool calls with checkable arguments, structured output that validates against a schema. The goal is a test that fails for the right reasons.
New rule file (name not shown in this capture): 26 additions

---
description: Development workflow for giskard-llm. Read before creating a PR.
alwaysApply: false
---

# giskard-llm Development Workflow

## Dependencies

- **Core**: `pydantic>=2.0` only. No other runtime dependencies.
- **Provider SDKs**: Optional extras in `pyproject.toml` (`openai`, `google`, `anthropic`, `azure`). Never add a provider SDK to core.
- **Package manager**: `uv`.

## Format and Lint

`ruff format` and `ruff check` via pre-commit hooks. If `basedpyright` fails on provider files due to uninstalled SDKs, add `# pyright: reportMissingImports=false` at the top of the provider file.

## CI

- **Unit tests**: run on every PR, no SDK required — all provider interactions are mocked.
- **Functional tests**: run per-provider in a matrix. Each matrix entry installs only its SDK and injects only its env vars at the step level.
- **`workflow_dispatch`**: allows manual triggering with org-membership check.

## Commit Messages

Semantic format scoped to the lib: `feat(giskard-llm): add azure provider`, `fix(giskard-llm): google system message extraction`.
04-provider-docs.mdc (new rule file): 71 additions

---
description: Provider documentation standards. Read when creating or modifying a provider.
globs: "**/providers/*.py"
---

# Provider Documentation

Each provider class must have a comprehensive docstring that serves as the single source of truth for that provider's behavior. This replaces external documentation that would go stale.

## Required Docstring Sections

Every provider class docstring must cover:

1. **Overview** — SDK used, what model prefix routes here (e.g., `"google/"`, `"azure/"`).
2. **Authentication** — Required env vars (e.g., `GOOGLE_API_KEY`) and alternative kwargs.
3. **Role mapping** — How canonical roles (`system`, `user`, `assistant`, `tool`) map to the SDK format.
4. **Message constraints** — Provider-specific validation rules (alternation, system message handling, etc.).
5. **Tool call format** — How tool definitions and tool results are converted.
6. **Error mapping** — Which SDK exceptions map to which `LLMError` subclasses.
7. **Supported features** — Completion, embeddings, structured output, and any limitations.
8. **Provider-specific kwargs** — Any pass-through params unique to this provider (e.g., `merge_system` for Anthropic).

## Example

```python
class GoogleProvider(BaseProvider):
    """Google Gemini provider using the ``google-genai`` SDK.

    Routing prefix: ``google/``

    Authentication:
    - Env: ``GOOGLE_API_KEY`` (or ``GEMINI_API_KEY``)
    - Kwargs: ``api_key``

    Role mapping:
    - ``system`` -> extracted to ``system_instruction`` config (accepts a list)
    - ``assistant`` -> ``model``
    - ``tool`` -> ``function_response`` part
    - ``user`` -> ``user``

    Message constraints:
    - Multiple system messages: supported natively (passed as list)
    - System-only messages: raises ``BadRequestError``
    - No strict alternation required

    Tool call format:
    - Tool definitions: converted to ``FunctionDeclaration``
    - Tool results: converted to ``function_response`` parts
    - Tool call IDs: synthetic (``call_<index>``) since Gemini doesn't provide them

    Error mapping:
    - ``google.genai.errors.ClientError`` (400) -> ``BadRequestError``
    - ``google.genai.errors.ClientError`` (401/403) -> ``AuthenticationError``
    - ``google.genai.errors.ClientError`` (429) -> ``RateLimitError``
    - ``google.genai.errors.ServerError`` -> ``ServerError``

    Supported features:
    - Completion: yes
    - Embeddings: yes
    - Structured output (response_format): yes, via ``response_schema``

    Provider-specific kwargs:
    - ``safety_settings``: override default safety settings
    """
```

## Rules

- **The docstring is the spec.** When behavior changes, update the docstring in the same commit.
- **No external provider docs.** Don't maintain separate markdown files per provider. The class docstring is authoritative.
- **README references providers briefly.** The top-level README has a summary table; details link to the source.

libs/giskard-llm/README.md

45 additions, 8 deletions

````diff
@@ -1,29 +1,66 @@
 # giskard-llm
 
-Lightweight LLM routing layer over native provider SDKs. Routes `provider/model` strings to OpenAI, Google Gemini, or Anthropic using their native async SDKs.
+Lightweight LLM routing layer over native provider SDKs. Routes `provider/model` strings to the correct async SDK (OpenAI, Google Gemini, Anthropic, Azure OpenAI, Azure AI Foundry).
 
 ## Installation
 
 ```bash
-pip install giskard-llm[openai]     # OpenAI only
-pip install giskard-llm[google]     # Google Gemini only
-pip install giskard-llm[anthropic]  # Anthropic only
+pip install giskard-llm[openai]     # OpenAI + Azure OpenAI + Azure AI Foundry
+pip install giskard-llm[google]     # Google Gemini
+pip install giskard-llm[anthropic]  # Anthropic
 pip install giskard-llm[all]        # All providers
 ```
 
-## Usage
+> **Note:** Azure OpenAI (`azure/`) and Azure AI Foundry (`azure_ai/`) use the `openai` SDK.
+> Installing `giskard-llm[openai]` (or `giskard-llm[azure]`) covers all three.
+
+## Quick start
 
 ```python
 from giskard.llm import acompletion, aembedding
 
+# Module-level functions use env vars automatically
 response = await acompletion(
     model="openai/gpt-4o",
     messages=[{"role": "user", "content": "Hello!"}],
 )
 print(response.choices[0].message.content)
 
-embeddings = await aembedding(
-    model="openai/text-embedding-3-small",
-    input=["hello world"],
+# Bare model names default to OpenAI
+response = await acompletion(model="gpt-4o", messages=[...])
+```
+
+## LLMClient (programmatic configuration)
+
+```python
+from giskard.llm import LLMClient
+
+client = LLMClient()
+
+# Configure with explicit values or env var references
+client.configure("openai", api_key="sk-...")  # pragma: allowlist secret
+client.configure("azure-prod", provider="azure",
+    api_key="os.environ/AZURE_PROD_KEY",  # pragma: allowlist secret
+    base_url="os.environ/AZURE_PROD_ENDPOINT",
+    api_version="2024-02-01",
+)
+client.configure("anthropic-relaxed", provider="anthropic",
+    api_key="os.environ/ANTHROPIC_API_KEY",  # pragma: allowlist secret
+    merge_system=True,
 )
+
+response = await client.acompletion("azure-prod/gpt-4o", messages)
+response = await client.acompletion("anthropic-relaxed/claude-3-5-haiku-latest", messages)
 ```
+
+## Provider reference
+
+| Prefix | SDK | Auth env var | Completion | Embeddings | Notable kwargs |
+|---|---|---|---|---|---|
+| `openai/` (default) | `openai` | `OPENAI_API_KEY` | yes | yes | `base_url`, `timeout` |
+| `google/` | `google-genai` | `GOOGLE_API_KEY` / `GEMINI_API_KEY` | yes | yes | |
+| `anthropic/` | `anthropic` | `ANTHROPIC_API_KEY` | yes | no | `merge_system`, `timeout` |
+| `azure/` | `openai` | `AZURE_API_KEY`, `AZURE_API_BASE` | yes | yes | `api_version`, `base_url` |
+| `azure_ai/` | `openai` | `AZURE_AI_API_KEY`, `AZURE_AI_ENDPOINT` | yes | model-dependent | `base_url` |
+
+For detailed per-provider documentation (role mapping, message constraints, tool format, error mapping), see the provider class docstrings in `src/giskard/llm/providers/`.
````
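The `os.environ/VAR` strings in the `LLMClient` configuration examples defer secret lookup to the environment. A minimal sketch of such a resolver, assuming the convention is a plain string prefix (this is not the actual giskard-llm implementation):

```python
import os


def resolve_config_value(value: str) -> str:
    """Resolve 'os.environ/NAME' references to the live env var; pass other values through."""
    if value.startswith("os.environ/"):
        name = value.removeprefix("os.environ/")
        try:
            return os.environ[name]
        except KeyError:
            # Fail loudly at resolution time rather than with an opaque auth error later
            raise RuntimeError(f"Environment variable {name!r} is not set") from None
    return value
```

Deferring resolution this way keeps secrets out of serialized configuration while still letting literal values (e.g. a fixed `api_version`) pass through unchanged.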
