feat: add giskard-llm as lean litellm replacement #2329
Conversation
Code Review
This pull request introduces a new giskard-llm library, serving as a lightweight routing layer for various LLM providers (OpenAI, Google Gemini, Anthropic). The giskard-agents library has been refactored to depend on and utilize this new giskard-llm library, replacing its direct dependency on litellm. This change involves updating imports, adapting to the new unified response types for completions and embeddings, and modifying the retry middleware to use giskard-llm's error handling. Additionally, the build system and test suite have been updated to include the new library and support provider-specific functional testing, allowing tests to be skipped if a provider's SDK is not installed. No feedback to provide.
Force-pushed from 4485f4e to 0aea586
- Introduced a new `_coerce_json` function to handle JSON string coercion in argument parsing.
- Updated `ToolCallFunction` and `ResponseFunctionToolCall` classes to use `ArgumentDict`, improving type safety and consistency in argument handling.
- Added necessary imports to support the new type definitions and validation logic.
…ment

- Changed the error message in `AnthropicProvider` to specify that messages must contain at least one non-system message, improving clarity for users regarding message requirements.

- Modified the `transcript` property in `AssistantMessage` to handle empty output text gracefully, ensuring it defaults to an empty string instead of including a colon.
- Added new tests to verify the correct formatting of the transcript, including cases with tool calls, ensuring no duplicated role prefixes in the output.

…onMessage transcript handling

- Revised the documentation for output types to include `AssistantMessage` in the list of Pydantic models.
- Modified the `transcript` property in `FunctionMessage` to return an empty string when content is None, improving output consistency.

- Update design.md to reflect `ArgumentDict` (dict) contract instead of stale str claim
- Add exhaustive ValueError fallback in `_completion_content_to_block` and `_completion_content_to_parts` to prevent silent None returns
- Fix `_validate_messages` to count developer role alongside system in instruction message limit (system+developer combo now correctly requires `merge_system=True`)
- Add `FunctionMessage` transcript tests and translator tests (raises for Anthropic/Google, passes through for OpenAI)
- Remove no-op `deserialize_arguments` calls where arguments is already a dict (`ArgumentDict`)
- Add `test_anthropic_validate_system_and_developer_raises_without_merge` for the previously unchecked mixed case

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Raise ValueError for unsupported content type/role in OpenAI chat translator instead of returning None implicitly
- Wrap translator calls in try/except ValueError -> BadRequestError in Anthropic, Google, and OpenAI providers so translation errors flow through the error hierarchy
- Treat developer role as instruction-only in Google provider `_validate_messages` to prevent empty contents list
- Use dedicated `fc_idx` counter for Google tool call IDs instead of content part index
- Remove dead `ResponseTranslator` class from openai_chat.py
- Add `case _` guard in `chat.message()` match statement

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
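The ValueError-to-BadRequestError wrapping pattern can be sketched as follows. `BadRequestError` is a simplified stand-in for the library's class, and `translate_messages` is a hypothetical stand-in for the translator call:

```python
class BadRequestError(Exception):
    """Simplified stand-in for giskard-llm's BadRequestError (carries a status code)."""

    def __init__(self, message: str, status_code: int = 400) -> None:
        super().__init__(message)
        self.status_code = status_code


def translate_messages(messages: list[dict]) -> list[dict]:
    # Hypothetical translator: rejects unsupported roles instead of
    # implicitly returning None for them.
    supported = {"system", "developer", "user", "assistant", "tool"}
    for m in messages:
        if m.get("role") not in supported:
            raise ValueError(f"unsupported role: {m.get('role')!r}")
    return messages


def provider_prepare(messages: list[dict]) -> list[dict]:
    # Providers wrap translator ValueErrors so translation failures flow
    # through the library's error hierarchy instead of escaping as bare
    # ValueErrors.
    try:
        return translate_messages(messages)
    except ValueError as exc:
        raise BadRequestError(str(exc)) from exc
```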
- Updated numpy and giskard-llm version specifications to include upper bounds, ensuring compatibility and preventing potential breaking changes.
- Refactored test assertions in test_generator_backends.py to align with updated response structure from the API.
- Introduced a new utility function to extract system messages, enhancing code clarity and reusability across translators.

- Changed the return type in `LoggingMiddleware` from `Response` to `CompletionResponse` to align with the updated response structure.
- Updated the workflow step runner to correctly handle tool calls from the response message.
- Refactored `GoogleChatTranslator` to generate unique tool call IDs using uuid4 instead of a simple counter.
- Enhanced error handling in `deserialize_arguments` to raise a ValueError for invalid JSON input.
- Adjusted imports and removed unused `Response` references in generator initialization.
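The `deserialize_arguments` validation and the uuid4-based IDs described above might look roughly like this; both bodies are assumptions beyond what the commit states:

```python
import json
import uuid
from typing import Any


def deserialize_arguments(raw: str) -> dict[str, Any]:
    # json.JSONDecodeError already subclasses ValueError, so invalid JSON
    # surfaces as ValueError; the explicit raise covers valid JSON that is
    # not an object (e.g. a list or a number).
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("tool call arguments must be a JSON object")
    return parsed


def new_tool_call_id() -> str:
    # uuid4-based IDs avoid the collisions a simple per-response counter
    # can produce when responses are merged or replayed.
    return f"call_{uuid.uuid4().hex}"
```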
…erators

- Changed the type of messages in `LoggingMiddleware` from `Message` to `ChatMessage` for better type alignment.
- Enhanced `LiteLLMGenerator` to serialize tool call arguments correctly, ensuring proper handling of function arguments.
- Updated `GoogleChatTranslator` to default function descriptions to an empty string if None, improving robustness.

- Updated exception handling in `AnthropicProvider` and `OpenAIProvider` to catch all exceptions, allowing for broader error management.
- Ensured that the `_map_error` method is invoked for all exceptions, improving robustness in response processing.

…orkflow and generators

- Added RuntimeError for empty choices list in `_StepRunner` to enhance error handling during response processing.
- Updated type hints in `BaseGenerator` to reflect `CompletionResponse` instead of `Response`, ensuring consistency across the codebase.
- Enhanced `GoogleProvider` to validate message roles more accurately, preventing empty system messages.

…I format

- Refactored `LiteLLMGenerator` to utilize `OpenAIChatTranslator` for message serialization, ensuring compatibility with OpenAI's expected input format.
- Updated type hints in the `_serialize_messages` method to reflect the new return type, enhancing type safety and clarity.
- Removed outdated serialization logic to streamline the codebase.

…rialization

- Removed unused imports and outdated serialization methods in `OpenAIChatTranslator` to simplify the codebase.
- Updated tool and message serialization to utilize `model_dump` for improved consistency with OpenAI's expected format.
- Enhanced `ArgumentDict` to include a `PlainSerializer` for JSON serialization based on context, ensuring proper handling of arguments.
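Context-driven serialization with pydantic's `PlainSerializer` can be sketched like this. Only the idea of a `PlainSerializer` on `ArgumentDict` comes from the commit; the `wire` context key and the exact model shape are assumptions:

```python
import json
from typing import Annotated, Any

from pydantic import BaseModel, PlainSerializer, SerializationInfo


def _serialize_arguments(value: dict[str, Any], info: SerializationInfo) -> Any:
    # When the dump context asks for wire format, emit a JSON string
    # (the shape the OpenAI wire API expects); otherwise keep the dict.
    ctx = info.context or {}
    if ctx.get("wire"):
        return json.dumps(value)
    return value


# Hypothetical ArgumentDict: a dict that knows how to render itself for the wire.
ArgumentDict = Annotated[dict[str, Any], PlainSerializer(_serialize_arguments)]


class ToolCallFunction(BaseModel):
    name: str
    arguments: ArgumentDict
```

Usage: `model_dump()` keeps the dict, while `model_dump(context={"wire": True})` produces a JSON string for the `arguments` field (the `context` argument requires pydantic ≥ 2.7).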
…ranslator

- Eliminated outdated tool call methods and imports in `OpenAIChatTranslator` to simplify the codebase.
- Streamlined the class by focusing on essential functionality, enhancing maintainability and clarity.

…response handling

- Renamed `prompt_tokens` and `completion_tokens` to `input_tokens` and `output_tokens` across various translators and response models for consistency.
- Simplified the `OpenAIResponseTranslator` by removing outdated input handling methods and utilizing `model_dump` for serialization.
- Updated tests to reflect the new naming conventions, ensuring alignment with the refactored usage structure.

…r implementations

- Introduced a new serialization framework for various models, improving consistency across Google and OpenAI translators.
- Updated the `GoogleResponseTranslator` and `OpenAIResponseTranslator` to utilize the new serialization methods, enhancing maintainability.
- Refactored the Anthropic and Google Chat translators to improve message handling and serialization logic.
- Added new utility functions for text content serialization and streamlined input handling across multiple response models.
- Updated tests to ensure compatibility with the new serialization structure and improved overall test coverage.

- Removed dependency on `OpenAIChatTranslator` for message serialization in `LiteLLMGenerator`.
- Implemented a new serialization method using `model_dump` for converting messages to OpenAI's expected format.
- Enhanced type safety by updating return type hints in the `_serialize_messages` method.

…esult

- Enhanced the `output_text` method to filter out None values from response outputs, ensuring only valid text is concatenated.
- Improved readability by restructuring the list comprehension for better clarity and maintainability.

…atParams

- Updated the `response_format` field validator to use class method syntax for improved clarity and consistency.
- Changed type hints from `dict[str, object]` to `dict[str, Any]` to enhance type safety.
- Simplified the validation logic to ensure proper handling of `BaseModel` subclasses.

…oss translators

- Updated message content handling in OpenAI, Anthropic, and Google Chat translators to support sequences of text content.
- Introduced new utility methods for converting content blocks to Giskard format, improving consistency in message serialization.
- Refactored message types to accommodate both string and sequence types for content, enhancing flexibility in message representation.
- Removed unused utility functions and streamlined imports to improve code clarity and maintainability.
- Updated tests to validate new content handling and serialization logic, ensuring compatibility with the revised structure.
…across providers

- Updated error handling in Google, OpenAI, and Anthropic providers to consistently raise `BadRequestError` on validation failures.
- Refactored message validation logic to enhance clarity and maintainability.
- Adjusted `GoogleChatTranslator` to include message count in tool call IDs for better traceability.
- Simplified usage token handling in `GoogleResponseTranslator` to align with updated response structures.
- Enhanced tests to validate new error handling and response formats, ensuring robustness across providers.

…rkflow components

- Replaced `output_text` with `text` in various classes to standardize message content retrieval.
- Enhanced message extraction logic to handle None values and improve clarity in message representation.
- Updated tests to reflect changes in message handling, ensuring robust validation of text content across components.

… update test assertions

- Updated `model_dump` calls in `GiskardLLMGenerator` and `LiteLLMGenerator` to include `exclude_unset=True` for improved parameter handling.
- Refactored test assertions to check for `text` instead of `content` in message responses, ensuring consistency across tests.
- Enhanced type hints in test functions to improve code clarity and maintainability.
- Introduced a new class method `register_serializer` in `_BaseModel` to facilitate the registration of serializers.
- Updated import statements to include the new registration function, enhancing the model's serialization capabilities.

…sponse translators

- Replaced the `register_serializer` decorator with a class method approach for `ToolDef`, `RefusalContent`, `TextContent`, and other message types across Anthropic, Google, and OpenAI translators.
- Introduced a `_PROVIDER` constant in each translator file to standardize provider naming in serialization context.
- Updated `model_dump` calls to utilize the new `_PROVIDER` constant for improved consistency in context handling.
refactor(giskard-llm): translator layer, split types, and provider slim-down
Summary
- New `giskard-llm` library that replaces `litellm` as the default routing layer for `giskard-agents`, wrapping native provider SDKs (OpenAI, Google GenAI, Anthropic, Azure OpenAI, Azure AI Foundry) and eliminating 200+ transitive dependencies.
- `giskard-agents` refactored to import from `giskard.llm` via `GiskardLLMGenerator`, with pass-through extras so users install only the provider SDKs they need (`pip install giskard-agents[openai]`).
- `LiteLLMGenerator` available as a first-class alternative backend behind an optional `litellm` extra, so both backends can coexist in the same process.
- Provider-marked functional tests (`@pytest.mark.google`, `@pytest.mark.litellm`, etc.) with auto-skip logic and a CI matrix that installs and tests each backend + provider independently.

What changed
New:

- Provider protocols (`libs/giskard-llm/providers/base.py`) — `CompletionProvider`, `EmbeddingProvider`, `ResponseProvider` protocols. Providers implement only the protocols they support (e.g. Anthropic implements only `CompletionProvider`).
- Provider adapters (`providers/openai.py`, `providers/google.py`, `providers/anthropic.py`, `providers/azure_openai.py`, `providers/azure_ai.py`) — thin wrappers over native SDKs with unified error mapping via `_map_error()`, structured output via native SDK features, and a class-level `_PROVIDER` attribute for correct error attribution in Azure subclasses.
- `ResponseProvider` protocol with a `respond()` method, supporting the OpenAI Responses API and the Gemini Interactions API with unified `ResponseResult`, `ResponseOutputText`, and `ResponseOutputFunctionCall` types. Tool definitions are converted from the nested Chat Completions format to the flat format expected by each API.
- `FunctionCallOutput` TypedDict as the unified input format for feeding back tool results. `GoogleProvider` normalizes this to the `function_result` format internally.
- Error hierarchy (`errors.py`) — `LLMError` base with `AuthenticationError`, `RateLimitError`, `ServerError`, `LLMTimeoutError`, `BadRequestError`, `UnsupportedOperationError`, `ProviderNotAvailableError`, each carrying a `status_code` for retry decisions. `ProviderNotAvailableError` accepts an optional `extra` hint (e.g. `extra="azure"` → `pip install giskard-llm[azure]`).
- Routing (`routing.py`) — parses `"provider/model"` strings and dispatches to the correct provider adapter. `LLMClient` supports named configurations via `configure()`. Module-level `configure()`, `reset()`, `acompletion()`, `aembedding()`, and `aresponse()` convenience functions.
- Types (`types.py`) — Pydantic v2 models inheriting from `_BaseModel` (which defaults `model_dump` to `exclude_none=True`): `CompletionResponse`, `EmbeddingResponse`, `ResponseResult`. `ToolCallFunction.arguments` is `str` (JSON) for wire-compatible round-trips. TypedDicts (`ToolDef`, `ChatMessage`, `FunctionCallOutput`) for user-constructed inputs.
- Retry helpers (`retry.py`) — class-based `RETRYABLE_ERRORS = frozenset({LLMTimeoutError, RateLimitError, ServerError})` and a `should_retry(exc)` helper — no HTTP-status branching at the retry layer.
- Design doc (`docs/design.md`) — documents type conventions (TypedDict vs Pydantic), tool format differences, canonical tool result format, and key design decisions.
Changed:

- `libs/giskard-agents/`: `GiskardLLMGenerator` (renamed from `LiteLLMGenerator`) and `litellm_embedding_model.py` now import `acompletion`, `aembedding`, `should_retry` from `giskard.llm`.
- `pyproject.toml` replaces the `litellm` hard dep with `giskard-llm>=1.0.0a1` and re-exports its provider extras (`openai`, `google`, `anthropic`, `all`). A new `litellm` extra re-enables the optional `LiteLLMGenerator`.
- `LiteLLMGenerator` restored as a first-class alternative backend using the `litellm` SDK. Importing it without the extra raises `ImportError` with a clear install hint; it is lazily re-exported from `generators/__init__.py` via `__getattr__` so plain imports of `giskard.agents` never require `litellm`.
- `GiskardLLMGenerator` no longer claims the `"litellm"` discriminator kind — each class owns its own `kind`.
- Middleware and generators now use `CompletionResponse`/`EmbeddingResponse` from `giskard.llm`.
- `conftest.py` adds a `pytest_collection_modifyitems` hook to auto-skip provider-marked tests when the SDK isn't installed, plus an autouse fixture to clear the provider cache between tests.
- `test_litellm_generator.py` — unit tests for the LiteLLM adapter (mocked `acompletion`, retry middleware status-code mapping, discriminator round-trip).
- `test_generator_backends.py` — functional tests exercising both backends through direct `complete()` and `ChatWorkflow`, plus a `test_backends_coexist_in_same_process` test that only runs when both SDKs are installed.
Changed: CI

- `ci.yml` adds `giskard-llm` to the unit test matrix (Python 3.12–3.14) and a `test-no-providers` job for SDK-less error handling.
- `integration-tests.yml`:
  - `test-agents-functional` uses a 3-layout × 3-python matrix (giskard-llm / litellm / both) installed via top-level `giskard-agents[...]` extras so the pass-through declarations are exercised.
  - `test-llm-functional`: matrix over 5 providers (openai, google, anthropic, azure, azure_ai), secrets scoped to the `ci` environment.
  - `test-checks-functional`: matrix over the google provider, secrets scoped to the `ci` environment.
  - `workflow_dispatch` trigger for manual runs with `head.sha || github.ref` fallback.
  - `persist-credentials: false`.
- `labeler.yml` adds a `Scope: LLM` label for `libs/giskard-llm/**`.
- `Makefile` gains `test-no-providers` and `install-no-providers` targets.
- Default test models: `gpt-4.1-nano` (OpenAI/Azure), `gemini-2.5-flash` (Google), `claude-haiku-4-5-20251001` (Anthropic). Azure OpenAI default `api_version` updated to `2024-10-21`.
Bug fixes (from review follow-up)

- Rewrote `_normalize_input_items` to convert all items to Google's preferred flat `ContentParam` format (Pattern A): `function_call` uses `id` (not `call_id`) with `arguments` as dict; `function_call_output` → `function_result` with resolved name; role-tagged turns → `TextContentParam`. Docstring links to the SDK type definitions, the official example, and known issue #1906.
- `_to_response_result` — prioritize `output.id` over `output.call_id` for function call IDs (matching what Google actually returns).
- `_convert_messages` — resolve `tool_call_id` to the actual function name from preceding assistant `tool_calls`; set `role="user"` for `role="tool"` messages.
- `_map_error` — handle the `API_KEY_INVALID` string check for `AuthenticationError`; fall back to status 500 on SDK `APIError`.
- `_validate_messages` — skip the alternation check when either role is `"tool"` (consecutive tool messages are valid since they merge).
- `_convert_messages` — merge consecutive `role="tool"` messages into a single user message with multiple `tool_result` blocks.
- `_normalize_input_item` — serialize `function_call.arguments` from dict to JSON string (the OpenAI Responses API rejects dicts).
- `_map_error` — tolerate `APIConnectionError` missing a `status_code` attribute.
- Type-ignore comments moved to the `from` line so they apply to optional SDK imports without leaking to the rest of the module.
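The consecutive-tool-message merge can be sketched as follows. `merge_tool_messages` is a hypothetical helper; the block shape follows Anthropic's `tool_result` content blocks:

```python
from typing import Any


def merge_tool_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge consecutive role="tool" messages into one user message
    carrying multiple tool_result blocks (Anthropic's expected shape)."""
    merged: list[dict[str, Any]] = []
    for msg in messages:
        if msg["role"] == "tool":
            block = {
                "type": "tool_result",
                "tool_use_id": msg["tool_call_id"],
                "content": msg["content"],
            }
            # Only append to a user message this pass created, never to a
            # genuine user turn that happened to precede the tool results.
            if merged and merged[-1]["role"] == "user" and merged[-1].get("_merged"):
                merged[-1]["content"].append(block)
            else:
                merged.append({"role": "user", "content": [block], "_merged": True})
        else:
            merged.append(dict(msg))
    # Drop the internal merge marker before returning.
    for m in merged:
        m.pop("_merged", None)
    return merged
```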
Misc

- `pyrightconfig.json` adds `libs/giskard-llm/src` to execution environments.
- `pyproject.toml` adds the new package.
- `uv.lock` regenerated after the `litellm` removal and follow-up dependency upgrades (anthropic, cryptography, google-genai, logfire-api, numpy, and other transitive bumps).
- `.secrets.baseline` refreshed to track new test-file line numbers already flagged as false positives.
Test plan

- Unit tests pass for `giskard-llm`, `giskard-agents`, `giskard-checks`, `giskard-core` (103 giskard-llm tests)
- `basedpyright` passes with 0 errors

To re-run integration tests manually: dispatch the workflow with ref `feat/giskard-llm`.

Made with Cursor