Skip to content

1921 & GSK-1851 & GSK-1901 - save langchain chain without ml flow to reduce compatibility#1490

Merged
kevinmessiaen merged 12 commits intomainfrom
feature/gsk-1921-save-langchain-chain-without-ml_flow-to-reduce-compatibility
Oct 25, 2023
Merged

1921 & GSK-1851 & GSK-1901 - save langchain chain without ml flow to reduce compatibility#1490
kevinmessiaen merged 12 commits intomainfrom
feature/gsk-1921-save-langchain-chain-without-ml_flow-to-reduce-compatibility

Conversation

@kevinmessiaen
Copy link
Copy Markdown
Member

@kevinmessiaen kevinmessiaen commented Oct 23, 2023

Description

Save and load LangchainModel using langchain provided methods instead of ml_flow

Related Issue

Type of Change

  • 📚 Examples / docs / tutorials / dependencies update
  • 🔧 Bug fix (non-breaking change which fixes an issue)
  • 🥂 Improvement (non-breaking change which improves an existing feature)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to change)
  • 🔐 Security fix

@linear
Copy link
Copy Markdown

linear Bot commented Oct 23, 2023

@kevinmessiaen kevinmessiaen changed the title Feature/gsk 1921 save langchain chain without ml flow to reduce compatibility 1921 & GSK-1851 - save langchain chain without ml flow to reduce compatibility Oct 23, 2023
@kevinmessiaen kevinmessiaen changed the title 1921 & GSK-1851 - save langchain chain without ml flow to reduce compatibility 1921 & GSK-1851 & GSK-1901 - save langchain chain without ml flow to reduce compatibility Oct 23, 2023
@linear
Copy link
Copy Markdown

linear Bot commented Oct 23, 2023

GSK-1851 Add support for QA retriever upload in LLM Hub

GSK-1901 solve serialisability issue

from mattbit:

import giskard as gsk
import pandas as pd
from pathlib import Path


def model_fn(df):
    return [True] * len(df)


class MyCustomModel(gsk.Model):
    def save_model(self, path):
        print("SAVING CUSTOM MODEL TO", path)
        Path(path).joinpath("custom_data").touch()

    classmethod
    def load_model(self, path):
        print("LOADING CUSTOM MODEL FROM", path)
        model = lambda x: [True] * len(x)
        return model

    def model_predict(self, df: pd.DataFrame):
        return self.model(df)


print("TESTING MODEL SAVE")
gsk_model = MyCustomModel(model_fn, model_type="regression")
print("TYPE:", type(gsk_model))
Path("test_save").mkdir(exist_ok=True)

gsk_model.save("test_save")

print(MyCustomModel.load("test_save"))

Comment thread giskard/models/automodel.py Outdated
def save_model(self, local_path, mlflow_meta):
mlflow.langchain.save_model(self.model, path=local_path, mlflow_model=mlflow_meta)
def save(self, local_path: Union[str, Path]) -> None:
super().save(local_path)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea of factoring out save_model from save is that you shouldn't have to overwrite save, save_model is by default always called by BaseModel.save so I think you're calling it twice in this.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well actually save_model isn't called in neither WrapperModel nor BaseModel

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you're right, my bad. I relied on my memory instead of looking at the code. I thought we had the same logic as model_predict and predict, but indeed we kept save loose I think because of the ordering of the saving.

Comment thread giskard/models/base/wrapper.py Outdated
Comment thread giskard/models/base/wrapper.py Outdated

def model_predict(self, df):
return [self.model.predict(**data) for data in df.to_dict("records")]
generations = [self.model(data) for data in df.to_dict("records")]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These backward incompatible API changes from LangChain make me nervous. Could you please make sure that our code works for all the versions of LangChain we support in pyproject.toml?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually a bug fix. The former method only works with LLMChain while the new one works with any type extending Chain, this will continue to work with models uploaded with previous version

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread giskard/models/base/serialization.py Outdated
Copy link
Copy Markdown
Contributor

@rabah-khalek rabah-khalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now @kevinmessiaen, thanks!

@sonarqubecloud
Copy link
Copy Markdown

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

0.0% 0.0% Coverage
0.0% 0.0% Duplication

@kevinmessiaen kevinmessiaen merged commit 21fba7d into main Oct 25, 2023
@kevinmessiaen kevinmessiaen deleted the feature/gsk-1921-save-langchain-chain-without-ml_flow-to-reduce-compatibility branch October 25, 2023 02:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants