[GSK-1078] Correlation detector #1178

Merged
andreybavt merged 14 commits into main from task/GSK-1078 on Jun 21, 2023
Conversation

mattbit (Member) commented Jun 15, 2023

No description provided.

linear (Bot) commented Jun 15, 2023

GSK-1078 Spurious correlation

Feature slices highly correlated with label

  • chi2 or Student's t-test

→ export independence test

Find slices that are highly correlated with the given labels (for classification models).
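
As an illustration of the kind of independence test the issue mentions, here is a minimal, standard-library-only sketch of a Pearson chi-squared statistic between slice membership and label, with Cramér's V as a normalised strength of association. The function names `chi2_statistic` and `cramers_v` are hypothetical, and this is not necessarily how the detector in this PR is implemented.

```python
from collections import Counter
from math import sqrt


def chi2_statistic(in_slice, labels):
    """Pearson chi-squared statistic for slice membership vs. label."""
    n = len(labels)
    observed = Counter(zip(in_slice, labels))  # contingency table cells
    row_totals = Counter(in_slice)             # samples inside / outside the slice
    col_totals = Counter(labels)               # samples per label
    stat = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            # Expected count under independence of slice and label.
            expected = row_total * col_total / n
            stat += (observed[(row, col)] - expected) ** 2 / expected
    return stat


def cramers_v(in_slice, labels):
    """Cramér's V in [0, 1]; 0 means no association, 1 a perfect one."""
    n = len(labels)
    k = min(len(set(in_slice)), len(set(labels))) - 1
    if k == 0:
        # Only one slice value or one label: association is undefined, report 0.
        return 0.0
    return sqrt(chi2_statistic(in_slice, labels) / (n * k))
```

A slice whose samples almost all share one label gives a value of V close to 1. The threshold above which a slice gets flagged is a design choice that this sketch leaves open.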

@mattbit mattbit marked this pull request as ready for review June 16, 2023 09:19
@mattbit mattbit requested a review from andreybavt June 16, 2023 09:19
mattbit (Member Author) commented Jun 20, 2023

@andreybavt


# Prepare dataset for slicing
df = dataset.df.copy()
df[dataset.target] = pd.Categorical(df[dataset.target])
Contributor

Should we validate that the dataset contains a target, or is that already done by the calling code?

Member Author

Yes, currently the whole scan assumes that there is a target column, although I’m not sure there is a check. I will add that.
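
A minimal sketch of what such a check could look like, assuming the target is stored as a column name that may be `None`. The helper `require_target` is hypothetical and is not the code added in this PR.

```python
def require_target(columns, target):
    """Return `target` if it names one of `columns`, otherwise raise ValueError."""
    if target is None:
        # Hypothetical message: the dataset was created without a target.
        raise ValueError("The dataset has no target column set.")
    if target not in columns:
        raise ValueError(f"Target column {target!r} not found in the dataset.")
    return target
```

With the snippet under review, this could be called as `require_target(df.columns, dataset.target)` before converting the target column to a categorical, so that a missing target fails early with a clear message.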

andreybavt (Contributor) left a comment

Generally looks good. If the target question isn't relevant, feel free to merge.

@mattbit mattbit requested a review from andreybavt June 21, 2023 13:52
mattbit (Member Author) commented Jun 21, 2023

> Generally looks good. If the target question isn't relevant, feel free to merge.

@andreybavt I fixed a few things, including dataset target support in the scan.

sonarqubecloud commented

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 2 (rating A)

Coverage: 99.4%
Duplication: 0.0%

@andreybavt andreybavt merged commit 980ce50 into main Jun 21, 2023