Fixed issue where QueryBasedSliceFunction wasn't working with bytes#1276

Merged
rabah-khalek merged 2 commits into main from
feature/gsk-1375-cannot-use-strcontains-with-values-of-inferred-dtype-bytes
Aug 2, 2023
Conversation

@kevinmessiaen
Member

Description

Fixed issue where QueryBasedSliceFunction wasn't working with bytes

Related Issue

Type of Change

  • 📚 Examples / docs / tutorials / dependencies update
  • 🔧 Bug fix (non-breaking change which fixes an issue)
  • 🥂 Improvement (non-breaking change which improves an existing feature)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to change)
  • 🔐 Security fix

@linear

linear Bot commented Aug 1, 2023



def _decode(series: pd.Series) -> pd.Series:
    return series.str.decode("utf-8").fillna(series)
Contributor

What do you need .fillna(series) for? What does the decoder do with NaN values?

Contributor

Otherwise, I have just tested it and it works.

Member Author

@kevinmessiaen kevinmessiaen Aug 2, 2023

decode returns NaN for any value that is not bytes. So we decode the bytes values and put the original value back for every row that was not bytes.

This way it also works for a mix of bytes and str:

series = pd.Series(['this is str', b'this is bytes'])

decoded = series.str.decode("utf-8")  # returns [NaN, 'this is bytes']

final_value = decoded.fillna(series)  # returns ['this is str', 'this is bytes']
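
For context, here is a minimal self-contained sketch of how the decode-then-fillna step lets a string filter such as str.contains run on a column holding a mix of bytes and str. The helper name decode_bytes and the sample data are illustrative assumptions, not the code merged in this PR, and the NaN-for-non-bytes behaviour is the one described in this thread for an object-dtype series.

```python
import pandas as pd


def decode_bytes(series: pd.Series) -> pd.Series:
    """Hypothetical helper mirroring the _decode function discussed above."""
    # str.decode yields NaN for values that are not bytes, so fillna
    # restores those rows from the original series.
    return series.str.decode("utf-8").fillna(series)


# Illustrative data: one str value and one bytes value in the same column.
mixed = pd.Series(["this is str", b"this is bytes"])

decoded = decode_bytes(mixed)

# Once every value is a str, a substring filter can be applied uniformly.
mask = decoded.str.contains("bytes", na=False)

print(decoded.tolist())
print(mask.tolist())
```

Passing na=False to str.contains is a choice made for this sketch so that genuinely missing values count as non-matching instead of propagating NaN into the mask.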

@rabah-khalek rabah-khalek self-requested a review August 2, 2023 09:07
@rabah-khalek rabah-khalek merged commit 3250a30 into main Aug 2, 2023
@sonarqubecloud

sonarqubecloud Bot commented Aug 2, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: 60.0%
Duplication: 0.0%

@Hartorn Hartorn deleted the feature/gsk-1375-cannot-use-strcontains-with-values-of-inferred-dtype-bytes branch September 22, 2023 10:47