This study explores the feasibility of using GitDiff to automatically update method-level comments with GenAI. To this end, we leverage the information contained in a GitDiff, i.e., a patch representing the changes between two states of a file, to help infer the modifications made to the code and update the method-level comments accordingly. As depicted in the figure below, our study evaluates the following two GenAI architectures:
- CodeT5 (small): A code-specific text-to-text transformer from the T5 family.
- Gemma 2B IT: A decoder-only LLM released by Google.
To study how effective the Git-Diff input is at improving the quality of updated comments, we conduct the four experiments shown in the figure below:

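As an illustration of the four configurations, the model input for each experiment can be assembled roughly as in the hypothetical sketch below; the separator token and exact templates used by our preprocessing scripts may differ.

```python
# Hypothetical sketch: assembling the model input for each experiment.
# The actual separators/templates used by the preprocessing scripts may differ.
def build_input(experiment, old_code="", old_comment="", new_code="", git_diff=""):
    if experiment == 1:  # New Code only
        return new_code
    if experiment == 2:  # Old Comment + New Code (baseline)
        return f"{old_comment} <SEP> {new_code}"
    if experiment == 3:  # Old Code + Old Comment + New Code
        return f"{old_code} <SEP> {old_comment} <SEP> {new_code}"
    if experiment == 4:  # Old Code + Old Comment + Git-Diff
        return f"{old_code} <SEP> {old_comment} <SEP> {git_diff}"
    raise ValueError("experiment must be 1-4")
```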
We obtained the following average METEOR scores for CodeT5-small and Gemma 2B IT:
| Experiment | CodeT5-small | Gemma 2B IT |
|---|---|---|
| New Code | 0.1127 | - |
| Old Comment + New Code (Baseline) | 0.2384 | 0.05095 |
| Old Code + Old Comment + New Code | 0.6942 | - |
| Old Code + Old Comment + Git-Diff | 0.7608 | 0.05794 |
Note: Due to computational constraints, we could not evaluate Gemma on the "New Code"-only and "Old Code + Old Comment + New Code" experiments. Additionally, we did not use any instruction prompt while fine-tuning and evaluating Gemma for this task.
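For reference, the average METEOR score over a set of generated comments can be computed with NLTK roughly as follows. This is a minimal sketch with whitespace tokenization; the evaluation scripts in this repository may tokenize and aggregate differently.

```python
import nltk
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet")   # required by METEOR
nltk.download("omw-1.4")

def average_meteor(references, hypotheses):
    """Mean METEOR over (reference, generated) comment pairs."""
    scores = [
        meteor_score([ref.split()], hyp.split())
        for ref, hyp in zip(references, hypotheses)
    ]
    return sum(scores) / len(scores)

# Example usage:
# average_meteor(["returns the sum of two ints"], ["return the sum of two integers"])
```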
The dataset used for our study can be found here. It is already split into train-valid-test sets. Execute the respective scripts under the `Dataset` directory to apply the necessary preprocessing and compute the GitDiff from `<OldCode>` and `<NewCode>`.
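A unified, git-style diff between the old and new code can be produced with Python's `difflib`; the snippet below is a minimal sketch and may differ from the exact diff format produced by the Dataset scripts.

```python
import difflib

def compute_git_diff(old_code: str, new_code: str) -> str:
    """Return a unified diff (git-style patch) between two versions of a method."""
    diff = difflib.unified_diff(
        old_code.splitlines(keepends=True),
        new_code.splitlines(keepends=True),
        fromfile="old", tofile="new",
    )
    return "".join(diff)
```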
- CodeT5:
  - The fine-tuning script can be found under the "Training Script T5" directory.
  - To execute the script on the cluster, run the `T5.sh` file.
  - To execute the script locally, use the below command: `python TrainingScript.py [OPTIONS]`
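For context, the text-to-text setup used for CodeT5-small can be sketched with Hugging Face Transformers as below. This is a minimal, assumed sketch (model id `Salesforce/codet5-small`, placeholder input/target strings); the actual `TrainingScript.py` defines the real hyperparameters and data handling.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-small")

# Encode one (input, target) pair, e.g. "old code + old comment + git diff" -> updated comment.
inputs = tokenizer("<old code> <old comment> <git diff>", return_tensors="pt",
                   truncation=True, max_length=512)
labels = tokenizer("<updated comment>", return_tensors="pt",
                   truncation=True, max_length=128).input_ids

outputs = model(**inputs, labels=labels)
loss = outputs.loss  # minimized during fine-tuning
```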
- Gemma 2B IT:
  - The fine-tuning script can be found under the "Training-Script-Gemma" directory.
  - To execute the script on Compute Canada's Narval cluster, run the `launch_training_accelarate.sh` file.
  - To execute the script locally, use the below command: `python finetuning_gemma_for_cc.py [OPTIONS]`
  - OPTIONS:
    - `data_dir`: path to the data directory
    - `experiment`: experiment number. Default = 4
    - `max_epochs`: maximum number of epochs to train Gemma. Default = 1
    - `batch_size`: batch size to use for training, validation, and testing. Default = 4
    - `max_new_tokens`: maximum number of tokens to be generated by Gemma. Default = 128
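For reference, generating an updated comment with Gemma 2B IT via Hugging Face Transformers might look like the sketch below (assumed model id `google/gemma-2b-it`, which requires access to the gated Gemma weights, and a placeholder prompt); `finetuning_gemma_for_cc.py` handles the actual fine-tuning and prompting.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/gemma-2b-it"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Decoder-only setup: the input fields are concatenated into a single prompt.
prompt = "<old code> <old comment> <git diff>"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

output = model.generate(input_ids, max_new_tokens=128)  # matches the default above
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```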
