ChatGPT in Vulnerability Remediation: A Comparative Analysis
In the bustling streets of the software development world, every corner seems to echo with the whispers of innovation. New tools, new technologies, and new methodologies are constantly emerging, each promising to be the next big thing. Amidst this cacophony, one name has been resonating louder than most: ChatGPT. As developers from all walks of life embraced ChatGPT, a burning question occupied our thoughts: How does this tool measure up in the critical arena of vulnerability remediation?
Setting the Stage
To quench our curiosity, we embarked on an enlightening journey. Our mission? To critically assess how ChatGPT responds to findings from Static Application Security Testing (SAST) tools and compare its solutions against the expertise of an Application Security (AppSec) professional.
The Ingredients of Our Experiment
Our dataset consisted of 105 SAST findings. These findings, reported by two distinct SAST tools, were based on two known vulnerable OWASP applications - JuiceShop and WebGoat. Both these applications are the darlings of the training and benchmarking world.
To automate our tests, we used OpenAI's API with GPT-3.5. While we were slightly disappointed at not having access to the GPT-4 API during our research, a subsequent dalliance with version 4 didn't reveal any groundbreaking differences. To ensure a smooth journey, we pre-processed our data before presenting it to OpenAI. This not only made our task more manageable but also simulated a more fluid experience than a developer's direct interaction with ChatGPT, making the task easier for the tool.
The Findings
Our expedition yielded some fascinating insights:
- The Good: Thirty-one of ChatGPT's suggestions did resolve the reported vulnerabilities, showing at least some competency. However, a closer look revealed that many of these did not follow secure coding best practices. Following best practices ensures consistent, readable code, making it easier for any developer to understand the change and maintain the project.
- The Not-so-good: Twenty suggestions missed the mark. While the suggested code changes were in the area of the reported vulnerability and did not break the application code, they either failed to fix the reported issue or fixed it while introducing a new vulnerability.
- The Bewildering: A staggering fifty-four suggestions left us scratching our heads. These ranged from code suggestions that seemed to have taken a detour to unrelated parts of the application, to those that were syntactically wrong. Some were mere skeletons, waiting for the developer to breathe life into them, while others made perplexing references to phantom methods or packages. The latter, in particular, could potentially open Pandora's box of security risks.
One suggestion, for example, merely referenced a 'sanitizeFileName' function to prevent path injection without ever implementing it: an incomplete outline that still requires developers to do most of the work.
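To make the "works, but not best practice" category concrete, here is a hypothetical sketch (not taken from our dataset) of two fixes for the same SQL injection finding. Both close the hole, but only the second follows the best practice that reviewers and linters expect; the function names and schema are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

def find_user_escaped(name: str):
    # Works against the naive payload, but hand-rolled escaping is
    # brittle, easy to get wrong, and hard to review
    escaped = name.replace("'", "''")
    return conn.execute(
        f"SELECT role FROM users WHERE name = '{escaped}'"
    ).fetchall()

def find_user_parameterized(name: str):
    # Best practice: let the driver bind the value; no string assembly
    return conn.execute(
        "SELECT role FROM users WHERE name = ?", (name,)
    ).fetchall()
```

Both functions return the same rows and both block a classic `' OR '1'='1` payload, yet only the parameterized version would pass most secure-code reviews.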
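For contrast with that phantom reference, a complete path-injection guard might look like the following sketch, where `BASE_DIR` and the function name are illustrative assumptions rather than code from either application:

```python
import os

BASE_DIR = "/var/app/uploads"  # illustrative base directory

def sanitize_file_name(user_supplied: str) -> str:
    """Resolve the requested path and ensure it stays inside BASE_DIR.

    Raises ValueError on traversal attempts such as '../../etc/passwd'.
    """
    candidate = os.path.realpath(os.path.join(BASE_DIR, user_supplied))
    # realpath collapses '..' segments and symlinks, so a prefix check
    # against the canonical base directory is sufficient here
    if not candidate.startswith(os.path.realpath(BASE_DIR) + os.sep):
        raise ValueError(f"path traversal attempt: {user_supplied!r}")
    return candidate
```

The point is not this particular implementation but the gap between it and a bare, undefined function call: the hard part of the fix is exactly what the incomplete suggestion leaves out.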
As the curtains fall on our exploration, one thing is clear: ChatGPT, with all its prowess, is not the magic wand some might hope it to be. It's a tool with immense potential, but like any tool, it's only as good as the hands that wield it. Developers venturing into the world of ChatGPT and auto-code remediation must tread with caution and critical evaluation. Otherwise, they might find themselves in a worse place than they started.
Interested in seeing it for yourself? Schedule your demo here and see how the magic happens.