ChatGPT in Vulnerability Remediation: A Comparative Analysis


In the bustling streets of the software development world, every corner seems to echo with the whispers of innovation. New tools, new technologies, and new methodologies are constantly emerging, each promising to be the next big thing. Amidst this cacophony, one name has been resonating louder than most: ChatGPT. As developers from all walks of life embraced ChatGPT, a burning question occupied our thoughts: How does this tool measure up in the critical arena of vulnerability remediation?

Setting the Stage

To quench our curiosity, we embarked on an enlightening journey. Our mission? To critically assess how ChatGPT responds to findings from Static Application Security Testing (SAST) tools and compare its solutions against the expertise of an Application Security (AppSec) professional.

The Ingredients of Our Experiment

Our dataset consisted of 105 SAST findings. These findings, reported by two distinct SAST tools, came from two deliberately vulnerable OWASP applications: Juice Shop and WebGoat. Both are darlings of the training and benchmarking world.

To automate our tests, we used OpenAI's API with GPT-3.5. While we were slightly disappointed at not having access to the GPT-4 API during our research, a subsequent dalliance with version 4 didn't reveal any groundbreaking differences. To ensure a smooth journey, we pre-processed our data before presenting it to OpenAI. This not only made our task more manageable but also simulated a smoother experience than a developer's direct interaction with ChatGPT, making the task easier on the tool.

The Revelations

Our expedition yielded some fascinating insights:

  • The Good: Thirty-one of ChatGPT's suggestions (roughly 30% of the 105 findings) did resolve the reported vulnerabilities, showing at least some competency. However, a closer look revealed that many of these did not follow secure coding best practices. Following best practices keeps the code's behavior consistent and the code itself readable, so that any developer can understand it and maintain the project.

    The image displays JavaScript code that sanitizes HTML with an escapeHTML function to prevent XSS attacks. A better approach, however, would be jQuery's $("#toast").text(result), which treats the content as plain text rather than markup and so guards against injected scripts more directly.


  • The Not-so-good: Twenty suggestions (about 19%) missed the mark completely. These code changes were in the area of the reported vulnerability and did not break the application, but they either failed to fix the reported issue or fixed it while introducing a new vulnerability.

    The image displays JavaScript code that corrects a command injection flaw by sanitizing the id parameter in a Node.js app. Yet the fix can still leave the door open to NoSQL injection: the parameter is safe from shell commands but remains vulnerable to database query manipulation.

  • The Bewildering: A staggering fifty-four suggestions (just over half) left us scratching our heads. These ranged from code suggestions that seemed to have taken a detour to unrelated parts of the application, to those that were syntactically wrong. Some were mere skeletons, waiting for the developer to breathe life into them, while others made perplexing references to phantom methods or packages. The latter, in particular, could open a Pandora's box of security risks, for instance if a developer goes looking for a package that does not exist and installs a malicious one published under that name.

    The image depicts JavaScript code with a basic 'sanitizeFileName' function meant to prevent path injection. It is an incomplete outline that still leaves most of the work to the developer.
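
For contrast with the skeleton described above, here is a minimal sketch of what a more complete file name check might look like. It is not the code from the experiment: the function body, the allowlist, and the 255-character limit are all assumptions chosen for illustration.

```javascript
// Hypothetical, fuller take on a sanitizeFileName helper (illustrative only).
// Returns the name unchanged when it is acceptable and throws otherwise.
function sanitizeFileName(fileName) {
  if (typeof fileName !== "string") {
    throw new TypeError("file name must be a string");
  }
  // Allowlist: the first character must be a letter or digit, followed by
  // letters, digits, dots, underscores, or hyphens, 255 characters at most.
  // Path separators are not in the allowlist and a leading dot is rejected,
  // so inputs such as "../etc/passwd" or ".env" cannot pass.
  if (!/^[A-Za-z0-9][A-Za-z0-9._-]{0,254}$/.test(fileName)) {
    throw new RangeError("file name contains unexpected characters");
  }
  return fileName;
}
```

Rejecting bad input outright, rather than trying to strip dangerous characters, keeps the behavior easy to reason about. A common additional safeguard is to resolve the final path against the intended base directory and confirm that the result still lies inside it.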

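The point made under "The Good" about treating content as text can be sketched without jQuery. This is a hypothetical illustration, not the code from the screenshot: showToast and its parameters are invented names, and toastElement is assumed to be a DOM element such as the one behind $("#toast").

```javascript
// Hypothetical helper: render an untrusted message into a toast element.
// `toastElement` is assumed to be a DOM element, for example
// document.getElementById("toast"); the name is illustrative.
function showToast(toastElement, message) {
  // Assigning to textContent stores the value as plain text, so the browser
  // never parses it as markup. jQuery's .text(value) has the same effect.
  toastElement.textContent = String(message);
  return toastElement;
}

// Assumed browser usage:
//   showToast(document.getElementById("toast"), result);
```

Because the value is never interpreted as HTML, there is no escaping routine to maintain and no risk of forgetting a character that matters.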

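The situation described under "The Not-so-good", where a parameter is safe for the shell but not for the database, can be illustrated with a stricter check. This sketch assumes a MongoDB-style driver and a request parser that can turn input such as id[$ne]=1 into an object; neither detail is stated in the original finding, and parseId is a hypothetical name.

```javascript
// Hypothetical validator for an `id` request parameter (illustrative only).
function parseId(rawId) {
  // Some query-string and JSON body parsers produce objects, for example
  // { $ne: null }, which a MongoDB-style driver would treat as a query
  // operator. Requiring a string closes that door.
  if (typeof rawId !== "string") {
    throw new TypeError("id must be a string");
  }
  // Allowlist: letters, digits, underscores, and hyphens, 1 to 64 characters.
  // Shell metacharacters such as ";" or "|" are excluded as well.
  if (!/^[A-Za-z0-9_-]{1,64}$/.test(rawId)) {
    throw new RangeError("id contains unexpected characters");
  }
  return rawId;
}
```

Validating both the type and the character set addresses the two injection contexts in one place, instead of escaping for one sink and overlooking the other.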
The Epilogue

As the curtains fall on our exploration, one thing is clear: ChatGPT, with all its prowess, is not the magic wand some might hope it to be. It's a tool with immense potential, but like any tool, it's only as good as the hands that wield it. Developers venturing into the world of ChatGPT and auto-code remediation must tread with caution and critical evaluation. Otherwise, they might find themselves in a worse place than they started.


Interested in seeing it for yourself? Schedule your demo here and see how the magic happens.


Kirill Efimov
Kirill Efimov is a highly skilled software engineer and security expert with a strong background in software development and team leadership. Currently serving as the Founding Engineer (Security) at Mobb, he brings over a decade of experience to the field of cybersecurity.