AI Generated Code and its Impact on Code Stability

The Curious Codex

             48 Votes

2024-06-21 Published
2024-07-01 Updated
1229 Words, 7  Minute Read

The Author
GEN UK Blog

By Richard (Senior Partner)

Richard has been with the firm since 1992 and was one of the founding partners

 

AI Generated Code

I think many people who write software for a living have looked at AI-assisted code generation from companies like GitHub (Copilot), Tabnine, Sourcegraph and more, but most good developers will understand that the code these tools generate is simply regurgitated code from the internet, of highly variable quality. When a human writes code, we use our intuition and training to write stable, secure code that catches all conditions and will withstand idiocy. The large language model has none of this; it simply throws out code that might work at best, and if it does work, then it's flawed.


I'm not saying that it's useless. It's not, and even I use it from time to time for looking things up, e.g. "Convert #224411 to RGB" or "what's the reverse sort in PHP". For this it really does have value, because I can't convert hex to RGB in my head and I did forget that arsort() was the reverse sort in PHP. It happens to us all as we get older, so I'm told.
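
For reference, both of those lookups come down to a line or two of PHP. This is only a sketch: the function name hexToRgb and the sample $scores array are made up for illustration.

```php
<?php
// Hypothetical helper (not from the original post): convert "#rrggbb" to [r, g, b].
function hexToRgb(string $hex): array
{
    // %2x reads two hexadecimal digits at a time; sscanf returns the values as an array.
    [$r, $g, $b] = sscanf(ltrim($hex, '#'), '%2x%2x%2x');
    return [$r, $g, $b];
}

print_r(hexToRgb('#224411')); // 34, 68, 17

// arsort() sorts by value, highest first, and keeps each key attached to its value.
$scores = ['a' => 2, 'b' => 9, 'c' => 5];
arsort($scores);
print_r(array_keys($scores)); // b, c, a
```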


Safety

If I ask our favourite LLM to generate some code to take a POST variable, append it to a JSON object and then POST that to another server, it will generate something like:


    if (isset($_POST['A'])) {
        $value = $_POST['A'];
        $url = 'https://example.com/endpoint';
        $jsonData = json_encode(['A' => $value]);
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $jsonData);
        curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $response = curl_exec($ch);
        curl_close($ch);
        echo $response;
    } else {
        echo 'POST variable "A" not set';
    }

And that IS functional, but it's also terrible! There is no validation or filtering on the POST variable; it should at the very least be filtered for XSS, SQL injection and other vulnerabilities, and it should also be scoped correctly based on its expected value. There is no TIMEOUT, MAXREDIRS or HTTP_VERSION option set, none of the important stuff that we add routinely.
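
As a rough sketch of what that missing hardening might look like: the helper names validateField and buildCurlOptions, the whitelist rule (1 to 64 characters from a small set) and the timeout values are all assumptions for illustration. The right limits depend on what "A" is actually expected to hold.

```php
<?php
// Hypothetical helper: accept the value only if it matches what we expect.
// The rule here (letters, digits, space, dot, underscore, hyphen; 1-64 chars) is an assumed one.
function validateField($raw): ?string
{
    if (!is_string($raw)) {
        return null; // rejects missing values and arrays such as A[]=x
    }
    $value = trim($raw);
    return preg_match('/\A[A-Za-z0-9 ._-]{1,64}\z/', $value) === 1 ? $value : null;
}

// Hypothetical helper: every cURL option in one place, ready for curl_setopt_array().
function buildCurlOptions(string $url, string $json): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $json,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 5,     // seconds to connect (assumed value)
        CURLOPT_TIMEOUT        => 10,    // seconds for the whole request (assumed value)
        CURLOPT_FOLLOWLOCATION => false, // do not follow redirects unless we mean to
        CURLOPT_MAXREDIRS      => 0,
        CURLOPT_HTTP_VERSION   => CURL_HTTP_VERSION_1_1,
    ];
}

var_dump(validateField('hello world'));               // string(11) "hello world"
var_dump(validateField('<script>alert(1)</script>')); // NULL
```

A whitelist like this rejects anything unexpected rather than trying to strip out known-bad input, which is usually the easier rule to reason about.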

Let's try another one, this time using Anthropic's new Claude Sonnet, and we'll ask for something a little more complex: "How can I quickly count the number of words in a string of a html page, counting only words with P tags, and estimate the reading time". The answer isn't complex. We'll strip all the HTML tags except for p, and PHP provides a strip_tags function for this. Next we'll break it into blocks using preg_split, and then loop through each block counting the words with str_word_count. Now let's see what the LLM came up with:

    function countWords($string) {
        // Remove HTML tags from the string, except for p
        $cleanString = strip_tags($string, '<p>');
        // Split the string into an array of paragraphs
        $paragraphs = preg_split('/<\/?p>/', $cleanString, -1, PREG_SPLIT_NO_EMPTY);
        $wordCount = 0;
        // Loop through each paragraph and count the words
        foreach ($paragraphs as $paragraph) {
            $words = str_word_count($paragraph, 0);
            $wordCount += count($words);
        }
        return $wordCount;
    }

Right, so anyone who's spent any time with PHP will immediately spot why this wouldn't work. str_word_count counts the number of words in a string, and there's no need to supply the second argument because the default is zero. The code then attempts to use count, which counts the number of elements in an array, on the integer returned from str_word_count. On PHP 8 that throws a TypeError; on PHP 7.2 to 7.4 it raises a warning and returns 1, so every paragraph would count as one word. Remember, it doesn't understand the code, so it's just mashing code chunks together from the internet in the same way it would write you an essay or an email, and given that, it's a fair mistake to make. You can easily fix it by simply changing it to $wordCount += str_word_count($paragraph); but if you're a seasoned programmer, what have we saved here?
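
For comparison, here is a minimal corrected sketch that also covers the reading-time half of the question, which the generated function ignored. Note one deliberate swap: instead of strip_tags plus preg_split, it uses preg_match_all to capture only the text inside each p element, because splitting on the p tags would still count text that sits outside them (a heading, for example). The names countParagraphWords and estimateReadingMinutes and the 200 words-per-minute default are assumptions for illustration, and a regex is only a rough stand-in for a real HTML parser.

```php
<?php
// Hypothetical helper: count the words found inside <p>...</p> pairs only.
function countParagraphWords(string $html): int
{
    // Capture the contents of each paragraph; \b stops <pre> and similar tags from matching.
    preg_match_all('/<p\b[^>]*>(.*?)<\/p>/is', $html, $matches);

    $wordCount = 0;
    foreach ($matches[1] as $paragraph) {
        // Drop inline tags such as <b> or <a>; str_word_count() returns an int by default.
        $wordCount += str_word_count(strip_tags($paragraph));
    }
    return $wordCount;
}

// Hypothetical helper: 200 words per minute is an assumed reading speed.
function estimateReadingMinutes(int $words, int $wordsPerMinute = 200): int
{
    // Round up so that even a very short page reports one minute.
    return (int) max(1, ceil($words / $wordsPerMinute));
}

$html  = '<h1>Title</h1><p>One two <b>three</b></p><p>Four five</p>';
$words = countParagraphWords($html);
echo $words, ' words, about ', estimateReadingMinutes($words), " minute read\n";
// prints "5 words, about 1 minute read"
```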

Stability & Security

Going back to the first example, there is absolutely zero error handling: no check that json_encode succeeds, no check that the cURL request succeeds, nothing. Whilst an experienced developer would never ask such a thing from an LLM, there are many developers without that experience who are using code like this in their projects, and that's a real problem.
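
A minimal sketch of the kind of checks meant here, assuming PHP 7.3 or later (for JSON_THROW_ON_ERROR) and the cURL extension. The function names encodePayload and postJson are hypothetical, and the timeout is an illustrative value.

```php
<?php
// Hypothetical helper: encode the payload, or fail loudly instead of sending "false".
function encodePayload(array $data): string
{
    // JSON_THROW_ON_ERROR makes json_encode() throw a JsonException on failure.
    return json_encode($data, JSON_THROW_ON_ERROR);
}

// Hypothetical helper: POST the JSON and return the response body, or throw on any failure.
function postJson(string $url, string $json): string
{
    $ch = curl_init();
    if ($ch === false) {
        throw new RuntimeException('Could not initialise cURL');
    }
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $json,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 10, // seconds (assumed value)
    ]);

    $body   = curl_exec($ch);
    $error  = curl_error($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);

    if ($body === false) {
        // Transport-level failure: DNS, connection refused, timeout and so on.
        throw new RuntimeException('cURL request failed: ' . $error);
    }
    if ($status < 200 || $status >= 300) {
        // The server answered, but not with a success status.
        throw new RuntimeException('Unexpected HTTP status ' . $status);
    }
    return $body;
}

try {
    $json = encodePayload(['A' => "\xB1\x31"]); // invalid UTF-8, so encoding fails
} catch (JsonException $e) {
    echo 'Encoding failed: ', $e->getMessage(), "\n";
}
```

Throwing exceptions keeps the failure paths out of the happy path; the caller can decide in one place whether to log, retry or return an error page.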

Independent studies from Stanford and New York universities found that AI code assistants consistently made insecure and fragile suggestions, which we already knew.

Pressures

Using things like AI code completion can greatly increase productivity for a busy team, and the only real downside is the introduction of tragic vulnerabilities and instability that will take much longer to debug. This doesn't stop some teams using it, and in fact some teams are being forced to use it to meet revised targets set by management based on the 'theoretical' productivity increase. Ultimately it will end badly.

Outright Bans

There are a number of outright bans on AI generated code, and I'm not going to list them all here, but here are a few:

NetBSD and Gentoo banned AI generated code from their repos in May this year, and of course Apple and Google banned staff from using AI generated code internally last year.

In December 2022 Stack Overflow banned all AI generated submissions to their site. However, considering that a large percentage of AI generated code comes from sites like Stack Overflow, such a decision has a wider impact on code quality for future models.

Dated Data

If you play with AI code completion a little you'll soon find that it's making suggestions based on out-of-date information. It recommends libraries that are no longer maintained, or CSS that's already been superseded, or it makes mistakes with syntax that has changed between language versions, classically generating warnings in PHP and Node. Models that use 'positive reinforcement', i.e. pressing the thumbs up for a seemingly correct answer, run the risk of generating incorrect and insecure code with ever increasing frequency.

Privacy

Many people don't realise that by using these tools, the codebase, sometimes in its entirety, is being uploaded to a third party server without any consideration of the possible security and privacy issues. Companies like OpenAI and Anthropic are not bound by any legal obligation to protect your precious code, and of course no one can trust companies like Google or Microsoft to do the right thing. What if they are using your codebase to train future models? Nothing says they can't, and in fact OpenAI/GitHub Copilot specifically say that they'll use your activity to train their model, and why not? Your code is probably better than anything they'll find on Stack Overflow, in my opinion.

Summary

No matter which way you look at it, AI generated code brings with it a catalogue of issues if it's not regulated and mitigated. Used as a tool to do conversions, to look up documentation, and to provide reference information, it can be a real productivity boost, but letting it write code in 2024 is a really bad idea.

GEN has not yet banned AI code generation internally, but firstly, it's all local AI so there are no code leaks or privacy issues, and secondly there's strong oversight on its use. I think this is a fair balance given the risks and rewards, but I do completely understand the companies who've banned it outright.


Comments (3)

R Ranson · 2024-07-03 17:49 UTC
AI is like a new toy: when you first use it, it's amazing, but after a little while you start to realise that it takes longer to fix its crap than it would to write it yourself. Just so many little errors.

Andy Barker · 2024-07-01 15:56 UTC
They are all fundamentally flawed, because they write code like they compose text, a word at a time, which means the code you get out is just a jumble of Stack Overflow and Reddit posts mashed into one. There is no intelligence behind it. Use it for translations, conversions, asking code questions, but never for actually writing code.

Ruez M · 2024-07-01 15:53 UTC
I use Tabnine, and I think AI generated code is generally poor. It's great if you want to say "take this list of options and make them into a JS array", or "spell and grammar check this document", but actually writing code is very hit and miss, mostly miss in my experience.

--- This content is not legal or financial advice & Solely the opinions of the author ---


Copyright © 2024 GEN, its companies and the partnership. All Rights Reserved, E&OE.