Richard has been with the firm since 1992 and was one of the founding partners
AI Generated Code
I think many people who write software for a living have looked at AI-assisted code generation,
from companies like GitHub (Copilot), Tabnine, Sourcegraph and more, but most good developers
will understand that the code these tools generate is simply regurgitated code from the internet, of highly
variable quality. When a human writes code, we use our intuition and training to write stable, secure code
that catches all conditions and will withstand idiocy; the large language model has none of this, simply
throwing out code that might work at best, and if it does work then it's flawed.
I'm not saying that it's useless, it's not, and even I use it from time to time for looking things
up, e.g. "Convert #224411 to RGB" or "what's the reverse sort in PHP", and for this it
really does have value, because I can't convert hex to RGB in my head, and I did forget that arsort() was the
reverse sort in PHP. It happens to us all as we get older, so I'm told.
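For the record, both of those lookups are one-liners in PHP; a quick sketch (mine, not a model's):

// "Convert #224411 to RGB": hexdec() each pair of hex digits
$hex = '224411';
$r = hexdec(substr($hex, 0, 2)); // 34
$g = hexdec(substr($hex, 2, 2)); // 68
$b = hexdec(substr($hex, 4, 2)); // 17

// arsort() sorts an array in descending order of value, preserving keys
$scores = ['alice' => 1, 'bob' => 3, 'carol' => 2];
arsort($scores); // ['bob' => 3, 'carol' => 2, 'alice' => 1]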
Safety
If I ask our favourite LLM to generate some code to take a POST variable, append it to a JSON
object and then POST that to another server, it will generate something like:
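The original output isn't reproduced here, but the sketch below, with a hypothetical endpoint and field name, is representative of the pattern these tools return:

$data = json_encode(['message' => $_POST['message']]);

$ch = curl_init('https://example.com/api/endpoint'); // hypothetical endpoint
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: application/json']);
$response = curl_exec($ch);
curl_close($ch);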
And that IS functional, but it's also terrible! There is no validation or filtering on the POST variable;
it should at the very least be filtered for XSS, SQL injection and other vulnerabilities,
and it should also be scoped correctly based on its expected value. There is no CURLOPT_TIMEOUT, CURLOPT_RETURNTRANSFER,
CURLOPT_MAXREDIRS, CURLOPT_HTTP_VERSION or any of the important stuff that we add routinely.
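For contrast, a minimal hardened sketch (same hypothetical endpoint and field name; the right filtering always depends on what the value is expected to contain):

$message = filter_input(INPUT_POST, 'message', FILTER_SANITIZE_SPECIAL_CHARS);
if ($message === null || $message === false || strlen($message) > 1024) {
    http_response_code(400); // reject missing, unfilterable or oversized input
    exit;
}

$data = json_encode(['message' => $message]);

$ch = curl_init('https://example.com/api/endpoint'); // hypothetical endpoint
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $data,
    CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
    CURLOPT_RETURNTRANSFER => true, // return the response rather than printing it
    CURLOPT_TIMEOUT        => 10,   // don't hang forever on a dead server
    CURLOPT_MAXREDIRS      => 3,    // cap redirect chains
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_HTTP_VERSION   => CURL_HTTP_VERSION_1_1,
]);
$response = curl_exec($ch);
curl_close($ch);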
Let's try another one, this time using Anthropic's new Claude Sonnet, and we'll ask for something a little more complex: "How can I quickly count the number of words in a string of a HTML page, counting only words within P tags, and estimate the reading time?" The answer
isn't complex: we'll strip all the HTML tags except for p (PHP provides a strip_tags function for this), then break the result into blocks using preg_split, and loop through each block counting the words with str_word_count. Now let's see what the LLM came up with:
function countWords($string) {
    // Remove HTML tags from the string, except for p
    $cleanString = strip_tags($string, '<p>');

    // Split the string into an array of paragraphs
    $paragraphs = preg_split('/<\/?p>/', $cleanString, -1, PREG_SPLIT_NO_EMPTY);

    $wordCount = 0;

    // Loop through each paragraph and count the words
    foreach ($paragraphs as $paragraph) {
        $words = str_word_count($paragraph, 0);
        $wordCount += count($words);
    }

    return $wordCount;
}
Right, so anyone who's spent any time with PHP will immediately spot why this wouldn't work. str_word_count()
returns the number of words in a string as an integer (and there's no need to supply the second argument,
because the default is already zero). The code then attempts to call count(), which counts the elements of
an array, on that integer. Remember, the model doesn't understand the code; it's just mashing code chunks
together from the internet in the same way it would write you an essay or an email, and given that, it's a
fair mistake to make. You can easily fix it by changing the line to $wordCount += str_word_count($paragraph);
but if you're a seasoned programmer, what have we saved here?
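For completeness, here's the corrected function with the reading-time estimate the prompt actually asked for, assuming roughly 200 words per minute (a common ballpark; the figure is my assumption, not the model's):

function countWords($string) {
    // Keep only <p> tags, then split on them to isolate paragraph contents
    $cleanString = strip_tags($string, '<p>');
    $paragraphs = preg_split('/<\/?p>/', $cleanString, -1, PREG_SPLIT_NO_EMPTY);

    $wordCount = 0;
    foreach ($paragraphs as $paragraph) {
        // str_word_count() already returns an integer: no count() required
        $wordCount += str_word_count($paragraph);
    }
    return $wordCount;
}

$words   = countWords($html);   // $html: the page markup to analyse
$minutes = ceil($words / 200);  // assumed average reading speed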
Stability & Security
There is absolutely zero error handling here: no check that json_encode() succeeds, no check that
the curl call succeeds, nothing. And whilst an
experienced developer would never ask such a thing of an LLM, there are many developers who don't have the experience and are
using code like this in their projects, and that's a real problem.
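A sketch of the missing checks, building on the hypothetical request above:

$data = json_encode(['message' => $message]);
if ($data === false) {
    // json_encode() returns false on failure, e.g. invalid UTF-8 in the input
    error_log('json_encode failed: ' . json_last_error_msg());
    exit;
}

$response = curl_exec($ch);
if ($response === false) {
    error_log('curl failed: ' . curl_error($ch));
} elseif (curl_getinfo($ch, CURLINFO_HTTP_CODE) >= 400) {
    error_log('upstream server returned an error status');
}
curl_close($ch);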
Independent studies from Stanford and New York University found that AI code assistants consistently made insecure
and fragile suggestions, which we already knew.
Pressures
AI code completion can greatly increase a busy team's productivity, and the
only real downside is the introduction of tragic vulnerabilities and instability that will take much longer to
debug. But this
doesn't stop some teams using it, and in fact some teams are being forced to use it to meet revised targets set by
management based on the 'theoretical' productivity increase. Ultimately it will end badly.
Outright Bans
There are a number of outright bans on AI generated code, and I'm not going to list them all here,
but here are a few:
NetBSD and Gentoo banned AI generated code from their repos in May this year, and of course
Apple and Google banned staff from using AI generated code internally last year.
In December 2022 Stack Overflow banned all AI generated submissions to their site. However, considering that a
large percentage of AI generated code comes from sites like Stack Overflow, such a decision has a wider impact on code quality for future models.
Dated Data
If you play with AI code completion a little, you'll soon find that it's making suggestions based on
out-of-date information. It recommends libraries that are no longer maintained, or CSS that's already
been superseded, or it trips over syntax changes between language versions, classically generating warnings in
PHP and Node. Models that use 'positive reinforcement', i.e. pressing the thumbs up for a seemingly correct answer,
run the risk of generating incorrect and insecure code with ever increasing frequency.
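A classic PHP example of the kind of dated suggestion I mean (my illustration, not output from any particular model): each() was deprecated in PHP 7.2 and removed in PHP 8.0, yet it still litters the training data.

// Pre-7.2 idiom that still gets suggested: a fatal error on PHP 8
while (list($key, $value) = each($array)) {
    echo "$key => $value\n";
}

// The long-supported replacement
foreach ($array as $key => $value) {
    echo "$key => $value\n";
}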
Privacy
Many people don't realise that by using these tools, the codebase, sometimes in its entirety, is
being uploaded to a third party server without any consideration of the possible security and privacy issues.
Companies
like OpenAI and Anthropic are not bound by any legal obligation to protect your precious code, and of course no one
can trust companies like Google or Microsoft to do the right thing. What if they are using your codebase to train
future models? Nothing says they can't, and in fact OpenAI/GitHub Copilot specifically say that they'll use your
activity to train their model, and why not? Your code is probably better than anything they'll find on
Stack Overflow, in my opinion.
Summary
No matter which way you throw it, AI generated code brings with it a catalogue of issues if
it's not regulated and mitigated. Used as a tool to do conversions, to look up documentation and to
provide reference information, it can be a real productivity boost, but letting it write code in 2024 is a
really bad idea.
GEN has not yet banned AI code generation internally, but firstly it's all local AI, so there are no code
leaks or privacy issues, and secondly there's strong oversight on its use. I think this is a
fair balance given the risks and rewards, but I do completely understand the companies who've banned it outright.
Comments (3)
R Ranson
· 2024-07-03 17:49 UTC
AI is like a new toy: when you first use it, it's amazing, but after a little while you start to realise that it takes longer to fix its crap than it would to write it yourself. Just so many little errors.
Andy Barker
· 2024-07-01 15:56 UTC
They are all fundamentally flawed, because they write code like they compose text, a word at a time, which means the code you get out is just a jumble of Stack Overflow and Reddit posts mashed into one. There is no intelligence behind it. Use it for translations, conversions, asking code questions, but never for actually writing code.
Ruez M
· 2024-07-01 15:53 UTC
I use Tabnine, and I think AI generated code is generally poor. It's great if you want to, say, take a list of options and make them into a JS array, or spell/grammar check a document, but actually writing code is very hit and miss, mostly miss in my experience.
--- This content is not legal or financial advice and is solely the opinions of the author ---