Effective Strategies to Prevent Contact Form Spam

The Curious Codex

2024-09-11 Published, 2024-09-22 Updated

Richard (Senior Partner)

Richard has been with the firm since 1992 and was one of the founding partners.

Form Spam

If your site has a form or two, maybe a contact form or an enquiry form, then you're going to experience form spam. In most cases this will be from 'people' offering SEO or website services who promise to get you on page one of Google for an undisclosed amount, and sometimes to expand a certain part of your anatomy.

I would like to say that this sort of form spam is never going to yield any results for the spammers, but if it didn't then they wouldn't do it, would they? So whoever you are, stop responding to form spam!

This sort of spam isn't just annoying, it's time-consuming, and if you're using automation it can break things. There are, however, ways to screen it and eliminate it, so in this article we'll discuss the common ones, and some of the less common and more creative ones.

Captcha

The captcha was once the go-to spam stopper for any contact form, but those days are long gone, with even the more advanced captchas being bypassed by AI-powered tools. Unfortunately, selecting cars or buses, or a badly morphed word in an image, no longer has much effect, no matter what Google say. Captchas can still stop some posts, but generally today it's not worth the time and effort to implement them.

Physical Actions

You may have seen the odd form popping up with a click-and-hold button to submit, and this is indeed creative because a bot or form spammer program isn't expecting to have to hold down a button, and of course it will fail - for now. As these become more popular, changing the bot's code to press and hold isn't a major venture, so I wouldn't put any money on this one.

I have seen some JavaScript-powered filters where you select the odd one out, or have to rotate a shape until it matches the orientation, or have to choose the one item in a selection that does not belong in the group, but all these ultimately have a limited number of variations, and like Google's reCAPTCHA, once you know the variations, the game's up.

Email

In most forms you want to collect an email address, and much form spam puts something invalid in here, so validate your emails. Don't just check that the address is structurally correct; check that it's a real address whose domain has MX records. It only takes a few seconds to pull the list of MX records and check that the mail servers they name exist.
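As a rough illustration of the two-stage check, here is a minimal PHP sketch. The function name `isPlausibleEmail` and the optional `$mxLookup` parameter are assumptions made for this example (the parameter exists only so the DNS call can be swapped out when testing); `filter_var()` and `checkdnsrr()` are standard PHP functions.

```php
<?php
// Sketch: structural check first, then a DNS lookup for MX records.
// $mxLookup is a hypothetical injection point so the DNS query can be replaced in tests.
function isPlausibleEmail(string $email, ?callable $mxLookup = null): bool
{
    // Structural check using PHP's built-in validator.
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return false;
    }

    // Everything after the last '@' is the domain.
    $domain = substr($email, strrpos($email, '@') + 1);

    // By default, ask DNS; checkdnsrr() returns true when MX records are found.
    $mxLookup ??= fn(string $d): bool => checkdnsrr($d, 'MX');

    return $mxLookup($domain);
}
```

Calling `isPlausibleEmail($_POST['email'])` performs a live DNS query, so expect some latency. Bear in mind that a few legitimate domains accept mail without publishing MX records, so this check may occasionally reject a real address.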

If you're a business dealing B2B, then you have the option of blocking the spammy email providers, e.g. 'gmail.com', 'yahoo.com', 'hotmail.com', 'outlook.com', 'aol.com', 'icloud.com', 'protonmail.com', 'mail.com', 'zoho.com', 'yandex.com', 'gmx.com', 'tutanota.com', 'fastmail.com', 'hushmail.com', 'inbox.com', 'gmx.us', 'gmx.net', 'runbox.com', 'aim.com', 'ymail.com', 'rocketmail.com', 'protonmail.ch', 'tutanota.de', 'tutamail.com', 'tuta.io', 'keemail.me', 'mail.ru', 'rambler.ru', 'yeah.net', '163.com', '126.com', 'qq.com', 'sina.com', 'sohu.com', 'tom.com', '189.cn', '139.com', 'wo.cn', '21cn.com', '10minutemail.com', 'guerrillamail.com', 'tempmail.org', 'mailinator.com', 'throwawaymail.com', etc. These spammy email domains account for almost all of form spam, so if you're in the position to block them, do so.
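A blocklist like this is simple to apply server-side. The sketch below uses a deliberately short subset of the list above; the constant `BLOCKED_EMAIL_DOMAINS` and the function `isBlockedEmailDomain` are names invented for this example.

```php
<?php
// A short, illustrative subset of the domains listed above.
const BLOCKED_EMAIL_DOMAINS = ['gmail.com', 'yahoo.com', 'hotmail.com', 'outlook.com', 'mailinator.com'];

// Returns true when the address uses a blocked domain (or has no domain at all).
function isBlockedEmailDomain(string $email, array $blocked = BLOCKED_EMAIL_DOMAINS): bool
{
    $at = strrpos($email, '@');
    if ($at === false) {
        return true; // no '@' means no domain: treat as unusable
    }

    // Domains are case-insensitive, so normalise before comparing.
    $domain = strtolower(trim(substr($email, $at + 1)));

    return in_array($domain, $blocked, true);
}
```

This only matches exact domains, so a subdomain of a blocked provider would pass; whether that matters depends on what turns up in your own logs.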

IP Screening

The IP address is the address from which the spammer is originating their form spam. You can use services like MaxMind and IPQualityScore to get a profile of the IP address, and then make decisions on how to handle it. You could, for example, block IPs that are in Russia, North Korea, or Leamington Spa, or you could block IPs with a high fraud score, or that are known to be VPNs, Tor exit nodes, or proxies.

Another good option is to get the hostname of the IP, and screen that for things like '.amazonaws.com', '.hwclouds-dns.com', '.colocrossing.com', etc, with these being cloud providers from which many of the form spammers run their operations.
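One way the hostname screen might look in PHP is sketched below. The function and constant names are assumptions for this example; `gethostbyaddr()` is the standard PHP reverse-lookup call, and the suffix check is kept separate so it can be tested without touching DNS.

```php
<?php
// Suffixes taken from the examples above; extend the list to suit your own logs.
const BLOCKED_HOST_SUFFIXES = ['.amazonaws.com', '.hwclouds-dns.com', '.colocrossing.com'];

// Pure check: does the hostname end with one of the blocked suffixes?
function hostnameIsBlocked(string $hostname, array $suffixes = BLOCKED_HOST_SUFFIXES): bool
{
    $hostname = strtolower(rtrim($hostname, '.'));
    foreach ($suffixes as $suffix) {
        if (substr($hostname, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

// Reverse-resolve the IP, then screen the result.
function ipIsFromBlockedHost(string $ip): bool
{
    // gethostbyaddr() returns the IP unchanged when no hostname is found,
    // and false when the input is malformed.
    $hostname = gethostbyaddr($ip);
    if ($hostname === false || $hostname === $ip) {
        return false; // no reverse DNS: leave the decision to the other checks
    }
    return hostnameIsBlocked($hostname);
}
```

In a form handler this would typically be called as `ipIsFromBlockedHost($_SERVER['REMOTE_ADDR'])`. Reverse lookups can be slow, so caching the result per IP is worth considering.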

Timing

A very good option is to start a timer when the contact form is loaded, and then calculate the time taken from then to the submission. Generally, no one will complete the form in less than 15 seconds, but of course the form spammer can and will. If you have a more complex form, then increase the value to an appropriate level. Any submission that arrives faster than this is just dumped.
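The timer needs to be one the spammer can't simply edit. One possible approach, sketched here as an assumption rather than a description of any particular site's implementation, is to put a signed timestamp in a hidden field. `FORM_SECRET`, `issueFormTimestamp` and `submittedTooFast` are hypothetical names; `hash_hmac()` and `hash_equals()` are standard PHP.

```php
<?php
// Hypothetical secret; in practice load it from configuration, not source code.
const FORM_SECRET = 'change-me';

// Call when rendering the form and place the result in a hidden field.
function issueFormTimestamp(int $now, string $secret = FORM_SECRET): string
{
    return $now . ':' . hash_hmac('sha256', (string) $now, $secret);
}

// Call on submission. Returns true when the token is missing, forged, or too young.
function submittedTooFast(string $token, int $now, int $minSeconds = 15, string $secret = FORM_SECRET): bool
{
    $parts = explode(':', $token, 2);
    if (count($parts) !== 2 || !ctype_digit($parts[0])) {
        return true; // malformed or missing token
    }

    [$issuedAt, $signature] = $parts;
    $expected = hash_hmac('sha256', $issuedAt, $secret);
    if (!hash_equals($expected, $signature)) {
        return true; // the timestamp was tampered with
    }

    return ($now - (int) $issuedAt) < $minSeconds;
}
```

Pass `time()` as `$now` in both calls. This sketch does not stop a valid token being replayed later, so adding a maximum age or a one-use nonce would be a sensible extension.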

Javascript

Some form spam programs bypass the form itself and simply POST the response, which works unless you use some client-side JavaScript to generate a token locally. This can be as simple as a count of seconds since the script started, or a far more complex obfuscation of the user-agent or browser flags. Either way, no JavaScript, no workie, and we'll catch this server-side. There are plenty of variations on this client-side approach, and for form spammers using POST only, it will stop them dead, but for those using Puppeteer or similar to remote control a browser, it won't.

Filtering

It's no coincidence that 99% of this form spam will be posting in a block of HTML with bullets, images and links, and that's by design. They want you to follow the link since the rest of the information provided is fake. However, do you really need contact forms to have HTML links in the message? No, generally not. So upon submission, search the message for anything that looks like HTML or a URL, and if you find that, dump it. In PHP, assuming $message is the form's message, then:

if (preg_match('/[<>]|&\S+;|https?:\/\/|www\./i', $message)) {
    // HTML or a URL detected - dump the form etc.
}

This simple and cost-effective approach will eliminate the majority of form spam, but to get it all you need to go a step further.

You need to check for the presence of Unicode (unless you're in a country where it is required), because Unicode is often used to confuse and circumvent anti-spam techniques by substituting look-alike Cyrillic characters to bypass pattern matching. We'll use a regex like '/[\x{80}-\x{10FFFF}]/u' to detect any character outside the ASCII range. If we find one, dump it.

if (preg_match('/[\x{80}-\x{10FFFF}]/u', $message) !== 0) {
    // Non-ASCII detected (or invalid UTF-8, where preg_match returns false) - dump the form etc.
}

Validation

When you know the majority of the 'fields' of the form are going to be junk, with only the message area having the 'payload', then validate the crap out of your fields. If it's a full name, make sure it has at least two words; if it's a telephone number, then validate it fully for your target format. In this country, our phone numbers are 11 or 12 digits, starting with a 0, and there are complex rules after that, but form spammers will often just put in 555-555-555 and of course that fails. Limit field lengths too: for a full name, accept no more than 30 characters, and for an email no more than 60. Don't rely on doing this client-side (well, do, but also do it server-side), because form spammers will try to stuff their 'payload' into any text field that it might fit in. If the form fields are stuffed, it's spam, so dump it.
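A server-side version of these checks could look something like the sketch below. `validateContactFields` is a name made up for this example. The 30 and 60 character limits come from the text above, while the phone rule is deliberately loose (a leading 0 and 10 to 12 digits in total) and is an assumption that stands in for full national-format validation.

```php
<?php
// Returns the names of the fields that failed; an empty array means all passed.
function validateContactFields(string $fullName, string $email, string $phone): array
{
    $errors = [];

    // Full name: at least two whitespace-separated words, at most 30 characters.
    $fullName = trim($fullName);
    $words = preg_split('/\s+/', $fullName, -1, PREG_SPLIT_NO_EMPTY);
    if (strlen($fullName) > 30 || count($words) < 2) {
        $errors[] = 'name';
    }

    // Email: at most 60 characters and structurally valid.
    if (strlen($email) > 60 || filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        $errors[] = 'email';
    }

    // Phone: strip spaces and hyphens, then require a leading 0 and 10 to 12 digits in total.
    $digits = preg_replace('/[\s-]+/', '', $phone);
    if (!preg_match('/^0\d{9,11}$/', $digits)) {
        $errors[] = 'phone';
    }

    return $errors;
}
```

Any non-empty result can be treated as a reason to dump the submission, for example `if (validateContactFields($name, $email, $phone) !== []) { ... }`.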

Pattern Matching

Unless you're an internet or SEO company, look for 'SEO' and 'see for yourself' and 'loose weight' and 'google', and if you find these spammy strings, dump it. Remember we've already rejected anything outside the ASCII range, so at this point we're dealing with plain ASCII, and searching is as easy as creating an array of these phrases and checking whether any of them exist in the text or message fields. If you find them, yes, you guessed it, dump it!

$unwantedPhrases = ['coupon', 'see for yourself', 'loose weight'];

$pattern = '/\b(' . implode('|', array_map(fn($p) => preg_quote($p, '/'), $unwantedPhrases)) . ')\b/i';

if (preg_match($pattern, $message, $matches)) {
    // Phrase found - dump the form etc.
}

Creativity

I've seen a good few creative ideas over the years, so here are a few that I like...

Email or SMS Code

You can implement an 'email the code' feature, where after entering a valid email, a JavaScript function invokes an API which generates a code and emails it to that address, and that code will be needed in the 'enter the code we emailed to you' box. This is very effective and will cut out 99% of the spam, but of course it does introduce some inconvenience to the process. You could of course take a mobile number and initiate an SMS message just as easily, and achieve the same result, but now you're capturing mobile numbers, and surprisingly not everyone has a mobile, and some of those who do aren't willing to share it.
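The server side of the code check can be quite small. The sketch below covers only generating and verifying the code; sending the email, storing the hash (for example in the session) and limiting the number of attempts are left out. All three function names are assumptions for this example.

```php
<?php
// Generate a 6-digit code from a cryptographically secure source.
function generateEmailCode(): string
{
    return str_pad((string) random_int(0, 999999), 6, '0', STR_PAD_LEFT);
}

// Store only a hash of the code, alongside an expiry time.
function hashEmailCode(string $code): string
{
    return hash('sha256', $code);
}

// Reject expired codes, then compare hashes in constant time.
function emailCodeIsValid(string $submitted, string $storedHash, int $expiresAt, int $now): bool
{
    if ($now > $expiresAt) {
        return false;
    }
    return hash_equals($storedHash, hashEmailCode(trim($submitted)));
}
```

A six-digit code is easy to brute-force if unlimited guesses are allowed, so a short expiry and a cap on attempts per address matter as much as the code itself.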

Q&A

I've seen some very creative question and answer tests, asking for example who is the current head of the WHO, or which is the correct spelling of the 70s pop group from Sweden: BAAB, ABBA, ABAB, BABA, etc. This sort of testing is bypassable with modern AI workloads, but not easily. The easier attack is to collect all the possible questions and their correct answers, and then just supply the right one in future. You would need to counter that by only serving one Q&A pair per hour or per day, so that someone with a Python script and Puppeteer can't just rinse your form of Q&As by refreshing the page.
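Serving one pair per period can be done by deriving the index from the clock rather than picking at random on each page load. This is a minimal sketch under that assumption; the questions, the constant `QA_PAIRS` and both function names are illustrative only.

```php
<?php
// Illustrative pairs only; use your own questions.
const QA_PAIRS = [
    ['question' => 'Which is the 70s pop group from Sweden: BAAB, ABBA, ABAB or BABA?', 'answer' => 'abba'],
    ['question' => 'What colour do you get by mixing blue and yellow?', 'answer' => 'green'],
    ['question' => 'How many days are there in a week?', 'answer' => '7'],
];

// Pick one pair per period (86400 seconds = one day), so refreshing never reveals another.
function currentQaPair(int $now, int $periodSeconds = 86400, array $pairs = QA_PAIRS): array
{
    $index = intdiv($now, $periodSeconds) % count($pairs);
    return $pairs[$index];
}

// Compare the submitted answer with the answer for the current period.
function answerIsCorrect(string $submitted, int $now): bool
{
    return strtolower(trim($submitted)) === currentQaPair($now)['answer'];
}
```

Pass `time()` as `$now`. Two caveats: a form loaded just before the period rolls over will fail if submitted just after, and a simple cycle like this is predictable, so a large pool of questions or a keyed shuffle would make harvesting slower.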

I've seen far more complex questions on some select sites, often requiring some intelligence to answer, but with that you do run the risk of rejecting some legitimate contacts. Having said that, this was used by a recruitment site specifically to filter out the idiots, so it suited that use case.

Fancy Captcha

Yes, pretty dice is another way to have a captcha that can confuse some captcha image breakers, but only for a short time before they adapt. This one obfuscates the dots with other dots, and can wiggle the dots about within the dice to make it harder for a pattern match, but even so, this will not be 100%. (If you want to implement your own dice - contact us).

AI

I'd be remiss if I didn't suggest that AI-based analysis of the form's content, together with a carefully crafted prompt, should be able to discern if it's spam or not, but not with 100% accuracy, so you'll need to ask it for a likelihood as a value between x and y and then make your own decisions on where to pitch the cutoff. It's not going to be cheap, but if you have a lot of form spam and a lot of forms then it's worth exploring.

We task our AI models with determining the best team for the first contact on our enquiry forms, and this it does well, so it is possible to combine both functions into one, where the destination team for spam is nowhere.

Considerations

I have to mention it because someone will point it out if I don't. If you've got a contact form on your website in the UK/EU then you need to take consent in the form of a checkbox. I know it sounds stupid, and it is, but you have to give the person submitting the form the option to agree to your processing of the form data, which in most cases will contain a name, an email and maybe a phone number, so definitely personal data. Technically, you can't refuse to accept the form if they don't give consent, but that's another stupidity in the GDPR, and actually if they don't consent to you processing it, then you can't respond to it, obviously.

If you're going to use their personal information for anything beyond contacting them in response to their form, then you need additional consent, and you need to be specific: 'Do you consent to us contacting you in the future about our products and services?' and so on.

Conclusion

Form spam is a growing nuisance, and form spamming software is only getting better. Our contact forms implement all of the above, and we get zero spam, but not everyone is technically capable of writing the code to make it happen. If you're using a CMS there may be plugins that can help, but none of them will be foolproof, and there are many online services already offering to protect your forms from spam, phishing and harsh words, if you trust them.

If you've got a site with a form that's endlessly spammed and the volume is causing disruption, then reach out to the webdev team at GEN, and we may be able to help.

--- This content is not legal or financial advice & Solely the opinions of the author ---