Sunday, March 14, 2021

Spam Bot Honey Pot

This post is the basis of my third submission to Hacker Public Radio (http://www.hackerpublicradio.org). If you'd like to listen to the melodic sound of my voice please visit: http://hackerpublicradio.org/eps.php?id=3296

In this post, I will describe the method I chose to combat spam bots filling out my company's contact form. About 99% of the submissions we receive are spam, which makes filtering for valid messages painful. After some research into different methods, I decided to go with the honey pot method.

The honey pot method uses an extra text input field to lure the spam bot into filling it out. There are different suggestions for how to hide this extra field from valid users by using either javascript or CSS. With javascript, the honey pot section of the form is removed from the DOM when the page loads, hiding it from your users. The argument for this method is most bots don't implement javascript, so the honey pot field will not be hidden from them. I think that is a valid argument but I didn't want to include extra javascript in my page--so I went with the CSS method.

There are references at the end of the post to a couple of the articles I read on implementing the honey pot with either javascript or CSS. My take away was, one, don't use the CSS display property set to the value of none to take the input out of the DOM. Sufficiently smart enough bots may know to scan for this, especially if applied directly to the element. Also don't name your classes something obvious to your intent like "anti-spam-filter". My guess is the majority of the bots out there aren't that sophisticated, but I figured it couldn't hurt to follow those suggestions.

I was already using Bootsrap CSS for our site, so I decided to use Bootstrap's "sr-only" class. This class is used for elements that you only want visible to screen readers. It takes the element and uses a combination of absolute positioning, setting the size and width to 1 pixel, setting a negative left margin, and hiding content overflow to prevent the honey pot showing up visually. I figured if the bot was scanning CSS for classes or properties, this wouldn't trigger any warnings. It does bring up the issue of how to prevent impacting the experience of people using screen readers. I applied the aria-hidden attribute with a value of true to the label element surrounding the honey pot input field. "[this] removes that element and all of its children from the accessibility tree." So we now have the field hidden both visually in the browser and from assistive technolgies. Given the short end of the stick accesibility usually gets, I doubt there are any spam bots scanning for that ARIA attribute. For the minority of users who might be viewing with the classic lynx browser, I put 'For office use' as the label text before the honey pot, hoping this would get the message across without tipping off the bot to the intended purpose of the related input field.

The other main issue with this method is the value of the name attribute used for the input field. Some argue to use obfuscated values like "mmxxName" instead of "name", or "sxysPhone" for "phone". Apparently some bots will skip fields they don't recognize. By using more standard names for multiple honey pot fields, it easier to determine if it is a bot. The counter arguement to this naming scheme is about the user experience, by obfuscating the name, then browser's won't auto-fill the valid fields of the form. This also brings up the matter of not auto-filling the spam fields by the browser of your users. This is done by setting any of your honey pot input elements' "autocomplete" attributes to "off".

So far this spam filtering method is working nicely. I currently send any messages flagged as spam to a different email address with the subject prepended with the words "[Spam review]". Once I am confident there are not that many false positives, I will just skip sending flagged messages. The one issue I have experienced with this method is when using the tab key to move through the form. Since the input field is only visually hidden, it still receives focus as you tab through. If you happen to hit another key while still in the hidden field, it will get captured by the honey pot and then the submission will be flagged as spam.

I have created a sample form on my personal site. Please visit URL: http://www.horning.us/hpr/SpamBotHoneyPot.php to try it out. It is a simple PHP page using the GET method when submitting the form. Once you press the submit button you will see the form fields and their values, along with the result messages. I chose to use "URL" as the name for my honey pot input field. I use it on my example form, and I use it for my work form. For my work form, a URL is not something we ask to be submitted, and being a common field in forms, makes it very tempting for bots. In my example code, the CSS for hiding the hony pot section is from the minicss.org website's. Their "visibility-hidden" class is very similar to Bootstrap's "sr-only" class. I would be interested to hear if other's have implemented something similar. I would also love to hear from someone who uses a screen reader. Does it prevent the honey pot section from being read?

References