Website Security: Sorting the Humans From the Robots

Robin Christopherson | 12 Mar 2015

Knowing that your webforms are safe from the thousands of spambots trying to create fake accounts, or to inundate you with advertising for questionable medical supplements, gives most website owners peace of mind. Over the years, various security and anti-spam systems have been developed to determine who is actually trying to send you a message, subscribe to your newsletter or buy one of your products.

One of the most commonly used (I deliberately refrain from saying popular) authentication systems is CAPTCHA (‘Completely Automated Public Turing test to tell Computers and Humans Apart’): users have to decode scrambled or distorted text to prove that they are a human being rather than a malicious spam robot.

A Web Accessibility 'Catch-22'

From their first appearance, CAPTCHAs presented many disabled users with a variety of problems, and these were almost always intractable, meaning that the user was unable to use the form or complete the process. This is because CAPTCHAs are an accessibility ‘Catch-22’: the access technologies that disabled people rely on need content to be machine-readable so that it can be converted into their preferred output format (such as text to speech), and machine-reading the content is exactly what CAPTCHAs are designed to thwart.

If the unlabelled image of the distorted code were given a text label (as images should be), a robot would simply read that label; yet without one, blind users are stumped. Likewise, if the text in the image could be recognised by text-recognition software, a blind user could use it, but so could a robot. These distorted codes also present huge problems for users with low vision or dyslexia.
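To make the Catch-22 concrete, the Python sketch below plays the robot. The markup, the `captcha` class and the code `X7K2P` are all hypothetical. The script reads the text alternative of a properly labelled CAPTCHA image, exactly as a screen reader would, and so gets the answer for free.

```python
from html.parser import HTMLParser

# Hypothetical markup: a CAPTCHA image given a proper text alternative,
# as accessibility guidelines ask of informative images.
PAGE = '<form><img class="captcha" src="/captcha.png" alt="X7K2P"><input name="code"></form>'


class CaptchaAltReader(HTMLParser):
    """Collects the alt text of any <img> whose class list includes 'captcha'."""

    def __init__(self):
        super().__init__()
        self.answers = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "img" and "captcha" in classes:
            # A screen reader would announce this text; a bot can read it just as easily.
            self.answers.append(attributes.get("alt"))


def solve_from_alt_text(html):
    """Return the first CAPTCHA alt text found in the page, or None."""
    reader = CaptchaAltReader()
    reader.feed(html)
    return reader.answers[0] if reader.answers else None


print(solve_from_alt_text(PAGE))  # prints X7K2P
```

Whatever makes the code accessible to a blind user makes it equally available to a script, which is why the conventional CAPTCHA cannot simply be "fixed" with better labelling.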

In many cases there is an option to have the code read aloud, but the audio is deliberately garbled (so that speech-recognition software can’t decipher it), which means that human ears can’t distinguish it either. As a blind person, I have never successfully understood an audio CAPTCHA, and it is blind users like me whom the audio option is there to serve. Of course, you can request a new CAPTCHA if you can’t read or understand the current one, but its replacement is just as distorted or garbled, so the user is no better off.

There are better alternatives that don’t catch the user in this accessibility armlock. Some CAPTCHAs use simple mathematical equations or logic questions to find out whether you’re human. For example: ‘Which of the following is not an animal: Dog, Elephant, Hot, Mouse, Monkey?’
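Here is a minimal sketch of the server side of such a question-based check, assuming a small hand-written question bank. The questions beyond the one quoted above, and the function names, are hypothetical. Because the question is plain text, a screen reader can announce it like any other form label.

```python
import random

# Hypothetical question bank: each entry pairs a plain-text question with
# the set of accepted answers (compared case-insensitively).
QUESTIONS = [
    ("Which of the following is not an animal: Dog, Elephant, Hot, Mouse, Monkey?", {"hot"}),
    ("What is three plus four? Answer in digits or words.", {"7", "seven"}),
]


def pick_question(rng=random):
    """Return the index and text of a randomly chosen question."""
    index = rng.randrange(len(QUESTIONS))
    return index, QUESTIONS[index][0]


def check_answer(index, answer):
    """Accept the answer if it matches, ignoring case and surrounding spaces."""
    _, accepted = QUESTIONS[index]
    return answer.strip().lower() in accepted


index, question = pick_question()
print(question)
print(check_answer(0, "  Hot "))  # True
```

In a real form, the chosen index would typically be kept in the server-side session rather than sent to the browser, so that a script cannot simply replay a known question and answer pair.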

This approach is vastly preferable to the usual CAPTCHA: the question is machine-readable, so access technologies can present it, and yet, surprisingly, it is still too difficult for robots to understand. It is not perfect, however, and can present a barrier to access for some disabled people with very significant cognitive difficulties. There is an argument that people with that level of cognitive difficulty may well not be completing webforms without assistance anyway, and hence someone would be on hand to help.

"No CAPTCHA reCAPTCHA": has Google solved the problem?

At the end of last year, Google launched a new alternative to the widely despised CAPTCHA. Called “No CAPTCHA reCAPTCHA”, it seems too simple to be true. In keeping with Google’s minimalist style and philosophy, they have reduced the problem of website security and spam attacks to a single tickbox that, in effect, asks one simple question: “are you a robot?”

Behind this simple question and stylish checkbox is a rather sophisticated bit of functionality. Google’s own research found that artificial intelligence could now decode even the most distorted CAPTCHA text with 99.8% accuracy, so they began to look at ways to improve the system.

Google’s developers created Advanced Risk Analysis functionality that looks at the user’s behaviour (whether human or robot) before, during and after the CAPTCHA interaction, so that there is no longer the need to rely so heavily on distorted text. This led to the launch of the new ‘No CAPTCHA reCAPTCHA’ API.
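For site owners, the tickbox is only the visible half of the integration. Here is a minimal Python sketch of the other half, assuming Google's documented `siteverify` endpoint. When the form is submitted, the server forwards the token that the widget placed in the `g-recaptcha-response` field, together with the site's secret key, and accepts the submission only if Google reports `success`. The function names are our own, and the risk analysis itself happens entirely on Google's side.

```python
import json
import urllib.parse
import urllib.request

# Google's documented server-side verification endpoint for reCAPTCHA.
SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_payload(secret, token, remote_ip=None):
    """Encode the POST body: the site's secret key plus the token the widget
    placed in the form's g-recaptcha-response field (remoteip is optional)."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    return urllib.parse.urlencode(fields).encode("ascii")


def is_human(response_body):
    """Interpret Google's JSON reply; only an explicit success counts."""
    result = json.loads(response_body)
    return result.get("success") is True


def verify_token(secret, token, remote_ip=None, timeout=5):
    """Ask Google whether the token is valid (performs a network request)."""
    request = urllib.request.Request(SITEVERIFY_URL, data=build_payload(secret, token, remote_ip))
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return is_human(reply.read().decode("utf-8"))
```

A form handler would call something like `verify_token(SECRET_KEY, form["g-recaptcha-response"])`, where `SECRET_KEY` and `form` stand in for your own configuration and request object. Keeping the payload building and reply parsing separate from the network call makes them easy to check offline.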

Users have to simply check the box to say “I’m not a robot”. That’s it.

Just to err on the safe side, in case artificial intelligence in the form of a spambot can emulate human behaviour, there are additional security layers in the new API. For example, if the risk-analysis algorithms can’t be certain whether a user is a robot or a human, then the good (or bad) old CAPTCHA image will appear.
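The layered behaviour can be pictured as a simple decision rule. The sketch below assumes a hypothetical score between 0 and 1 produced by the risk analysis and an arbitrary threshold. Google has not published its actual scoring, so this shows only the shape of the fallback, not how reCAPTCHA really decides.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"            # the tick alone is accepted
    CHALLENGE = "challenge"  # fall back to a traditional image/audio CAPTCHA


def decide(human_likelihood, threshold=0.9):
    """Map a hypothetical risk-analysis score (0 = surely a robot,
    1 = surely a human) to what the visitor sees next.
    The threshold is an illustrative assumption, not Google's value."""
    if not 0.0 <= human_likelihood <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if human_likelihood >= threshold:
        return Outcome.PASS
    # Uncertain cases get the old distorted-text CAPTCHA, which is
    # exactly where the accessibility barrier returns.
    return Outcome.CHALLENGE


print(decide(0.97))  # Outcome.PASS
print(decide(0.60))  # Outcome.CHALLENGE
```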

We were sceptical as to whether the process would work for keyboard users (that is, blind users, people with motor difficulties who find a mouse hard to use, and smart TV users whose remote control, in effect, tabs through the page), and to our surprise it did seem to. Evidently, the way a human tabs through a web page is distinguishable from the way a robot does. It does not, however, work on mobiles, where there is no behaviour to track on a page except the occasional tap on a form field or button.

Whether or not Google’s new tool can distinguish between a spam robot and a wide range of human beings using an array of assistive technologies remains to be seen. There are only a handful of websites actually using the new process and it will be interesting to watch how this new tool develops and whether or not it is finally the answer to the perennial conundrum that is CAPTCHA.