Article: Building your own Captcha

FERDY CHRISTANT - DEC 7, 2008 (03:09:32 PM)

A common sight on web forms nowadays is the Captcha check, a way to determine whether you are a human or a bot. Here is an example of a Captcha check, taken from Digg.com:

Surely no user enjoys filling these out, right? Yet they seem to be a necessary evil in our fight against spammers. In the absence of a better alternative, we should use Captchas. However, who says we have to present a Captcha the way the example above does, as a hurdle for the user?

In this article we will get creative in building our own Captcha. But, why would we do such a thing?

  • Because a custom Captcha can fit exactly into the design and theme of your site. It will not look like some alien element that does not belong there.
  • We want to take away the perception of a Captcha as an annoyance, and make it fun for the user.
  • Because a custom Captcha, unlike the major Captcha mechanisms, makes you a less attractive target for spammers. Spammers have little interest in cracking a niche Captcha.
  • Because we want to learn how Captchas work, and the best way to do that is to build one ourselves.

Note that this article supplements my previous one, Advanced jQuery form validation. Combined, the two articles enable quite robust web forms. While we're at it, you may also want to have a look at my Building Unicode LAMP applications article.

With that out of the way, let's get started. First, we'll have a look at how Captchas work.

Captcha Logic

I've dug into quite a number of open source Captcha scripts to see how they work. Luckily, the patterns used are simple and easy to understand. Here are the steps:

  1. The Captcha image (or question) is generated. There are different ways to do this. The classic approach is to generate some random text, apply some random effects to it and convert it into an image.
  2. This step is not really sequential: during step 1, the original (unaltered) text is persisted somewhere, as it is the correct answer to the question. The answer can be persisted in different ways: as a server-side session variable, a cookie, a file, or a database entry.
  3. The generated Captcha is presented to the user, who is prompted to answer it.
  4. The back-end script checks the answer supplied by the user by comparing it with the persisted (correct) answer. If the value is empty or incorrect, we go back to step 1: a new Captcha is generated. Users should never get a second shot at answering the same Captcha.
  5. If the answer supplied by the user is correct, the form post is successful and processing can continue. If applicable, the generated Captcha image is deleted.
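
To make these steps concrete, here is a minimal sketch of the classic text-based variant in plain PHP. It is not taken from any particular Captcha script: the function names are hypothetical, step 1 stops short of rendering an image, and the $store array is an assumption standing in for whatever persistence you choose (in a real application, typically $_SESSION).

```php
<?php
// Minimal sketch of the generic Captcha lifecycle (steps 1, 2, 4 and 5).
// Assumption: $store stands in for the persistence layer; in a real
// application this would typically be $_SESSION after session_start().

// Steps 1 and 2: generate random text and persist the correct answer.
function captcha_generate(array &$store, int $length = 6): string
{
    // easily confused characters (0/O, 1/I) are left out
    $alphabet = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789';
    $text = '';
    for ($i = 0; $i < $length; $i++) {
        $text .= $alphabet[random_int(0, strlen($alphabet) - 1)];
    }
    $store['captcha'] = $text;
    // a real implementation would now distort $text and render it as an image
    return $text;
}

// Steps 4 and 5: compare the user's answer with the persisted answer.
function captcha_check(array &$store, string $answer): bool
{
    $correct = $store['captcha'] ?? null;
    // a Captcha is single-use: forget the answer whatever the outcome,
    // so the caller must generate a new one after a failed attempt
    unset($store['captcha']);
    return is_string($correct) && $answer !== '' && hash_equals($correct, $answer);
}
```

Calling captcha_check() twice with the same answer succeeds only the first time, which enforces the no-second-shot rule from step 4.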

Note: Some Captcha mechanisms have extra features to increase the accessibility of the check, for example a way to play the Captcha as a sound. For this article, we will stick to the basic approach.

Introducing the JungleDragon Captcha check

Throughout the remainder of this article, I will use the JungleDragon Captcha as an example for building a custom Captcha. JungleDragon is a wildlife image sharing site. A first step in designing a custom Captcha is to think about your user base in order to find a way to blend a Captcha check into the user experience. For JungleDragon, it has resulted in this:

Users are presented with an image of an animal and have to guess what it is. When they fail, a new random image is pulled up, along with a fresh set of answers whose order and options change each time. I'm not saying users will have the time of their lives filling out this check. However, it fits the theme of the site much better, and it is quite user-friendly, don't you think?

How it works

Part of me does not really want to disclose how it works. It may inspire the wrong kind of people. Ah well, it's just a learning exercise, so let us continue...

This particular Captcha works by randomly selecting an image. To start with, I collected a number of animal images and cropped them all to the same size. The more images, the better. Next, I placed these into a Captcha image directory. Finally, I sequentially numbered the files. That was easy. On to the coding.

I implemented my Captcha back-end as a PHP class, so I will use that throughout the rest of the article. The logic can apply to any platform though. First, we need a way to map the images to the answers; otherwise there is no way for us to know which animal is on which image. Inside my Captcha class, I capture these relations in an associative array (as a class property):

 var $images = array(
    1=>"parrot",
    2=>"panda",
    3=>"lion",
    4=>"snake",
    5=>"gorilla",
    6=>"turtle",
    7=>"elephant",
    8=>"penguin",
    9=>"alligator",
    10=>"octopus"
    );
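
Because the relation between file numbers and animal names is maintained by hand, it can drift when images are added or replaced. A small helper like the following could catch that. It is a sketch under assumptions: the function name, the directory argument and the .jpg extension are hypothetical, since all we know is that the files are numbered sequentially.

```php
<?php
// Hypothetical helper: return the numbers in $images that have no
// matching file (e.g. "7.jpg") in the Captcha image directory.
function missing_captcha_images(array $images, string $dir, string $ext = 'jpg'): array
{
    $missing = [];
    foreach (array_keys($images) as $num) {
        if (!is_file($dir . DIRECTORY_SEPARATOR . $num . '.' . $ext)) {
            $missing[] = $num;
        }
    }
    return $missing;
}
```

Running it after every change to the image set flags any number in the array whose image file is missing.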

As stated before, the more images you have, the better. For now, let's stick to 10. At the heart of the class is the method that generates the Captcha:

   1:  function generate_captcha($num_answers)
   2:  {
   3:    // get random image
   4:    $image_num = rand(1,sizeof($this->images));
   5:    $image_name = $this->images[$image_num];
   6:          
   7:    // set the correct answer in the session
   8:    $this->CI->session->set_userdata('captcha', $image_name);
   9:          
  10:    // build up list of possible answers
  11:    // we'll start by including the correct answer
  12:    $answers = array();
  13:    $answers[] = $image_name;
  14:          
  15:    // next, we need to find num_answers - 1 additional options
  16:    $count = 0;
  17:    while ($count < ($num_answers-1)) {
  18:      $currentanswer = rand(1,sizeof($this->images));
  19:      if (!in_array($this->images[$currentanswer],$answers)) {
  20:        $answers[] = $this->images[$currentanswer];
  21:        $count++;
  22:      }
  23:    }
  24:          
  25:    // shuffle the array so that the first answer is not
  26:    // always the right answer
  27:    shuffle($answers);
  28:          
  29:    // build data array and return it
  30:    $data = array(
  31:      "image_num" => $image_num,
  32:      "image_name" => $image_name,
  33:      "answers" => $answers
  34:    );
  35:          
  36:    return $data;
  37:  }

Now, let us walk through the relevant lines of code from this method:

1. The method signature. Note how we can pass in $num_answers to indicate how many possible answers are shown for each image.


4. Randomly select an image number from the keys of the associative array discussed earlier.


5. Get the name that corresponds to the randomly selected image number from line 4.


8. This is an important step. Here we are persisting the correct answer (image name) of the currently generated Captcha. We need to persist this securely. In my case, I'm using encrypted cookies, but you can also use server-side session variables, a file, or a database.

12. With the image selected and the correct answer persisted, we now need to generate a set of possible answers. We'll store them in the $answers array.

13. Our set of options always has to contain the correct answer, so we include it in the array first.

16-23. Next, we generate the additional answers, which are all wrong. We keep looping until we have found $num_answers minus 1 unique answers, since we already included one answer: the correct one. Note that this loop never ends if $num_answers is larger than the number of images, so the caller has to stay within that limit.

27. We do not want the correct answer to always be at the same position in the answer list, so we shuffle the list.

30-36. Here we build an array of the values the calling code needs to work with the Captcha, and return it.

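For readers not using CodeIgniter, here is a framework-free sketch of the same method. It is an adaptation, not the JungleDragon code: the function name is hypothetical, the image map is passed in as a parameter, a $store array stands in for the session, and the wrong answers are picked with array_diff() and shuffle() instead of a retry loop, so it also copes with $num_answers being larger than the number of images.

```php
<?php
// Framework-free sketch of generate_captcha(). Assumptions: $images maps
// image numbers to animal names (non-empty), and $store stands in for
// the session in which the correct answer is persisted.
function generate_captcha_plain(array $images, int $num_answers, array &$store): array
{
    // never ask for more options than there are distinct images
    $num_answers = max(1, min($num_answers, count($images)));

    // pick a random image number (a key of $images) and its name
    $image_num = array_rand($images);
    $image_name = $images[$image_num];

    // persist the correct answer
    $store['captcha'] = $image_name;

    // every other name is a wrong answer; take num_answers - 1 of them
    $wrong = array_values(array_diff($images, [$image_name]));
    shuffle($wrong);
    $answers = array_slice($wrong, 0, $num_answers - 1);
    $answers[] = $image_name;

    // shuffle so the correct answer is not always in the same position
    shuffle($answers);

    return [
        'image_num'  => $image_num,
        'image_name' => $image_name,
        'answers'    => $answers,
    ];
}
```

With the ten images from the array above, generate_captcha_plain($images, 5, $store) returns the same structure as the original method: image_num, image_name and a shuffled list of five answers.
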
While we're still in this class, let's finish it. There is only one additional method: check_captcha. This method checks if the answer that is passed to it corresponds to the persisted answer:

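function check_captcha($answer) 
{
  // correct answer persisted by generate_captcha(); this is not a
  // string if no Captcha was generated for this session
  $correct = $this->CI->session->userdata('captcha');
  // check if the Captcha is correct (strict comparison)
  return is_string($correct) && $correct === $answer;
}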

That's it. We can now start using this class. From our script that renders our front-end pages, we call:

// generate a new captcha
$this->load->library('captcha');
$captchadata = $this->captcha->generate_captcha(5);

Note that this syntax for loading a class and calling its method is specific to the CodeIgniter PHP framework. If you do not use this framework, you can use plain PHP syntax instead.

With $captchadata in our pocket, we can pass it to our presentation layer, which renders the answers (in CodeIgniter, the keys of the array passed to a view become variables such as $answers):

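<?php
// render one radio button per possible answer
foreach ($answers as $answer) {
  $a = htmlspecialchars($answer);  // escape for safe HTML output
  echo "<input type=\"radio\" name=\"captcha_answer\" id=\"$a\" value=\"$a\" />\n";
  echo "<label for=\"$a\">$a</label><br/>\n";
}
?>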

Finally, in our postback code, we call check_captcha to see if the user has selected the correct answer, based on the value of the captcha_answer field. How you call it depends on the validation library you use; just make sure that a new Captcha is generated if the answer was empty or incorrect!
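
As an illustration, this is roughly what the postback logic could look like in plain PHP, without a validation library. The function name and the $regenerate callback are hypothetical; $post stands in for $_POST and $store for the session.

```php
<?php
// Hypothetical postback handler. $post stands in for $_POST, $store for the
// session, and $regenerate is a callback that creates and persists a fresh
// Captcha (for example by calling your generate_captcha method).
function handle_captcha_post(array $post, array &$store, callable $regenerate): bool
{
    $answer = isset($post['captcha_answer']) ? (string) $post['captcha_answer'] : '';
    $correct = $store['captcha'] ?? null;

    if ($answer !== '' && is_string($correct) && $correct === $answer) {
        // correct: consume the answer so it cannot be replayed
        unset($store['captcha']);
        return true;
    }

    // empty or incorrect: never offer a second shot at the same Captcha
    $regenerate($store);
    return false;
}
```

On a false return, the caller would re-render the form with the newly generated Captcha and a validation message.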

Note: We have ignored validation messages in this article. We will want to tell users when they fail to answer the question correctly. My previous article, Advanced jQuery form validation, explains how to do this effectively.

Spot the flaw!

This concludes the explanation of the JungleDragon Captcha mechanism. Careful readers and security paranoids may have spotted two flaws:

  • We are persisting the correct answer in a cookie. Although it is encrypted, a server-side approach is considered more secure.
  • We are not transforming our images. Although randomly selected, each individual image looks the same every time. This allows spammers to compute a hash of each image and then compare the served Captcha image against their list of hashes.
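
The second flaw is easy to demonstrate. The sketch below uses short strings as stand-ins for the bytes of image files (an assumption for brevity; the function names are hypothetical): a lookup table keyed by an exact hash recovers the answer for every untransformed image, while changing even a single byte produces a different hash, so an exact-match table like this one no longer finds it.

```php
<?php
// Illustration only: strings stand in for the raw bytes of image files.
// Someone who has seen each image once can map hash => answer.
function build_hash_table(array $imagesBytes): array
{
    $table = [];
    foreach ($imagesBytes as $answer => $bytes) {
        $table[md5($bytes)] = $answer;
    }
    return $table;
}

// Look up a served image; returns null when the hash is unknown.
function lookup_answer(array $table, string $bytes): ?string
{
    return $table[md5($bytes)] ?? null;
}
```

This is why transforming each image on every request, for instance with a random crop or a little noise before re-encoding, takes away this particular shortcut.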

Both are good points, and I invite everyone to harden their custom Captcha mechanism. A third way to make Captcha abuse harder is to check the HTTP Referer header. A fourth is to add or change the images regularly.
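
A Referer check could be sketched as follows. Keep in mind that the Referer header is supplied by the client: browsers and proxies may omit it and bots can forge it, so treat it as one extra layer rather than a guarantee. The function name and the expected host are assumptions; $server stands in for $_SERVER.

```php
<?php
// Hypothetical helper: accept a form post only when the HTTP Referer
// header points at our own host. $server stands in for $_SERVER.
function referer_matches(array $server, string $expectedHost): bool
{
    if (empty($server['HTTP_REFERER'])) {
        return false; // header missing: reject (or fall back to other checks)
    }
    $host = parse_url($server['HTTP_REFERER'], PHP_URL_HOST);
    // host names are case-insensitive
    return is_string($host) && strcasecmp($host, $expectedHost) === 0;
}
```

Rejecting posts without a Referer header may lock out users whose browser or proxy strips it, so whether to reject or merely log such posts is a judgment call.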

I do want to make a statement about the above flaws: they're not as bad as they seem. Beating spammers is not an exact science; you need multiple layers of defense. What you see above is in fact the most successful strategy against spammers: obscurity. Simply having a CUSTOM Captcha makes us less of a target. Spammers go for mass targets, as their success rate is typically extremely low. Even if we had a single animal image with just two answers, always in the same order, we would drastically reduce spam submissions.

An example of this effect is the site Coding Horror, which has a custom Captcha check that simply asks you to enter the word "Orange" to post a comment. Orange. Each time. Nothing is random, and there is only one answer. Still, it has drastically reduced the spam on that blog, and it's a big blog. Obscurity works. Not for security, but against spammers.

Conclusion

With the security stuff out of the way, back to our original goal: providing a custom Captcha check for your users that is fun, and fits the design and theme of your web application. With a bit of creativity and the help of this article, it's easy!


Comments: 3

COMMENT: TOM JANSEN

DEC 10, 2008 - 11:12:58

Hi Ferdy,

Nice going! I like the look and feel of your custom captcha, indeed this allows for much more "blending into" the design of your site.

Maybe an additional "flaw" of this strategy is that by posting the same radio button selection 5 times in a row, a spammer has a good chance of answering the Captcha correctly (5 options = 20% chance of a correct answer per attempt).

Adding more options is an alternative, but that will start to gobble up quite a bit of UI real estate, so it might not be very attractive.

However, like you pointed out with the "Orange" example, it's not very likely to happen, but hey, JungleDragon is going to involve a lot of math with karma/credits, so maybe this is something to keep in the back of your mind.

Greetz, Tom.

COMMENT: FERDY

DEC 11, 2008 - 08:16:20 AM

Tom,

Thanks. Good point. Indeed, I think that letting users enter "orange" is more secure than letting them choose from a set of predefined options. I will think about how to resolve this.

COMMENT: VIC

MAY 19, 2011 - 09:56:35

Info in your blog helped me with my project, which is based on bookmakers, and your work is great!
