[PHP] Anti-Spam techniques
Posted by Eghie on Aug 26 2006 @ 14:59

There has been a lot of spam lately. You can build several anti-spam measures into your software that need no extra user interaction.

Unique submit key
You can generate a random key from a few user-specific variables when the page loads, and check on the next request, when the user submits the form, that the key comes back unchanged. I gave your code a quick scan and didn't see anything like this. I also generate a random name for the key field, so a spammer needs a more advanced way of scanning the page to find it. On its own this may still be easy to get around, but combined with the next items it should be hard. Here is some example code written in PHP:
CODE: PHP
// Somewhere in the backend or something
function generateSubmitKey()
{
    // Generate key
    $_SESSION["submit_key"] = md5(rand() . $_SERVER["REMOTE_ADDR"] . rand() . date("dmYHis") . microtime());

    // Generate a random name for the form field that carries the key, so bots can't simply look for a fixed name.
    // This is also meant to make it resistant to regexp scanning.
    $_SESSION["submit_rand_name"] = md5(rand() . 'submit_button' . rand() . microtime());
}

function checkSubmitKey()
{
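    if (!isset($_SESSION["submit_key"]))
    {
        // No key was generated for this session, so there is nothing to check.
        // Note: a client that never loaded the form also ends up here;
        // return false instead if every form on the site uses the key.
        return true;
    }

    if (!isset($_POST[$_SESSION["submit_rand_name"]]))
    {
        // The key exists, but the value was not submitted
        // This seems like a spam post to me
        return false;
    }

    // Use === rather than ==: with ==, two different md5 strings that both
    // look like "0e" followed by digits compare as equal numbers
    if ($_POST[$_SESSION["submit_rand_name"]] === $_SESSION["submit_key"])
    {
        return true;
    }

    // Shouldn't get here; if it does, the key isn't correct
    return false;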
}


// In the form page
generateSubmitKey();
echo '<input type="hidden" id="submitkey" name="' . htmlentities($_SESSION["submit_rand_name"]) . '" value="' . htmlentities($_SESSION["submit_key"]) . '" />';
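
As a possible refinement, the key could be thrown away as soon as it has been checked, so that one rendered form allows only one submission. The sketch below is not part of the original snippet. The function name consumeSubmitKey is made up for illustration, and it takes the session and post data as parameters so it can be tried outside a web request:

```php
<?php
// Hypothetical helper: checks the submitted key and then discards it.
function consumeSubmitKey(array &$session, array $post)
{
    if (!isset($session["submit_key"], $session["submit_rand_name"]))
    {
        // No form was rendered for this session (or the key was already used)
        return false;
    }

    $name = $session["submit_rand_name"];
    $expected = $session["submit_key"];

    // Discard the key whether or not the check succeeds
    unset($session["submit_key"], $session["submit_rand_name"]);

    // Strict comparison; a missing or non-string value simply fails
    return isset($post[$name]) && $post[$name] === $expected;
}
```

In the script that receives the form this would be called as consumeSubmitKey($_SESSION, $_POST). Unlike checkSubmitKey() above, this variant treats a missing key as a failed check.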


Unique submit names for content
In the previous item I generated a random key, and also a random name for the key field, so a spammer can't find it by scanning the code with a regexp or something similar. It can still be found by counting HTML elements or line numbers, but the next item deals with that. To post spam, a bot has to find the content input field, where it can insert its links and other junk. If you also generate a random name for your content field, spamming gets a lot harder for bots. You could use something like the following code:
CODE: PHP
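// In the backend
function getInput()
{
    // Without a generated field name in the session there is nothing to look up
    if (!isset($_SESSION["content_rand_name"]))
    {
        return null;
    }

    $name = $_SESSION["content_rand_name"];

    // !empty() already covers isset(); is_string() rejects array input
    if (!empty($_POST[$name]) && is_string($_POST[$name]))
    {
        $content = $_POST[$name];

        // do something with the content
        return $content;
    }

    return null;
}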

// In the form page
$_SESSION["content_rand_name"] = md5(rand() . "content" . microtime() . rand());

echo '<textarea id="' . htmlentities($_SESSION["content_rand_name"]) . '" name="' . htmlentities($_SESSION["content_rand_name"]) . '"></textarea>';


Shuffle the position of form elements
Forms secured with the code above can still be spammed, although it will be a lot harder for dumb spammers: they can count HTML elements or line numbers to find the fields. So we need to close off that route as well. If you use only CSS for layout, with DIVs and SPANs to position the content, this is easier to do than with tables or something similar, because it then matters much less where an element sits in the source (it does matter, but not strictly, and it depends on your code). You can shuffle the position of the DIVs in your HTML source while the rendered layout stays the same. I don't have the time to give you example code for this one though. It's a lot of work, but when it works it makes things a lot harder for some spammers.
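
One possible sketch of the idea, not taken from the original post: emit the form rows in a random source order and let the CSS order property of a flex container put them back in their intended visual order. The function name renderShuffledRows and the row markup are assumptions made for illustration.

```php
<?php
// Hypothetical helper: $rows holds the HTML of each form row in its
// intended visual order. The rows are written out in random source order.
function renderShuffledRows(array $rows)
{
    // Remember each row's intended visual position before shuffling
    $items = array();
    foreach (array_values($rows) as $position => $html)
    {
        $items[] = array($position, $html);
    }
    shuffle($items);

    // A column flex container displays its children sorted by "order"
    $out = '<div style="display: flex; flex-direction: column">';
    foreach ($items as $item)
    {
        $out .= '<div style="order: ' . $item[0] . '">' . $item[1] . '</div>';
    }

    return $out . '</div>';
}
```

This only defeats bots that count elements or lines. A bot that reads the inline order values can still reconstruct the layout, so moving those rules into a stylesheet with randomly named classes would be a further step. Keyboard tab order follows the source order rather than the visual order, which is a usability cost to weigh.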

Don't use too many elements to wrap an input field
A problem that is difficult to tackle on some sites is that spammers can find input elements through the surrounding HTML elements, or through the element type itself, even though the name is random. For example, see the following code:
CODE: HTML
<span class="content">
Content:<br>
<textarea id="874898HJDHN#$#7383NB" name="874898HJDHN#$#7383NB"></textarea>
</span>

Although the name and id are "random", the element is easy to find because of the span with the class content: there is only one textarea in it. How much work is it to find that one? Not much. The same goes for something like the following:
CODE: HTML
<form method="post" action="/posts/insert">
    Name: <input type="text" name="3848937HEUHE783" size="10">
    Password: <input type="password" name="384FD##$22FE783" size="10">
    Content: <textarea id="874898HJDHN#$#7383NB" name="874898HJDHN#$#7383NB"></textarea>
</form>

How much work is it to find the only textarea in the HTML? One solution is to turn it into a guessing game, maybe with the help of JavaScript. Create a lot of textareas, only one of which is used for content; the others are there to confuse the spammer. Randomise this every time the page loads, so that it is unpredictable which textarea is the real one. With some skill in JavaScript and CSS you can build this without the normal user noticing.
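
A minimal sketch of such a guessing game, done with PHP and CSS only. The function names buildTextareas and readContent and the session key decoy_names are assumptions for illustration, and random_bytes() and random_int() need PHP 7 or later. The decoys are hidden with per-id CSS rules, and a post that fills in any decoy is rejected.

```php
<?php
// Hypothetical helper: renders $count textareas, one real and the rest decoys.
function buildTextareas(array &$session, $count = 4)
{
    $names = array();
    for ($i = 0; $i < $count; $i++)
    {
        // The leading letter keeps the name usable as a CSS id selector
        $names[] = 'f' . bin2hex(random_bytes(8));
    }

    $real = $names[random_int(0, $count - 1)];
    $session["content_rand_name"] = $real;
    $session["decoy_names"] = array_values(array_diff($names, array($real)));

    $css = '';
    $html = '';
    foreach ($names as $name)
    {
        if ($name !== $real)
        {
            $css .= '#' . $name . ' { display: none; } ';
        }
        $html .= '<textarea id="' . $name . '" name="' . $name . '"></textarea>';
    }

    return '<style>' . $css . '</style>' . $html;
}

// Hypothetical helper: returns the content, or null for a suspicious post.
function readContent(array $session, array $post)
{
    if (!isset($session["content_rand_name"], $session["decoy_names"]))
    {
        return null;
    }

    foreach ($session["decoy_names"] as $decoy)
    {
        if (isset($post[$decoy]) && $post[$decoy] !== '')
        {
            // A normal user never sees the decoys, so only a bot fills them in
            return null;
        }
    }

    $real = $session["content_rand_name"];
    return (isset($post[$real]) && is_string($post[$real])) ? $post[$real] : null;
}
```

The style rules could equally go in the page head or in a generated stylesheet. A bot that evaluates CSS can still tell the textareas apart, so this is best combined with the other items.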

Use JavaScript to set an extra field
Spammers normally use software they wrote themselves rather than a browser, and that software usually can't execute JavaScript. We can use this to our advantage. On submit, run some JavaScript that puts a key, randomly generated in JavaScript, into a hidden field. Don't let the spammer learn your algorithm, or he can recreate it in his software. To prevent that, you could combine JavaScript with some kind of AJAX technology and let PHP generate the key (for more info on how to use AJAX see: http://www.codepost.org/browse/snippets/59 (that post is also by me ;) )).
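
A minimal sketch of the simpler variant, without AJAX: PHP generates the key, and a small inline script copies it into an initially empty hidden field when the form is submitted. The function names renderJsKeyField and checkJsKey and the session keys are made up for illustration. The markup has to be emitted inside the form element, and random_bytes() needs PHP 7 or later.

```php
<?php
// Hypothetical helper: renders an empty hidden field plus the script that fills it.
function renderJsKeyField(array &$session)
{
    $key = bin2hex(random_bytes(16));        // 32 hex characters
    $name = 'k' . bin2hex(random_bytes(8));

    $session["js_key"] = $key;
    $session["js_key_name"] = $name;

    // json_encode() turns the PHP strings into quoted JavaScript literals.
    // The key is written in two halves so it never appears verbatim in the page.
    $jsName = json_encode($name);
    $jsKey = json_encode(substr($key, 0, 16)) . ' + ' . json_encode(substr($key, 16));

    // The field starts out empty; the script fills it in on submit
    return '<input type="hidden" id="' . $name . '" name="' . $name . '" value="" />'
        . '<script>(function () {'
        . 'var field = document.getElementById(' . $jsName . ');'
        . 'field.form.addEventListener("submit", function () { field.value = ' . $jsKey . '; });'
        . '})();</script>';
}

// Hypothetical helper: true only if the script-filled value came back.
function checkJsKey(array $session, array $post)
{
    if (!isset($session["js_key"], $session["js_key_name"]))
    {
        return false;
    }

    $name = $session["js_key_name"];
    return isset($post[$name]) && $post[$name] === $session["js_key"];
}
```

A spammer who reads the page source can still piece the key together, which is why fetching it through AJAX is the stronger option. Users with JavaScript switched off will fail this check, so they need a fallback.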

Use a kind of URL/IP blacklist
To block most spam, block all posts that contain blacklisted URLs. You can get some blacklists from http://www.bluetack.co.uk. Also block the poster if his IP address is blacklisted. Combine this with a feature that uses whois to find the abuse address of his provider, and add a report button in the backend that sends a standard e-mail to that abuse address with some info about the user who posted, plus the date and time. Don't do this automatically. You could perhaps do so after 10 spam posts from the same user, but I prefer that you trigger the report yourself in the backend and add some comments of your own. That way, users who aren't spamming but are caught by the spam "filter", through a bug or something, have nothing to worry about.
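
A minimal sketch of the blocking part, assuming the blacklists have already been loaded into two plain PHP arrays. The function name isBlacklisted and its parameters are made up for illustration, and how a real blacklist file is parsed depends on its format.

```php
<?php
// Hypothetical helper: true if the poster's IP or any linked host is blacklisted.
function isBlacklisted($content, $ip, array $badHosts, array $badIps)
{
    if (in_array($ip, $badIps, true))
    {
        return true;
    }

    // Collect the host part of every http(s) URL in the post
    preg_match_all('~https?://([^/\s"\'<>:]+)~i', $content, $matches);

    foreach ($matches[1] as $host)
    {
        $host = strtolower($host);
        foreach ($badHosts as $bad)
        {
            $bad = strtolower($bad);

            // Match the listed host itself and any subdomain of it
            if ($host === $bad || substr($host, -strlen($bad) - 1) === '.' . $bad)
            {
                return true;
            }
        }
    }

    return false;
}
```

For example, isBlacklisted($content, $_SERVER["REMOTE_ADDR"], $badHosts, $badIps) could run before a post is stored. Links written without http:// or https:// are not caught by this pattern.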

Use CAPTCHAs (optional)
Try using CAPTCHAs. Not everyone wants them, because they need user interaction, but if your users don't mind, use them. Three kinds of CAPTCHA are common at the moment: audio CAPTCHAs, CAPTCHAs where you type in a code, and quiz CAPTCHAs. I recommend the quiz kind: a small puzzle or some math (like 1+1), where the user has to think to give the correct answer. This type of spam filtering is not very reliable, though. Some users aren't good at math, and some write the result differently than others, so you need to find a user-friendly way to implement these CAPTCHAs. For more information on generating CAPTCHAs in PHP, see the following URL: http://www.codepost.org/view/124.
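
A minimal sketch of a quiz CAPTCHA that tries to be forgiving about how the result is written: it accepts the answer as digits or as an English word, with surrounding spaces and any capitalisation. The function names generateQuiz and checkQuiz and the session key quiz_answer are made up for illustration, and random_int() needs PHP 7 or later.

```php
<?php
// Hypothetical helper: stores the expected answer and returns the question text.
function generateQuiz(array &$session)
{
    $a = random_int(1, 9);
    $b = random_int(1, 9);
    $session["quiz_answer"] = $a + $b;

    return "What is $a + $b?";
}

// Hypothetical helper: accepts "7", " 7 " or "Seven" for an expected answer of 7.
function checkQuiz(array &$session, $answer)
{
    if (!isset($session["quiz_answer"]) || !is_string($answer))
    {
        return false;
    }

    $expected = $session["quiz_answer"];
    unset($session["quiz_answer"]); // one attempt per question

    // The index of each word is its value; 9 + 9 = 18 is the largest sum
    $words = array("zero", "one", "two", "three", "four", "five", "six",
        "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen",
        "fourteen", "fifteen", "sixteen", "seventeen", "eighteen");

    $answer = strtolower(trim($answer));
    if (ctype_digit($answer))
    {
        return (int) $answer === $expected;
    }

    return array_search($answer, $words, true) === $expected;
}
```

Sums this simple are easy for a targeted bot to solve, so the quiz mostly stops generic bots. Translating the word list would be the next step for a non-English site.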

Don't use plain-text e-mail addresses
This item is not about spam posted on a website, but it does need attention on a website. It is well known that writing e-mail addresses in full on web pages (like: john@doe.com) lets spam bots find your address. Some people think they can confuse the bots with things like john[at]doe.com, or john [at] doe.com, or john [apenstaartje] doe.com (apenstaartje is the Dutch name for the @ sign). They think spam bots can't detect those as e-mail addresses, but they are wrong. If the spammers are good programmers, they can easily add some extra filters for the common ways of writing an e-mail address. Some software, Plesk for example, uses JavaScript to write out the address, maybe encoded with entities, but how hard is it to decode those? Not very hard, it seems to me. Some sites use pictures with the e-mail address written on them, but if those pictures are very plain, they can be decoded by detecting the letters in the picture; see the following URL for more information: http://www.codepost.org/view/134. Although that code can't decode everything, you could adapt it to match your font.
A solution would be an e-mail form that mails the user, who can then decide whether to respond and so give his address to whoever sent the message. Also use the anti-spam techniques above in your mail form; it will get you less spam.
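
To show how little work the obfuscations mentioned above take to undo, here is a short sketch that could be run against your own pages to see what a harvester would find. The function name deobfuscateAddresses is made up for illustration, and the pattern covers only the bracketed "at" variants and HTML entities.

```php
<?php
// Hypothetical helper: returns every address it can recover from $text.
function deobfuscateAddresses($text)
{
    // Turn "[at]", "(at)" and "[apenstaartje]", with optional spaces, into "@"
    $text = preg_replace('~\s*[\[\(]\s*(?:at|apenstaartje)\s*[\]\)]\s*~i', '@', $text);

    // Decode HTML entities such as &#64;
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // A deliberately simple address pattern, good enough for a demonstration
    preg_match_all('~[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}~i', $text, $matches);

    return $matches[0];
}
```

Calling it on a string containing john[at]doe.com, john [apenstaartje] doe.com and jane&#64;doe.com recovers all three addresses in plain form.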

License
This text is distributed under the Creative Commons Attribution 2.5 License. See the following URL for more information: http://creativecommons.org/licenses/by/2.5/legalcode (short version: http://creativecommons.org/licenses/by/2.5/).

Greetings,


Michiel Eghuizen (AKA: Eghie)

13 comments posted so far

1. On Aug 28 2006 @ 04:54 Matthijs wrote:

Nice post! :) I think almost all the spam here is submitted using a crawler, so generating the form using JavaScript (perhaps in combination with the randomly added key) will probably suffice.

2. On Aug 28 2006 @ 15:53 Eghie wrote:

Indeed, I think that will be enough for this software for now. But when a spammer notices that his spamming engine no longer works on most sites, he will try to get around most of the techniques that block him. If you combine several good techniques, he will have a hard time spamming your site with his bot.

3. On Aug 30 2006 @ 14:11 Eghie wrote:

You can also use the Akismet service.

It is a kind of blacklisting service, which checks your message in a central place against a set of rules.

4. On Aug 30 2006 @ 14:22 Eghie wrote:

You could also use a blacklist like SORBS for detecting open proxies and the like.

Another good way to keep spam out is to show a preview page before the post is submitted, but that requires an extra user interaction.

5. On Feb 07 2008 @ 20:25 guest wrote:

Just have a list of passwords they need to enter. To get the password they have to say what's coming down the road in a picture: a Ford, a Chevy or a Land Rover. All the logos would be visible on the front of the cars.

xD
