
I am developing a single-page application that behaves much like Google Maps, and I want to avoid spam as much as possible.
Users can register on my website to get an API key. Then, on their own website, they can embed the single-page application like this:

<script src="https://www.mywebsite.com/single_page_app.js?key=THEIR_API_KEY"></script>

(Plus a few lines of JavaScript to bind the app to a <div>, but that's not relevant.)

End users will use this single-page application to fill in information (options, dates, email addresses, names...) through a multi-step form. Each end user has a session (cookies are sent with the CORS requests), and a token is sent to the application to prevent CSRF. At the end of each step, the collected data is sent to my server along with the token and saved in my database.

How can I evaluate whether the data sent to my server is spam?

I have a few (obvious) ideas:

  • Validate the submitted data
  • Analyse the time between steps and the total time spent
  • Count the number of requests per IP
  • Look for requests with exactly the same data (MD5 hash)
  • Check the Origin and Referer headers (this addresses CSRF rather than spam)
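
To make the duplicate and timing ideas above concrete, here is a minimal Python sketch. The function names, the in-memory `seen` set, and the 2-second threshold are all assumptions for illustration; a real deployment would store digests in the database or a cache. MD5 is used here only as a cheap fingerprint, not for anything security-sensitive.

```python
import hashlib
import json

# Hypothetical in-memory store of digests already seen.
seen: set[str] = set()


def fingerprint(payload: dict) -> str:
    """Return an MD5 hex digest of the payload in a canonical form."""
    # Trim and lower-case string values so trivially altered
    # submissions map to the same digest.
    normalised = {
        key: value.strip().lower() if isinstance(value, str) else value
        for key, value in payload.items()
    }
    # Sorted keys make the digest independent of field order.
    canonical = json.dumps(normalised, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def is_duplicate(payload: dict) -> bool:
    """Report whether an equivalent payload was already recorded."""
    digest = fingerprint(payload)
    if digest in seen:
        return True
    seen.add(digest)
    return False


def too_fast(step_started_at: float, step_submitted_at: float,
             min_seconds: float = 2.0) -> bool:
    """Flag a step completed faster than the assumed human minimum."""
    return (step_submitted_at - step_started_at) < min_seconds
```

Both checks only produce signals; whether to reject, delay, or merely flag a submission is a separate decision.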

Is there anything more I can do?

Moreover, some checks (such as the number of requests per IP or the duplicate-data lookup) can be time-consuming because I have to search my database.
Should these checks run on the fly before the data is inserted into my database, or in a cron job after it has been inserted?

What should I do with spam data?

Spam data can be useful for detecting future spam. Should I delete it or keep it somewhere? For how long?

Gary Olsson

1 Answer


What happens to your database? Think from an attacker's perspective and look for ways to abuse the system.

There are three ways a form/API can be abused to send spam:

  1. The submitted data shows up on a website
  2. The submitted data is sent via email (usually to a small and fixed set of people)
  3. There is a vulnerability in the implementation that can be exploited to use your site to relay spam in another manner (review your code! this method is not otherwise discussed here)

If the data is shown on a website (or if an attacker makes that assumption), you're likely to see comment spam aspiring to:

  1. Perform search engine optimization
  2. Spread malware (by link and/or attachment)
  3. Damage your site's credibility or otherwise provide noise
  4. Troll you or users of your site
  5. Spread fake news or other propaganda (essentially #4 and #1)

You can combat form→email spam and web comment spam as follows:

  • Use blacklists like URI DNSBLs to combat link spam
  • (Web only) Render links with nofollow, like <a href="…" rel="nofollow">this</a>
  • Do not allow attaching non-media files
  • Rate limit: “Too many submissions from this IP, try again later”
  • Limit privileges for newcomers, e.g.
    • “You can't attach files until …”
    • “You can't provide links until …”, then use nofollow, then allow fully
    • “You can't post images until …”
    • “You can't post more than once an hour until …”
  • Have a moderation team review every post before it goes public
    • (Web only) Crowdsource it! Consider a user moderation system like Discourse
  • Build a machine learning algorithm to automate the human moderation team
  • Implement a captcha (and make it harder for newcomers than for regular users)
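
As one illustration of the rate-limiting bullet above, here is a minimal sliding-window limiter in Python. The class name, the in-memory storage, and the limits are assumptions made for this sketch; with several server processes you would keep the counters in a shared store instead.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each IP to the timestamps of its recent accepted requests.
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        # `now` can be injected for testing; defaults to a monotonic clock.
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # "Too many submissions from this IP, try again later"
        hits.append(now)
        return True


# Example: at most 5 submissions per IP per minute (arbitrary numbers).
limiter = SlidingWindowLimiter(max_requests=5, window_seconds=60)
```

A stricter limiter instance could be used for newcomers and a looser one for established users, in line with the privilege tiers above.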

The less visible "success" is for an attacker, the less persistent attacks will be (though some attackers don't care). Even claiming to have moderator review but merely delaying a new user's posts by a few hours may thwart some spam.

If you're sending email, you can use one of many Bayesian systems internally to see whether you want to drop the submission from your database and/or from pushing externally. (⚠ You'll still have to regularly train this system.) You may also want to use an Email Service Provider (ESP) to handle your outbound reputation for you.
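
To show roughly what such a Bayesian system does internally, here is a toy naive Bayes classifier in Python. It is a sketch under simplifying assumptions (word-level tokens, add-one smoothing, hypothetical class and method names), not a substitute for a maintained filter, and it needs the same ongoing training with labelled spam and non-spam ("ham") submissions.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text: str) -> list:
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text: str, label: str) -> None:
        """Record one labelled submission; label is 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        tokens = self.tokenize(text)
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed class prior.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokens:
                count = self.word_counts[label][word]
                # Add-one smoothing so unseen words never give log(0).
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Turn the two log scores into P(spam | text); clamp to avoid overflow.
        diff = max(min(log_scores["ham"] - log_scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

You would call `train` on every submission a moderator has labelled, then drop or hold new submissions whose `spam_probability` exceeds a threshold you choose.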

You can report spam images to Knujon ("No Junk" spelled backwards) in its image-only file upload. You can report phishing landing sites to PhishTank.

Adam Katz