Evaluating spam score for a single page application

Question

I am developing a single page application that behaves similarly to Google Maps, and I want to avoid spam as much as possible.
Users can register on my website to get an API key. Then, on their own website, they can embed this single page application like this:

<script src="https://www.mywebsite.com/single_page_app.js?key=THEIR_API_KEY"></script>

(plus a few lines of JavaScript to bind the app to a <div>, but that's not relevant here)
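The API key sits in a public script URL, so anyone can copy it. One mitigation is to tie each key to the origins its owner registered. This is a minimal Python sketch, since the server language isn't stated. `REGISTERED_ORIGINS`, `normalize_origin` and `origin_allowed` are hypothetical names. Non-browser clients can send any Origin header. As the question itself notes, this check therefore guards against cross-site abuse rather than detecting spam.

```python
from typing import Dict, Optional, Set
from urllib.parse import urlsplit

# Hypothetical registry: API key -> origins its owner registered.
REGISTERED_ORIGINS: Dict[str, Set[str]] = {
    "THEIR_API_KEY": {"https://customer.example"},
}

def normalize_origin(origin: str) -> str:
    """Reduce an Origin header value to scheme://host[:port]; '' if malformed."""
    try:
        parts = urlsplit(origin.strip())
        port = parts.port  # raises ValueError for a non-numeric port
    except ValueError:
        return ""
    if not parts.scheme or not parts.hostname:
        return ""  # e.g. the literal "null" origin
    # urlsplit already lowercases the scheme and hostname.
    host = parts.hostname if port is None else f"{parts.hostname}:{port}"
    return f"{parts.scheme}://{host}"

def origin_allowed(api_key: str, origin_header: Optional[str]) -> bool:
    """True only if the request's Origin is one registered for this API key."""
    allowed = REGISTERED_ORIGINS.get(api_key)
    if not allowed or not origin_header:
        return False
    return normalize_origin(origin_header) in allowed
```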

End users will use this single page application to fill in information (options, date, email, names...) through a multi-step form. Each end user has a session (cookies are sent with CORS requests), and a token is sent to the application to prevent CSRF. At the end of each step, the collected data is sent to my server along with the token and saved in my database.
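Here is a minimal in-memory sketch of the session/token flow described above. `SESSIONS`, `start_session` and `verify_step` are hypothetical, and a real deployment would use its framework's session store. The token only shows that a request belongs to a session that loaded the form. A bot that runs the script receives a valid token too, so a valid token is not a spam signal on its own.

```python
import hmac
import secrets
from typing import Dict, Tuple

# Hypothetical in-memory store: session id -> per-session data.
SESSIONS: Dict[str, dict] = {}

def start_session() -> Tuple[str, str]:
    """Create a session; return (session id for the cookie, CSRF token for the app)."""
    session_id = secrets.token_urlsafe(32)
    csrf_token = secrets.token_urlsafe(32)
    SESSIONS[session_id] = {"csrf_token": csrf_token}
    return session_id, csrf_token

def verify_step(session_id: str, submitted_token: str) -> bool:
    """Accept a step only if its token matches the one issued for this session."""
    session = SESSIONS.get(session_id)
    if session is None:
        return False
    # Constant-time comparison avoids leaking the token through timing differences.
    return hmac.compare_digest(session["csrf_token"].encode(), submitted_token.encode())
```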

How can I evaluate whether the data sent to my server is spam?

I have a few (obvious) ideas:

Validate the data sent
Analyse the time between steps and the total time spent
Count the number of requests per IP
Look for requests with exactly the same data (MD5 hash)
Check the Origin and Referer headers (CSRF, not really spam)
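The timing and duplicate-data ideas can be sketched as follows. The thresholds are hypothetical placeholders to be tuned against real traffic. The timestamps are assumed to be recorded server-side, because client-supplied times can be forged. Before hashing a canonical JSON encoding, the fingerprint trims and lowercases top-level string values. MD5 is acceptable here because it is only used to spot duplicates, not for security.

```python
import hashlib
import json
from typing import Any, Dict, List

# Hypothetical thresholds; tune them against real traffic.
MIN_SECONDS_PER_STEP = 2.0
MIN_SECONDS_TOTAL = 10.0

def timing_flags(timestamps: List[float]) -> List[str]:
    """timestamps: server-recorded Unix times when the form was shown,
    then when each step was submitted, in order."""
    flags = []
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if any(gap < MIN_SECONDS_PER_STEP for gap in gaps):
        flags.append("step_too_fast")
    if len(timestamps) >= 2 and timestamps[-1] - timestamps[0] < MIN_SECONDS_TOTAL:
        flags.append("total_too_fast")
    return flags

def submission_fingerprint(data: Dict[str, Any]) -> str:
    """MD5 of a canonical JSON encoding of the submitted fields."""
    # Trim and lowercase top-level strings so trivial variations still collide.
    normalized = {k: v.strip().lower() if isinstance(v, str) else v for k, v in data.items()}
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```

Browser autofill can make real users fast. For that reason, a timing flag probably works better as one input to a score than as a reason to reject outright.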

Is there anything more I can do?

Moreover, some checks (like the number of requests per IP or duplicate data) can be time-consuming, since I have to search through my database.
Should these checks be made by a cron job after the data has been inserted into the database, or on the fly before inserting it?
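One common compromise is a split approach. Cheap checks run on the fly against in-memory state. Heavier database-wide analysis is left to a periodic job that flags records after insertion. Below is a minimal sketch of the on-the-fly part: a per-IP sliding-window counter and a short-lived duplicate cache. Both class names are hypothetical. State held in one process isn't shared between server instances, so a multi-server setup would need a shared store (Redis is a common choice).

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

class SlidingWindowCounter:
    """Allows at most `limit` events per key within the last `window` seconds."""
    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        # Keys are never evicted here; a real limiter would expire idle keys.
        self.events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.events[key]
        while q and now - q[0] >= self.window:
            q.popleft()  # forget events that fell out of the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

class RecentFingerprints:
    """Remembers fingerprints for `ttl` seconds to catch duplicates without a DB query."""
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.seen: Dict[str, float] = {}

    def is_duplicate(self, fingerprint: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Linear expiry scan: fine for a sketch, use a proper TTL cache in production.
        for fp in [fp for fp, t in self.seen.items() if now - t >= self.ttl]:
            del self.seen[fp]
        if fingerprint in self.seen:
            return True
        self.seen[fingerprint] = now
        return False
```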

What should I do with spam data?

Spam data can be useful for detecting future spam. Should I delete it or keep it somewhere? For how long?

Adam Katz · Answer 1 · 2018-01-04T16:21:09.843

What happens to the data in your database? Think from an attacker's perspective and look for ways to abuse the system.

There are three ways a form/API can be used to spam:

The submitted data shows up on a website
The submitted data is sent via email (usually to a small and fixed set of people)
There is a vulnerability in the implementation that can be exploited to use your site to relay spam in some other way (review your code! This method is not discussed further here)

If the data is shown on a website (or if an attacker makes that assumption), you're likely to see comment spam aspiring to:

Perform search engine optimization
Spread malware (by link and/or attachment)
Damage your site's credibility or otherwise provide noise
Troll you or users of your site
Spread fake news or other propaganda (essentially trolling combined with search engine optimization)

You can combat form→email spam and web comment spam as follows:

Use blacklists like URI DNSBLs to combat link spam
(Web only) Render links with nofollow, like <a href="…" rel="nofollow">this</a>
Do not allow attaching non-media files
Rate limit: “Too many submissions from this IP, try again later”
Limit privileges for newcomers, e.g.
- “You can't attach files until …”
- “You can't provide links until …”, then use nofollow, then allow fully
- “You can't post images until …”
- “You can't post more than once an hour until …”
Have a moderation team review every post before it goes public
- (Web only) Crowdsource it! Consider a user moderation system like Discourse
Build a machine learning algorithm to automate the work of the human moderation team
Implement a captcha (and make it harder for newcomers than for regular users)
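For the URI DNSBL suggestion, a client first extracts the hostnames from links in a submission. It then looks each one up as a DNS name under the list's zone. An answer in 127.0.0.0/8 usually means "listed", and NXDOMAIN means not listed. In the minimal sketch below, the zone is a required parameter. Each list (SURBL, URIBL, Spamhaus DBL) has its own zone names, usage terms and return codes, and some codes signal a refused query rather than a listing. The sketch queries hostnames as-is. Lists generally expect the registered domain, which requires Public Suffix List handling not shown here.

```python
import ipaddress
import re
import socket
from typing import Set

# Captures the host part of http(s) links, skipping optional "user@" info.
URL_HOST_RE = re.compile(r"https?://(?:[^/?#\s@]+@)?([A-Za-z0-9.-]+)", re.IGNORECASE)

def extract_hostnames(text: str) -> Set[str]:
    """Return the lowercase hostnames of all http(s) links found in text."""
    return {m.lower().strip(".") for m in URL_HOST_RE.findall(text) if m.strip(".")}

def dnsbl_query_name(host: str, zone: str) -> str:
    """Build the DNS name to look up for host in a URI DNSBL zone."""
    try:
        ip = ipaddress.IPv4Address(host)
    except ValueError:
        return f"{host}.{zone}"
    # IP literals are conventionally looked up with their octets reversed.
    return ".".join(reversed(str(ip).split("."))) + "." + zone

def is_listed(host: str, zone: str) -> bool:
    """True if the zone returns an answer in 127.0.0.0/8 for host (needs network)."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(host, zone))
    except socket.gaierror:
        return False  # NXDOMAIN or lookup failure: treat as not listed
    # Check the list's docs: some 127/8 codes mean "query refused", not "listed".
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")
```

Usage: `any(is_listed(h, zone) for h in extract_hostnames(text))`, with a `zone` you are permitted to query.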

The less visible "success" is for an attacker, the less persistent attacks will be (though some attackers don't care). Even claiming to have moderator review but merely delaying a new user's posts by a few hours may thwart some spam.
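Delaying a new user's posts can be as simple as computing when each post becomes visible. A moderator, or a later spam check, can reject the post before then. The policy values in this sketch are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical policy values.
NEWCOMER_ACCOUNT_AGE = timedelta(days=7)
NEWCOMER_MIN_APPROVED_POSTS = 3
NEWCOMER_DELAY = timedelta(hours=4)

def visible_at(posted: datetime, account_created: datetime, approved_posts: int) -> datetime:
    """When a post should become public; newcomers' posts are held back."""
    is_newcomer = (posted - account_created < NEWCOMER_ACCOUNT_AGE
                   or approved_posts < NEWCOMER_MIN_APPROVED_POSTS)
    return posted + NEWCOMER_DELAY if is_newcomer else posted
```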

If you're sending email, you can run one of many Bayesian systems internally to decide whether to drop the submission from your database and/or stop it from being pushed externally. (You'll still have to train this system regularly.) You may also want to use an Email Service Provider (ESP) to manage your outbound sending reputation for you.
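Here is a toy version of such a Bayesian filter: multinomial naive Bayes with add-one smoothing over lowercase word tokens. It is meant only as an illustration. In practice, an off-the-shelf filter such as Apache SpamAssassin's Bayes component is more robust. Retraining means calling `train` on submissions a moderator has confirmed as spam or ham. This is one argument for keeping confirmed spam around.

```python
import math
import re
from collections import Counter
from typing import Dict, List

TOKEN_RE = re.compile(r"[a-z0-9']+")

def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())

class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""
    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """label must be "spam" or "ham"."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior, so an untrained class doesn't give log(0).
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]  # 0 for unseen words
                # The extra +1 reserves mass for unseen words and avoids division by zero.
                log_p += math.log((count + 1) / (total_words + vocab_size + 1))
            log_scores[label] = log_p
        # P(spam | text) = 1 / (1 + exp(log P(ham) - log P(spam))), clamped against overflow.
        diff = max(min(log_scores["ham"] - log_scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```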

You can report spam images to Knujon ("No Junk" spelled backwards) in its image-only file upload. You can report phishing landing sites to PhishTank.

