
Traffic Defense

Inside a Click Fraud Detection System

14 min

How clickfraud.ru uses machine learning and behavioral analysis to separate real clicks from bot traffic. Architecture, signals, and the ongoing arms race against fraudsters.

Click fraud detection is an adversarial problem. Every detection signal you publish becomes a signal fraudsters can avoid. This makes writing about architecture somewhat delicate — but the broad approach is worth explaining, because it helps advertisers understand what they're actually buying.

What we detect

Invalid clicks fall into several categories with different characteristics:

  • Competitor clicks — manual or semi-automated clicking from competitors to exhaust advertising budgets. Detectable through IP clustering, timing patterns, and session behavior.
  • Click farms — coordinated manual clicking from farms of real devices. Harder to detect with IP rules alone; requires behavioral fingerprinting.
  • Automated bots — scripts executing clicks without human interaction. Detectable through browser environment signals, JavaScript execution patterns, and absence of natural mouse/touch behavior.
  • Motivated click fraud — publishers on CPA networks generating clicks on their own placements. Requires cross-session pattern analysis.

The signal stack

Our system evaluates each click across three layers:

1. Network signals

IP reputation, ASN analysis, datacenter and VPN detection, geographic inconsistency, and proxy identification. These are fast and cheap to compute but increasingly easy to spoof.
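As a rough illustration of how cheap these checks are, here is a minimal sketch of two network flags: datacenter membership and geographic mismatch. The function name, the field names, and the network list are assumptions made for this example (the ranges are reserved documentation blocks, not real datacenter data), and this is not a description of the production rules.

```python
import ipaddress

# Hypothetical datacenter ranges. These are RFC 5737 documentation blocks,
# used here only as placeholders for a real reputation feed.
DATACENTER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def network_signals(ip: str, ip_country: str, declared_country: str) -> dict:
    """Return cheap per-click network flags (illustrative only)."""
    addr = ipaddress.ip_address(ip)
    return {
        # True when the click's IP falls inside any known datacenter range.
        "is_datacenter": any(addr in net for net in DATACENTER_NETWORKS),
        # True when the IP's country differs from the country the client reports.
        "geo_mismatch": ip_country != declared_country,
    }


print(network_signals("192.0.2.10", "NL", "RU"))
```

Both checks are simple lookups, which is why this layer is fast. It is also why a residential proxy is enough to get past it.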

2. Behavioral signals

Session duration, scroll behavior, cursor movement patterns, click velocity, time-on-page, and interaction with page elements. Bots that execute JavaScript can fake many of these, but not all of them simultaneously at scale without observable patterns.
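To make two of these signals concrete, the sketch below derives cursor straightness and click-timing regularity from raw session events. The feature definitions and the function name are assumptions chosen for illustration. A perfectly straight cursor path or perfectly even click spacing is suspicious only in combination with other signals.

```python
from math import hypot
from statistics import mean, pstdev


def behavioral_features(cursor_points, click_times):
    """cursor_points: list of (x, y); click_times: sorted timestamps in seconds."""
    # Total distance travelled by the cursor along its recorded path.
    steps = zip(cursor_points, cursor_points[1:])
    path = sum(hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in steps)

    if path > 0:
        (x0, y0), (xn, yn) = cursor_points[0], cursor_points[-1]
        # 1.0 means a perfectly straight line from first to last point.
        straightness = hypot(xn - x0, yn - y0) / path
    else:
        straightness = 0.0

    # Coefficient of variation of gaps between clicks; 0.0 means metronomic timing.
    gaps = [b - a for a, b in zip(click_times, click_times[1:])]
    if len(gaps) >= 2 and mean(gaps) > 0:
        gap_cv = pstdev(gaps) / mean(gaps)
    else:
        gap_cv = None  # not enough clicks to say anything

    return {
        "cursor_moved": path > 0,
        "straightness": straightness,
        "click_gap_cv": gap_cv,
    }


print(behavioral_features([(0, 0), (3, 4), (6, 8)], [0.0, 1.0, 2.0, 3.0]))
```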

3. Pattern signals

Cross-session analysis: does this device or fingerprint appear across multiple advertiser campaigns? Does the click timing cluster in ways inconsistent with organic search behavior? These signals require historical data and become more accurate over time.
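The first question, whether a fingerprint shows up across multiple advertiser campaigns, could be sketched as a simple aggregation over historical clicks. The function names and the threshold of three campaigns are assumptions for this example, not values taken from the production system.

```python
from collections import defaultdict


def campaigns_per_fingerprint(clicks):
    """clicks: iterable of (fingerprint, campaign_id) pairs from historical data."""
    seen = defaultdict(set)
    for fingerprint, campaign_id in clicks:
        seen[fingerprint].add(campaign_id)
    # Number of distinct campaigns each fingerprint has clicked on.
    return {fp: len(campaigns) for fp, campaigns in seen.items()}


def flag_cross_campaign(clicks, min_campaigns=3):
    """Return fingerprints seen on at least min_campaigns distinct campaigns."""
    counts = campaigns_per_fingerprint(clicks)
    return sorted(fp for fp, n in counts.items() if n >= min_campaigns)


history = [
    ("fp-1", "shoes"), ("fp-1", "loans"), ("fp-1", "travel"),
    ("fp-2", "shoes"), ("fp-2", "shoes"),
]
print(flag_cross_campaign(history))
```

The more history the aggregation covers, the more campaigns a recurring fingerprint accumulates, which is one reason these signals become more accurate over time.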

The ML component

The neural network layer was funded by a state grant and deployed in 2020. It takes the outputs of the signal stack as input features and produces a fraud probability score for each click. The model is retrained periodically on data labeled by our analyst team.
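The shape of that step, signal features in and a probability out, can be shown with a deliberately simplified stand-in. The sketch below uses a single logistic unit, not the actual neural network, and the feature names, weights, and bias are all invented for illustration.

```python
import math

# Hypothetical weights; a real model would learn these from labeled clicks.
WEIGHTS = {
    "is_datacenter": 2.0,
    "geo_mismatch": 0.8,
    "straightness": 1.5,
    "cross_campaign": 1.2,
}
BIAS = -3.0


def fraud_probability(features: dict) -> float:
    """Map signal-stack features to a score in (0, 1) with a logistic function."""
    z = BIAS + sum(w * float(features.get(name, 0.0)) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


clean = {"is_datacenter": False, "geo_mismatch": False, "straightness": 0.0}
suspicious = {"is_datacenter": True, "geo_mismatch": True,
              "straightness": 1.0, "cross_campaign": 1.0}
print(round(fraud_probability(clean), 3), round(fraud_probability(suspicious), 3))
```

A multi-layer network replaces the single weighted sum with several stacked ones, but the contract stays the same: one feature vector per click, one probability out.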

One important design decision: we don't block in real time. We analyze and report. Advertisers review the reports and request credits or chargebacks from the ad platforms. This is slower than real-time blocking, but it means a false positive costs a line in a report, not a blocked customer. For most advertisers, turning away real customers would be far worse than the fraud itself.
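A report-only pipeline could look roughly like the sketch below: scored clicks are aggregated per campaign, and nothing is blocked. The tuple layout, the 0.9 threshold, and the report fields are assumptions made for this example.

```python
from collections import defaultdict


def build_report(scored_clicks, threshold=0.9):
    """scored_clicks: iterable of (campaign_id, cost, fraud_probability)."""
    report = defaultdict(lambda: {"clicks": 0, "flagged": 0, "flagged_cost": 0.0})
    for campaign_id, cost, probability in scored_clicks:
        row = report[campaign_id]
        row["clicks"] += 1
        # Flagged clicks are only counted; the visitor is never turned away.
        if probability >= threshold:
            row["flagged"] += 1
            row["flagged_cost"] += cost
    return dict(report)


clicks = [("a", 10.0, 0.95), ("a", 10.0, 0.20), ("b", 5.0, 0.99)]
for campaign, row in build_report(clicks).items():
    print(campaign, row)
```

The `flagged_cost` total is the figure an advertiser would take to the platform when asking for a credit. A click that was flagged by mistake can be dropped from the claim during review.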

The arms race

Fraudsters respond to detection signals. When we improved IP detection, click farms moved to residential proxies. When we improved JavaScript behavior detection, sophisticated bots started simulating more realistic mouse movements. This is an ongoing process, not a solved problem.

The realistic goal isn't 100% detection — it's making fraud expensive enough that the economics stop working for the fraudster. When your detection rate is high enough that a click farm needs to spend more on infrastructure to avoid detection than it earns from the fraud, the fraud stops.
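The economics can be put in a toy model. Assume, for illustration only, that a fraudster earns a fixed payout for each undetected click, earns nothing for a detected one, and pays a fixed evasion cost per click regardless of outcome. The function names and numbers below are hypothetical.

```python
def fraud_profit_per_click(payout, evasion_cost, detection_rate):
    """Expected profit per click under the toy model described above."""
    return payout * (1.0 - detection_rate) - evasion_cost


def break_even_detection_rate(payout, evasion_cost):
    """Detection rate above which the expected profit turns negative."""
    return max(0.0, 1.0 - evasion_cost / payout)


# Example: 1.00 earned per undetected click, 0.25 spent per click on proxies and devices.
print(break_even_detection_rate(1.00, 0.25))
print(fraud_profit_per_click(1.00, 0.25, detection_rate=0.50))
print(fraud_profit_per_click(1.00, 0.25, detection_rate=0.90))
```

In this example the break-even detection rate is 75%, well short of 100%. Raising the evasion cost lowers it further, which is why forcing fraudsters onto more expensive infrastructure is itself a win.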

Detection is not a one-time engineering achievement. It's an operational discipline — the system needs to be monitored, updated, and improved continuously as attack patterns evolve.

— internal architecture review, 2021

Maxim Kulgin

Saint Petersburg · bezsmuzi channel
