One of the many cultural shifts to come out of becoming a connected, always-online world is that consumers have a loud voice that has a substantial impact on sales. For example, when you buy an Anker product on Amazon, you’ll get a little friendly card asking you to leave a review:
You similarly see requests in mobile games to give them a 5-star rating (and if not, they cleverly direct you to their support page instead of the app store rating page):
User ratings matter and we, as customers, tend to trust our fellow consumers. I don’t know anything about lemon squeezers, but if I ever needed one I wouldn’t have a single hesitation buying this one:
(I actually did end up buying that one. It’s great of course.)
But once you give power to the people, you find out that a few people are jerks who can ruin everyone’s fun:
It’s important enough that Amazon filed a lawsuit against a company for providing fake ratings. “While small in number, these reviews threaten to undermine the trust that customers, and the vast majority of sellers and manufacturers, place in Amazon, thereby tarnishing Amazon’s brand,” the suit says. “Amazon strictly prohibits any attempt to manipulate customer reviews and actively polices its website to remove false, misleading, and inauthentic reviews.”
With that introduction, let me tell you a story about Kongregate.com...
We have a long history of players helping surface great content by rating games on a scale of 1 to 5 stars. We also use that data to present recommended games based on ratings from similar players (using the commendo service). This system has worked great for years, but around May of 2016 things got a little...fraudy.
We admittedly didn’t pick up on it right away. Some games seemed to be struggling in ways that we didn’t expect, but foul play hadn’t occurred to us. A few months later, though, we were getting reports from developers that something was amiss. We started investigating and, oh boy, was there something wrong.
As this recent Gamasutra article points out, there is a coordinated attack being carried out regularly on Kongregate games. The motivation is pretty clear: the attackers are using thousands of accounts to rate some games down and others up in an effort to get their games to win the weekly and monthly game contest prizes. These prizes (totaling around $7,500 / month) are determined solely by the user rating, meaning these fraudsters had tremendous control over the results.
Around this time we had some developers reach out to us to let us know about a Russian-language thread they had discovered that confirmed the patterns we were seeing: someone had offered services to developers to force their game to the top of the ratings.
We stayed quiet about it for quite a while so we could continue to collect data and behavior patterns. We developed a series of algorithms to detect this type of behavior from a number of angles and were able to test the results with a high degree of accuracy. Once we had some confidence in the algorithms, we started retroactively processing ratings to try to correct for the fraud.
This correction process revealed that the fraud had successfully stolen roughly $8,000 in prize money over a few months. We'll likely never recover the money from the thieves, but it's certainly not the developers' fault that it happened and they shouldn't be punished for it. Even though we were staying quiet about it for a while, we went back and paid the developers who should have won those prizes from May through September.
While I’m thrilled that Kongregate management decided to put our developers first, the point of this article isn’t to pat ourselves on the back, but rather to discuss our methods and some of the measures we have been and will continue taking. Hopefully sharing this will help other platforms and developers continue to refine the process of keeping ratings accurate.
A wise man once advised that you should "know your enemy": Zack de la Rocha of Rage Against the Machine. So let’s start there. The source of these attacks appears to primarily be Russia and Ukraine. The average monthly salary in those countries is somewhere in the range of $200 - $600. So a $1500 prize can be months of salary and is clearly a huge motivation, and even the smaller prizes of $250 are substantial.
A variety of methods have been used to improve effectiveness and attempt to hide their activity. I’d be lying if I didn’t admit some are pretty clever; I just wish they’d use these skills for something more productive. Here are some (but not quite all) of them:
- Our contests end at midnight on the 3rd day after the last submission date (the 3 days gives time for late entries to get votes). So we would see a spike of ratings just before midnight to sneak in before anyone could try to fight or report it.
- Assuming we were looking for spikes of 1 and 5 ratings, they used 2s, 4s, and even 3s in bulk to slowly but effectively shift the rating.
- The midnight blitz got a bit more sophisticated later on. We would see that rapid change in game ratings, but then they would flood the game with opposite ratings just after midnight. Why? Because it covers their tracks! The rating after midnight will match the one before midnight, but our contest system would capture the rating at exactly midnight.
- In another attempt to cover tracks, they initiated a bunch of fake ratings just before midnight, then re-rated the game with those same accounts, hoping we only tracked the final rating and not each individual rating event.
- More recently, they’ve been turning our own players against us by messing with the tags on games (which are also crowd-sourced by voting). So we’d see an action game tagged with “Idle” and “RPG.” Players get mad and give a low rating because it’s not what they were expecting or because they thought the developers had been dishonest.
Note: this is not a critique of those players. We’ve all been there ourselves. You take a drink of water and discover it’s actually beer. Doesn’t matter if the beer is Pliny The Younger, it’s gonna taste gross. I could write an entire separate blog post on managing expectations.
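To make the two track-covering tricks above concrete, here is a minimal sketch of why keeping every rating event matters. It is not our actual detection code: the `RatingEvent` record, the function names, and the sample data are all made up for illustration. The idea is simply that an append-only log lets you reconstruct the rating at the contest deadline and spot accounts whose rating at the deadline differs from where they ended up.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RatingEvent:
    """One rating submission (hypothetical record shape)."""
    account_id: str
    stars: int       # 1 to 5
    at: datetime     # when the rating was submitted


def ratings_as_of(events, cutoff):
    """Return {account_id: stars} using each account's latest rating at or before cutoff."""
    latest = {}
    for ev in sorted(events, key=lambda e: e.at):
        if ev.at <= cutoff:
            latest[ev.account_id] = ev.stars
    return latest


def accounts_that_flipped(events, deadline):
    """Accounts whose rating at the deadline differs from their final rating."""
    at_deadline = ratings_as_of(events, deadline)
    final = ratings_as_of(events, datetime.max)
    return sorted(
        acct for acct, stars in at_deadline.items()
        if final.get(acct) != stars
    )


# Made-up example: "b" and "c" down-rate just before midnight, then re-rate after.
deadline = datetime(2016, 9, 4, 0, 0)
events = [
    RatingEvent("a", 4, datetime(2016, 9, 1, 12, 0)),
    RatingEvent("b", 1, datetime(2016, 9, 3, 23, 58)),
    RatingEvent("b", 5, datetime(2016, 9, 4, 0, 5)),
    RatingEvent("c", 1, datetime(2016, 9, 3, 23, 59)),
    RatingEvent("c", 4, datetime(2016, 9, 4, 0, 6)),
]
suspects = accounts_that_flipped(events, deadline)
```

A system that stored only the final rating per account would see nothing odd about "b" and "c" here. A flip across the deadline is only a signal, of course, since honest players change their minds too.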
What Won’t Work (Or Isn’t an Option):
We spent quite a while talking internally and with some trusted developers about this and came up with a bunch of ideas. Unfortunately, most of them have serious problems.
IP bans. As you might guess, IP addresses are useless here. The fraudsters bounce all around, making any attempts to track or block by IP impossible (and adding the risk of accidentally blocking innocent IP groups).
Requiring accounts to be X days old to vote. It turns out the age of these accounts also doesn’t help. Some of these accounts are a few years old (?!). We are confident we know when the current, relatively large-scale fraud started happening, but it’s quite possible that the fraudsters started small and only expanded once they were happy with their results. Perhaps they just grabbed a ton of old accounts with insecure passwords, but either way it’s a dead end.
Weighting votes by account level. Newgrounds does this and there’s probably some value here, but I think it’s aimed mostly at accuracy rather than at stopping fraud. This is unlikely to help our particular case for two reasons. If we set our curve steep to heavily favor high-level accounts, the fraudsters would be able to manually level up a few accounts to have substantial influence. If we go with a shallow curve, the sheer volume of compromised accounts would still be able to accomplish the manipulation.
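A toy calculation shows the steep-versus-shallow problem. Everything here is an assumption for illustration: the power-law weight (`level ** exponent`), the account levels, and the vote counts are invented, and this is not Newgrounds' formula or ours.

```python
def weighted_rating(votes, exponent):
    """Weighted mean of (stars, account_level) votes, with weight = level ** exponent.

    The power-law curve is a hypothetical choice: a large exponent is a
    "steep" curve, an exponent below 1 is a "shallow" one.
    """
    total_weight = sum(level ** exponent for _, level in votes)
    weighted_sum = sum(stars * level ** exponent for stars, level in votes)
    return weighted_sum / total_weight


# Invented scenario: 200 honest level-10 players rate a game 4 stars.
honest = [(4, 10)] * 200
# Attack A: 5 fraud accounts manually leveled up to level 60, all rating 1 star.
few_leveled = [(1, 60)] * 5
# Attack B: 2000 low-level compromised accounts, all rating 1 star.
many_low = [(1, 2)] * 2000

steep_vs_few = weighted_rating(honest + few_leveled, exponent=3)
steep_vs_many = weighted_rating(honest + many_low, exponent=3)
shallow_vs_few = weighted_rating(honest + few_leveled, exponent=0.5)
shallow_vs_many = weighted_rating(honest + many_low, exponent=0.5)
```

In this made-up setup the steep curve shrugs off the 2000 low-level accounts but lets 5 leveled-up accounts drag a 4.0 game below 2 stars, while the shallow curve does the reverse. Whichever curve you pick, one of the two attacks still works.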
Increasing friction for new accounts. A central part of our business is helping guests get registered as easily as possible. By making this more difficult we would be putting our business at risk, potentially much more than the damage done by these ratings. We may later consider adding something like CAPTCHA, but for now we want to avoid it if at all possible.
As anyone who has tried to fight hacking knows, you end up in an escalating, expensive, and often losing battle. So instead of fighting fire with fire, we’re going to remove the firewood. In particular, we’re going to make the ROI (Return On Investment) calculation much less favorable. That calculation involves dividing the reward by the effort, so ideally we want to shrink the reward and increase the effort.
I’m not going to list all of our methods here, and will avoid some details, so that we don’t reveal our full hand. But this sample of our strategy should help illustrate the philosophy and implementation of our approach. Our main targets are motivation, PITA, and obfuscation.
Motivation: Money is of course primary here, but there’s also the draw of the exposure and prestige of winning a contest and being highly rated on Kongregate. To start with, we disabled automated prize determination and payments. Instead we started doing it manually and with a fine-toothed comb. This was time-consuming, but we didn’t send payments until we were reasonably confident in the results. If developers are paying for these fraudulent rating services, our actions would hopefully harm the reputation of the service provider.
We debated calling out the games and developers that benefitted from the fraud. The public shame would be great, but there’s always a risk that some of them were only tangentially related and didn’t know it was happening. We likely won’t do this, but it’s a tool you may be able to use in your own situation.
We’re also going to be moving from Weekly + Monthly contests to only Monthly (and increasing the Monthly prizes to compensate). This will allow us to keep an eye on things more easily and simplify the verification of prizes.
PITA: No, not bread for scooping delicious hummus, but Pain In The Ass. In other words, rather than putting up a taller wall, let’s just cover the current one with skunk essence to make it really unpleasant. By adding some annoyance, that ROI denominator gets just a little bit bigger.
For one, we identified a bunch of accounts that were clearly fraud-related, and permanently banned them. Can the fraudster make more? Sure, but it’s annoying to replace those 4000 or so accounts.
We're also evaluating adding CAPTCHA back to our registration form. We had it a few years ago, but CAPTCHA wasn't as sophisticated then, so it wasn't hard to bot and it was annoying for legitimate registrants. With reCAPTCHA now around we may add it back, again making the process of replacing those banned accounts even more cumbersome.
Obfuscation: A major part of doing this sort of engineered attack is information collection. For one, we realized that our ratings update in real time and are visible to 5 digits after the decimal. This means that you can very easily see exactly what the impact of your rating is. This allowed them to get around some of our protections that were already in place and also to carefully, almost surgically, bump their games into higher places.
Some obvious first steps are to stop showing that 5-digit precision publicly, and potentially to batch updates so the rating is recalculated periodically rather than in real time. But how else can we mess around with the information available?
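Here is a rough sketch of why those two steps help. The numbers (a 4.0 game with 10,000 ratings, an hourly refresh) and the `PublishedRating` class are hypothetical, not a description of our real system. Adding one vote to n existing votes moves the mean by (vote - mean) / (n + 1), which is tiny but still visible at 5 decimal places.

```python
def mean_after_vote(mean, count, stars):
    """Mean rating after one more vote is added to `count` existing votes."""
    return (mean * count + stars) / (count + 1)


def displayed(mean, digits):
    """Format a rating the way a page might show it, with `digits` decimals."""
    return f"{mean:.{digits}f}"


class PublishedRating:
    """Publishes a rounded snapshot, refreshed at most every `refresh_seconds`."""

    def __init__(self, refresh_seconds, digits=1):
        self.refresh_seconds = refresh_seconds
        self.digits = digits
        self._value = None
        self._published_at = None

    def get(self, live_mean, now):
        """`now` is a timestamp in seconds; `live_mean` is the true current mean."""
        stale = (
            self._published_at is None
            or now - self._published_at >= self.refresh_seconds
        )
        if stale:
            self._value = round(live_mean, self.digits)
            self._published_at = now
        return self._value


# One 5-star vote on a hypothetical 4.0 game with 10,000 ratings.
before = 4.0
after = mean_after_vote(before, 10_000, 5)
visible_at_5_digits = displayed(before, 5) != displayed(after, 5)
visible_at_1_digit = displayed(before, 1) != displayed(after, 1)
```

With 5 digits the attacker gets instant confirmation that a single vote counted. With 1 digit and a periodic refresh, they have to push a lot of votes and wait before learning anything.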
As we’re going back and correcting ratings, we’re actually intentionally not correcting some games at random (that we know have fraud but wouldn’t be significantly impacted either way), meaning we’re not revealing how much we are and are not aware of. Similarly, when we banned those accounts, we randomly didn’t ban some of the accounts, again making it harder to figure out what criteria we used to identify them.
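The random hold-back can be sketched in a few lines. The function name, the 10% hold-back fraction, and the account IDs are invented for illustration; I'm not sharing what fraction we actually used.

```python
import random


def choose_accounts_to_ban(flagged, holdout_fraction, rng):
    """Split flagged accounts into (banned, spared), sparing a random subset.

    `holdout_fraction` is a hypothetical knob: the share of flagged accounts
    left unbanned so the ban list does not reveal the detection criteria.
    """
    flagged = sorted(flagged)
    n_spared = round(len(flagged) * holdout_fraction)
    spared = set(rng.sample(flagged, n_spared))
    banned = [acct for acct in flagged if acct not in spared]
    return banned, sorted(spared)


# Made-up example: 100 flagged accounts, 10% spared at random.
flagged_accounts = [f"acct{i}" for i in range(100)]
banned, spared = choose_accounts_to_ban(
    flagged_accounts, holdout_fraction=0.1, rng=random.Random()
)
```

Because the spared accounts are picked at random rather than by any property of the account, comparing banned and surviving accounts tells the fraudster much less about what got them caught.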
Taking this super-meta, I’ve tweaked some details in this article at one point, just to keep them guessing if the fraudsters read it. What can I say, I’m committed to the cause.
So that was a long, winding tale, but hopefully one that was useful, or at least interesting. We’ll see how effective all this is over the next few months and perhaps will be able to do a follow-up article, good or bad. If you have some thoughts about all this I’d be happy to talk with you -- you can email me directly at firstname.lastname@example.org.