I am a huge fan of HEMA Ratings; I think it’s an extremely valuable tool for the HEMA community, so much so that I have supported their Patreon since it first became available. However, it is important to remember that HEMA Ratings is a specific tool, and not all tools are useful in all scenarios. There are some use cases at which HEMA Ratings absolutely excels, namely creating initial tournament seedings and creating skill tiers in tournaments.
There are also some uses it is not good for, even though the layout and idea of the ratings tempt our brains to treat it otherwise. A few examples are fencing knowledge, value and self-worth as a person, and the one I’m going to be discussing in this article: world ranking.
This is the second part of my series related to HEMA Ratings:
Part 2: World Ranking
Part 3: Recovery
What is HEMA Ratings?
HEMA Ratings is a database of fencers who have competed in tournaments, each assigned a weighted rating calculated using an algorithm called Glicko-2. The rating is calculated from each fencer’s wins and losses, with wins increasing your rating and losses decreasing it. Relative ratings are taken into account: the higher your opponent’s rating, the more you gain from a win and the less you lose from a loss. Glicko-2 is similar to the Elo rating used in chess. It is a misconception that “Elo” is a general term for all relative rating systems; in reality Elo is a specific algorithm named after a physics professor, Arpad Elo. As such, “Elo” is not an acronym and should not be written in all caps.
The main difference between Elo and Glicko-2 is the presence of the “rating deviation” or “confidence” value. I go into more depth on rating deviation in part 1 of this series, which you can find here, but in brief it does two things: 1) it gives you a higher weighted rating the more confident the algorithm is that your rating is accurate (weighted rating = base rating – (2 x rating deviation)), and 2) it affects how much one rating will change another after a match (a rating with high confidence will affect others more and move less, while a rating with low confidence will affect others less and move more). The presence of rating deviation makes Glicko-2 better for small sample sizes than Elo, though as with any algorithm, the more data the better.
On a weapon page of the HEMA Ratings website, weighted rating is visible, and rating deviation can be seen by mousing over the colored thermometer on the right. Base rating is a hidden value, but can be calculated by adding twice the rating deviation to the weighted rating. On the left, you see the “ranking” value, which we’ll get to.
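The relationship between these values can be sketched in a couple of lines. This is just the arithmetic described above, not HEMA Ratings’ actual code:

```python
def weighted_rating(base_rating: float, rating_deviation: float) -> float:
    """Weighted rating as displayed on HEMA Ratings: base minus twice the RD."""
    return base_rating - 2.0 * rating_deviation

def recover_base_rating(weighted: float, rating_deviation: float) -> float:
    """Invert the formula to recover the hidden base rating."""
    return weighted + 2.0 * rating_deviation
```

For example, a fencer with a (hidden) base rating of 1650 and a rating deviation of 60 would display a weighted rating of 1530; as confidence grows and the RD shrinks, the displayed number converges on the base rating.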
So all that being said, what is the weighted rating a measure of? Basically it’s an analysis of your past match success that can be used to predict outcomes of future matches. The wider the rating gap, the more likely the higher rated fencer will win. As of a 2023 update, that information for past matches is shown on a fencer’s ratings page:
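As a rough illustration of how a rating gap maps to a win probability, here is the win expectancy formula from Glickman’s original Glicko paper (Glicko-2 uses the same idea on a rescaled internal scale; this is a sketch, not HEMA Ratings’ exact implementation):

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant

def g(rd: float) -> float:
    # Dampens the influence of an opponent's rating when their RD is high
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)

def win_expectancy(r: float, r_opp: float, rd_opp: float) -> float:
    # Predicted probability that the first fencer wins
    return 1.0 / (1.0 + 10 ** (-g(rd_opp) * (r - r_opp) / 400.0))
```

With equal ratings this gives exactly 0.5, and a 200-point gap against a fairly confident opponent (RD around 50) gives the higher rated fencer roughly a three-in-four chance of winning.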
The ratings tend to be accurate at this; see Sean’s SwordSTEM article on HEMA Ratings for more information. Because of this, as I mentioned earlier, HEMA Ratings is a fantastic tool for seeding pools and creating tiers in tournaments. Every tournament organizer should be using it for this by now.
Why HEMA Ratings is bad for ranking
The first thing you see when you click on a weapon category on HEMA Ratings is the top rated players, ranked in order. The “Rank” value is given a prominent spot on the left side of the screen, the first piece of data you look at. It does make sense to put the data in order like this, but the arrangement may also give the impression that the top ranking is the most important aspect of the ratings, when it isn’t: the weighted rating is what really matters.
The simple answer to why HEMA Ratings does not do a good job of producing an accurate top 10 list is that top players never play each other. That’s all you really need to know, so if you don’t care about the details you can just read that one sentence and be done with this article. To demonstrate this, I looked at the top 10 rated fencers from December 2023 and December 2019, and counted how many matches each fenced that year and how many of those were against another top 10 player.
[Table: December 2023 top 10. Columns: Total matches in year | Matches with other top 10]
[Table: December 2019 top 10. Columns: Total matches in year | Matches with other top 10]
In 2023, no top ten rated fencer fenced any other top ten rated fencer. In 2019, it happened four times: twice at the European Games Invitational Gala Tournament in Minsk, once at Helsinki Longsword Open, and once at a Dutch federation event. The European Games Tournament (henceforth referred to as “Minsk”) was the only concerted effort to get as many top players together as possible that had any amount of buy-in from the community at large. At Minsk, six of the top 10 rated fencers of December 2019 attended, and among them they matched with each other twice (Olbrychski/Fabian and Olbrychski/Kultaev).
Longsword fencing is a regional sport, and it may be becoming more regional (see SwordSTEM: Trends in the Quality of Competition in HEMA). This is not necessarily a bad thing: it means there are more tournaments, so people have the opportunity to compete closer to where they live. More tournaments is a good thing, and more people will be competing who otherwise would not have, because they lack the means or desire to travel for their hobby. Ultimately this is a hobby for almost everyone who does it, and there is very little incentive to travel if you don’t have to. Because of this, top rated fencers tend to get there by winning a lot of matches in their region, even though they may never have matched against anyone else at the top. If someone has very few or no losses, the rating algorithm does not know where to place them, because there is no known ceiling. The rating deviation value helps somewhat, but not enough. I’m not saying that anyone is being insidious or purposefully gaming the ratings; it’s just how they play, since traveling is difficult and this isn’t a professional sport. As a result, the top spots are populated by fencers whom we know perform well in their own region, but whom we cannot compare to each other with any confidence.
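The “no known ceiling” effect can be shown with a toy simulation. For simplicity this uses a plain Elo update rather than actual Glicko-2, and the K-factor and opponent pool are assumptions for illustration: a hypothetical big fish who only ever beats fresh 1800-rated regional opponents keeps climbing indefinitely, despite never defeating anyone rated above 1800.

```python
K = 32  # a typical Elo K-factor; an assumption, not HEMA Ratings' value

def expected_score(r_a: float, r_b: float) -> float:
    # Standard Elo win expectancy for player A against player B
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def simulate_big_fish(wins: int, opponent_rating: float = 1800.0) -> float:
    # Start at the usual 1500 baseline and win every match against a fresh
    # opponent at `opponent_rating`; each gain shrinks but never reaches zero.
    rating = 1500.0
    for _ in range(wins):
        rating += K * (1.0 - expected_score(rating, opponent_rating))
    return rating
```

After 100 such wins the simulated rating sails well past 2000 and keeps rising, because there are no losses to tell the algorithm where the ceiling is. Glicko-2’s rating deviation dampens this, but as argued above, not enough.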
I’m sure you can easily deduce why it’s important for rankings that top players play against each other. To further push the point, I am going to reference a case in competitive Super Smash Bros. Melee (if you have read my other articles this should not come as a surprise). The information here is referenced from this video, which is a fun watch if you have time. The short version is that there is a Melee player named Tyler Hankins (known by his gamer tag Hanky Panky) who only attends tournaments that he knows he can win, in order to make money. He is a very good player (you have to be in order to be able to farm any tournament), but he avoids matches against top players because he knows there won’t be a payout.
Melee has an official annual ranking, which is done by hand, taking into account things like tournament placement and head to head records. In 2017, Hanky Panky did not qualify for the official rankings because he did not meet certain requirements for competing out of state. However, someone made a Glicko-2 rating list for 2017, and Hanky Panky ended up ranked 5th.
It’s interesting to note that Glicko-2 did not do that bad a job here; there are a lot of names in common between those two lists. But there are other factors at play: the top players still win a lot, and they also play against each other, because there is incentive to do so. So it makes sense that the top players would still be at the top of a Glicko-2 ranking; it’s just Hanky Panky who is the anomaly. In the case of longsword, top players never play each other, so everyone is Hanky Panky.
At this current point in time, I don’t think there’s any real way to create an accurate world ranking of top longsword players. In order to do that, you need to give top players an incentive to play against each other, and to pull that off you need some kind of infrastructure.
Before I get into other systems, I should address whether there’s anything I think can be done to HEMA Ratings itself to make it better for this purpose. One simple improvement would be to hide all inactive players. This would provide marginal benefit, though if you wanted to maintain a high rating you could simply go to a small tournament and renew your rating every two years. There may be changes you could make to the algorithm to make it better for world rankings, but I don’t think it’s a good idea to mess with the algorithm, lest we compromise what the ratings are actually good for, i.e. seeding tournaments. I think the best thing we can do with HEMA Ratings is work to change the public perception of it: it shouldn’t be seen as a top world ranking system, and getting to the top of HEMA Ratings shouldn’t be a fencer’s goal.
Since changing HEMA Ratings is out, I will speculate on other possible methods of world ranking. To start, I will use Melee again as an example. Their accepted ranking system works by having panels of players hand pick who goes in what order, based on tournament results and head to head records from that year. I do not think this method is feasible in longsword: you would need buy-in from too many people, you would have to choose a panel, and the problem of top players not playing each other would still remain. However, there may be one thing we can apply directly from the Hanky Panky situation: he was excluded from the official world ranking because he did not meet requirements for competing outside of his region. Perhaps, much like hiding inactive players, HEMA Ratings could indicate whether or not a fencer has competed outside of their region.
Ultimately you can’t get around the problem of top players needing to play against each other. Without that, any top ranking list is going to be arbitrary. One thing that could create this incentive is the model used by the International Fencing Federation (FIE) or the International Judo Federation (IJF), and indeed by most sports with Olympic qualification requirements: different levels of tournaments award different amounts of points towards your world ranking. A system like this requires you to keep competing in order to maintain your ranking.
The problem with this is that it requires infrastructure, resources, and buy-in. You need someone to set up the system, which would be an international effort, and you need people to agree that it is the one legitimate world ranking. Of course anyone can say that their event is a world championship or a national championship or whatever, but without buy-in from a plurality of the competitive community, it won’t mean anything. The European Games in Minsk was one example of a tournament that did successfully get people to play along and regions to send their best players, which resulted in the highest rated tournament ever run, so the idea is not impossible. Though I was not at Minsk, I have heard secondhand that the event left a sour taste in a lot of people’s mouths about this type of effort, so maybe not in the near future.
There’s also the issue of logistics: top players actually need to be able to get to these events. Longsword is a side hobby with no sponsorships or financial support for athletes, so anything like this would only be available to those with the means to travel internationally multiple times per year. That would exclude many people and bias the results towards those who can afford to attend. In light of all of these issues, I don’t think it’s possible to have a legitimate, meaningful world ranking for longsword at this current point in time. But I believe that as the sport grows, ideas like these will become more feasible and worth looking into.
Between writing this article and publishing it, I had the privilege of competing in Helsinki Longsword Open 2024. The tournament rating for this one was 1993 (using Sean Franklin’s PB25 algorithm), which makes it the third highest rated tournament of all time, behind only Swordfish 2018 and Minsk. With any luck, this is a sign of things to come, and a signal of a resurgence of tournaments in which high level competitors fence each other. Even so, it’s not enough: only two of the HEMA Ratings top 10 were in attendance (Swordfish 2018 had 3, Minsk had 6), and we did not end up matching with each other. In order to get a good picture, you would probably need something like 4-5 Minsk level tournaments per year. I think we are a long way off from that. I will say this: after seeing the top rated fencer in the world, Antoni Olbrychski, in person, I personally believe that he is the best.