Neuberg

JeremyChild · May 18

I've been thinking about Neuberg (as one does).

It is clear that no obviously fair solution exists when boards are played a different number of times, but we must use something.

In the literature, there is an assumption that "all boards should count equally". This sounds a good principle, but the actual effect is highly dependent on how you do the counting and what "equally" means. Leaving that aside, I looked at why we use Neuberg.

The Neuberg paper by Max Bavin (https://www.ebu.co.uk/documents/laws-and-ethics/articles/neuberg-formula.pdf) takes as read that a top on a board with 11 results is better than a 65% result on a board with 101 results.

That is not at all self-evident to me.

What I am interested in is doing a simulation to establish how likely each case is. To do that I would need historical information as to how likely a given pair is to beat another given pair on any particular board.

Does anyone know if such a dataset exists? Clearly it will exist in the various P2P uploads to the EBU, but whether it is available and extractable is a different proposition.

Or maybe someone has already done such a simulation?

VixTD · May 27

When you assign artificial scores to pairs on a board, no account is taken of the players' ability or form. If a pair is unable to play a board, they will get the same artificial score whether they are the club experts who would be expected to score ~70% had they played the board, or the novices whose expectation might be around 30%.

Whether this is desirable or not is another matter. I suppose it would be possible to estimate the likely outcome of a board based on NGS or some other measure of ability.

The same thing happens when the Neuberg adjustment is made to allow for missing comparisons. When I explain how it works on the Club TD course I usually start with what happens when a pair have to take an average. If there are ten tables the maximum score on a board is 18 MPs. I say that if a pair has to be given an average (an average of the matchpoints available, so 9 MPs), the others are scored without the averaging pair, so a top is now 16 and a bottom is still 0. An adjustment is then made to bring the winning pair's top closer to a normal top (18), while keeping the losing pair's score close to a normal bottom (0). The top score shouldn't quite reach a normal top, because the winning pair beat only eight other pairs to get that score, rather than nine. Likewise the bottom score shouldn't quite stay at zero, because the losing pair lost to eight other pairs rather than nine. (There remains some doubt as to the outcome had the board been played one more time, but the expectation is that if a pair beat eight opponents they would likely, but not certainly, have beaten the ninth as well.)

This always seems fair to the participants on the course, particularly to the retired maths teachers (there are usually one or two) who can understand how it all works. The winning pair on the fouled board now scores 17.89 MPs, and the losing pair 0.11 MPs.
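The ten-table figures above can be reproduced with the Neuberg formula as it is usually stated. This is a minimal sketch: the function and parameter names are my own, and it assumes the usual scale of 2 MPs for each pair beaten and 1 for each tie.

```python
from fractions import Fraction

def neuberg(matchpoints, actual, expected):
    """Scale a matchpoint score from `actual` results up to `expected` results.

    Assumption: 2 MPs per pair beaten, 1 per tie, so a top on a board
    with `actual` results is 2 * (actual - 1).
    """
    return (Fraction(matchpoints) * expected + (expected - actual)) / actual

# Ten tables, one pair takes an average, so only nine results are compared.
top = neuberg(16, actual=9, expected=10)    # raw top among nine results
bottom = neuberg(0, actual=9, expected=10)  # raw bottom among nine results

print(f"{float(top):.2f} {float(bottom):.2f}")  # 17.89 0.11
```

The exact values are 161/9 and 1/9, which round to the 17.89 and 0.11 quoted above.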

I think this is just a way of calculating the expected result had the board been played that other time, on the proviso that the players' ability and the likelihood of anyone actually beating the score obtained are not taken into account. (It makes no difference whether your top came from eking out an extra overtrick in a partscore or defeating the opponents by seven tricks in six diamonds doubled because they had a rare and expensive bidding misunderstanding.)

It's quite common in statistics to have to estimate the "expected" value of a variable X given certain constraints. This seems to me to be a similar exercise.

Max postulates an extreme example in his paper to make a point. If you imagine you run a simultaneous pairs tournament at several venues, one of which has seven tables and all the others five, you can see how board 26 could be played at one venue only. (The larger venue plays 13 x two-board rounds, boards 1-26; all the others play 5 x five-board rounds, boards 1-25.) If there are ten venues, there will be 7 + 9 x 5 = 52 tables, so a top on boards 1-25 will be 102 MPs, and a top on board 26 before adjustments would be 12.

When the Neuberg adjustment is made to board 26 the top score will be 95.57 MPs, about 6.3% short of a regular top. This reflects the uncertainty in the outcome had the board actually been played at the other 45 tables.
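As a rough check on those figures, here is a small sketch that applies the same formula to a raw top. The function name and the extra table counts (26 and 51) are my own, added only for comparison, and the 2-MPs-per-pair-beaten scale is assumed.

```python
def neuberg_top(actual, expected):
    """Neuberg-adjusted value of a raw top scored among `actual` results.

    Assumption: 2 MPs per pair beaten, so the raw top is 2 * (actual - 1).
    """
    raw_top = 2 * (actual - 1)
    return (raw_top * expected + (expected - actual)) / actual

expected = 52                  # tables in the whole event
full_top = 2 * (expected - 1)  # 102 MPs on boards 1-25

for actual in (7, 26, 51):     # 7 is board 26 in the example above
    adjusted = neuberg_top(actual, expected)
    shortfall = (full_top - adjusted) / full_top
    print(actual, round(adjusted, 2), f"{shortfall:.1%}")
# first line printed: 7 95.57 6.3%
```

Working through the algebra, the adjusted top falls short of a full top by (expected - actual) / actual matchpoints, here 45/7 or about 6.43 MPs, so the shortfall shrinks quickly as the board is played at more tables.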

I can't intuitively see that this shortfall is "correct", but it feels about as fair as anything. There are always naysayers who deride Neuberg as being wrong (respected Australian TD Ian McKinnon among them), but I don't feel strongly enough about it to read through the arguments. I know it's not what you were asking, but if you are so inclined you could start here (see attached).

ais523 · May 28

As I understand it, Neuberg gives slightly less than a top to a pair who scored a subfield top because the tables that could not be compared might, had they been comparable, have produced an even better score. (It can be compared to a weighted result after an irregularity – in this case, the possibilities being weighted are the possible results that could have been obtained on tables that should have been compared to the existing table but can't be.)

A good example is to look at the results in a simultaneous pairs – an absolute top in a sufficiently large tournament, on any board, is usually going to be an absurd result like +2300, regardless of what the normal results are, because in a sufficiently large field it is very likely that at least one pair will find some way to screw up spectacularly. If you have the results from only one subfield – say, a single club out of those playing in a simultaneous pairs – then an absolute top of the subfield is unlikely to be an absolute top from the tournament as a whole. As such, if you are trying to estimate the overall results from subfield results, which is basically what Neuberg and similar algorithms are doing, you should estimate the scores (especially the extreme scores) as being slightly closer to average in the large field than they were in the subfield.
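The point about subfield tops can be illustrated with a toy simulation. This is only a sketch under loud assumptions: every table's result is drawn independently from the same continuous distribution, so there are no ties and no differences in ability, which is not how real bridge scores behave. The function name and the field sizes (7 tables out of 52) are illustrative.

```python
import random

def subfield_top_survives(field=52, subfield=7, trials=20_000, seed=1):
    """Estimate how often the best result in a subfield is also the best overall.

    Assumption: results are independent draws from one continuous
    distribution (no ties, all pairs of equal strength).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        scores = [rng.random() for _ in range(field)]
        # The first `subfield` entries stand in for the single club.
        if max(scores[:subfield]) == max(scores):
            hits += 1
    return hits / trials

print(subfield_top_survives())  # should be close to 7/52, about 0.135
```

Under these assumptions the subfield top is also the overall top only about one time in seven or eight, which is why an estimate of the large-field score for a subfield top ought to sit a little below a full top.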

Trying to do this accurately would probably depend on the actual scores and knowledge of how surprising they are, which it seems difficult for a formula to make use of. (For example, if I see that in a subfield, most of the results are +170 and one of them is +420 – an absolute top in the subfield – I would suspect that the +420 would not be an isolated result if the board were given to a larger field to get more results. If the odd result out is +1400, there is much more of a possibility that it won't be duplicated (or at least, you will likely need a much larger field to duplicate it). So in theory you would want the +1400 to be given more matchpoints than the +420 when trying to estimate large-field results based on those in a small field, but none of the existing formulas take that tendency into account, nor can I see a reasonable way to make them do so.)

JeremyChild · May 28

Thanks, VixTD
