# Risk-Limiting Audits

Vote-counting is subject to errors in interpreting ballots, tabulation, and counting. To guard against such errors, we audit election results to ensure that we elect the right candidates—that we counted all (and only) the ballots that were properly cast, that we interpreted each ballot correctly, and that we counted them correctly.

For an audit to be meaningful, it must have a means of determining that the count was correct and of correcting the count if it turns out to be wrong. Somewhat surprisingly, virtually no elections in the United States are effectively audited in this sense.

Recently (since 2007), a mathematically rigorous audit mechanism has been gaining traction. Largely the work of Philip B. Stark (professor of statistics at UC Berkeley) and other academics, the mechanism goes by the name “risk-limiting audit”. The “risk” that’s being “limited” is the risk of ending up with the wrong outcome: a risk-limiting audit guarantees that, if the reported outcome is wrong, the chance that the audit fails to catch and correct it is less than some predetermined limit.

## Overview

The principle of a risk-limiting audit is simple: we randomly check the integrity of a sample of ballots as counted against their authoritative paper version and apply statistical measures that tell us, based on the details of a particular count (number of ballots, margin of election, errors found), just how confident we are that the result of the count is the same as that of a manual count of all the paper ballots. If the confidence level falls short of some predetermined threshold, we increase the sample size and continue until either the confidence threshold is met, or we’ve checked all the paper ballots, effectively performing a manual count.

The mathematics can be subtle, but the principle is straightforward, as are the actual calculations. Performing the calculations openly subjects them to the scrutiny of independent (and skeptical) statisticians, as well as the candidates and electorate.

## Is It a Risk-Limiting Audit?

We already have an audit procedure. Is it risk-limiting?

Probably not, but here’s an easy way to tell. If an audit is risk-limiting, there are exactly two possible outcomes of the audit. Either we will have a (statistically justified) predetermined confidence level that the nominal result is correct, or we will have conducted a full manual recount. If an audit procedure does not *always* lead to one of these outcomes, it’s not risk-limiting.

## Procedure

The steps in a risk-limiting audit:

1. Determine required confidence threshold.
2. Choose initial sample size.
3. Select sample ballots.
4. Compare sampled ballots to nominal results.
5. Calculate confidence level in result.
6. If confidence threshold is not met, increase sample size and continue at step 4.
7. If confidence threshold is met or all ballots have been checked, audit is complete.

The initial sample size (step 2) is a function of the size of the election, the margin of election, the confidence threshold and the number of ballots per sample batch. We want a small sample size, for efficiency, but big enough to have a reasonable chance of reaching the confidence threshold.

The increased sample size (step 6) additionally takes the error rate from previous samples into account. The procedure will terminate when the confidence threshold is met, or all ballots have been manually counted.
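
The loop described by these steps can be sketched in a few lines of Python. This is a minimal illustration, not a production audit tool: the names `run_audit`, `matches_paper`, and `confidence` are hypothetical, the fixed `step` used to grow the sample is an assumption (the text only says the increase depends on the error rate seen so far), and the statistical calculation is left to a caller-supplied function because it depends on the audit method chosen.

```python
import random

def run_audit(ballot_ids, matches_paper, confidence, threshold,
              initial_size, step, seed=12345):
    """Escalating audit loop; returns (outcome, number of ballots checked).

    matches_paper(ballot_id) -> bool   hypothetical manual check of one ballot
    confidence(checked, errors) -> float in [0, 1], supplied by the caller
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    rng = random.Random(seed)
    order = list(ballot_ids)
    rng.shuffle(order)                     # random order in which ballots are drawn
    checked = 0
    errors = 0
    target = min(initial_size, len(order))
    while True:
        # Compare the newly sampled ballots to the nominal results.
        for ballot in order[checked:target]:
            if not matches_paper(ballot):
                errors += 1
        checked = target
        if checked == len(order):
            return "full hand count", checked   # every ballot has been checked
        if confidence(checked, errors) >= threshold:
            return "confirmed", checked         # confidence threshold met
        target = min(checked + step, len(order))  # increase the sample size
```

Whatever the inputs, the function can only return one of two results: the threshold was met, or every ballot was checked. That is the property that makes the procedure risk-limiting, provided the `confidence` function itself is statistically sound.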

## The Basic Idea

The idea underlying risk-limiting audits is simple enough. Consider an election with two candidates John and Paul and 1000 voters, where the apparent result is that John wins by a vote of 525-475, a margin of 50 votes.

Suppose that the count is wrong: that enough ballots were miscounted that Paul is the actual winner. For this to be the case, at least 50 ballots must be wrongly recorded. Suppose that we randomly choose one of the 1000 ballots and check it: if there are 50 mistaken ballots in the pile, we have a 50-in-1000, or 1-in-20, chance of picking a wrongly counted ballot. If we keep randomly picking ballots, the chance that we *won’t* find a bad one is 950/1000 times 949/999 times 948/998 …; if we randomly pick 100 ballots without finding any errors, that chance is only about 1 in 220, so the odds against there being 50 bad ballots are roughly 220:1. If we’re satisfied with (say) 100:1 odds against a wrong result, we should be satisfied that the original result is correct: John really has won the election.

Of course, finding just one bad ballot in our sample isn’t very good evidence that there are another 49 or more like it; the actual calculation is more complex. But the upshot is that we end up with the probability of a wrong result. If that probability is very low, the audit is complete. Otherwise we keep sampling, until either the probability of a wrong result is low enough (below our confidence threshold) or until we’ve performed a full recount: the sample is 100% of the ballots.
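
The zero-error arithmetic in the John and Paul example is easy to check. The sketch below multiplies out the 950/1000 × 949/999 × … product; the function names are illustrative, and it covers only the simple case where no errors turn up in the sample, not the fuller calculation needed once discrepancies are found.

```python
def miss_probability(total, bad, draws):
    """Chance that `draws` ballots drawn without replacement all avoid the `bad` ones."""
    if draws > total - bad:
        return 0.0                      # the sample must include a bad ballot
    p = 1.0
    for i in range(draws):
        p *= (total - bad - i) / (total - i)
    return p

def smallest_clean_sample(total, bad, risk_limit):
    """Fewest error-free draws that push the miss probability to risk_limit or below."""
    if bad < 1 or risk_limit < 0:
        raise ValueError("need bad >= 1 and risk_limit >= 0")
    p, draws = 1.0, 0
    while p > risk_limit:
        p *= (total - bad - draws) / (total - draws)
        draws += 1
    return draws

p100 = miss_probability(1000, 50, 100)
print(f"100 clean ballots: chance of missing 50 bad ones is 1 in {1 / p100:.0f}")
print("clean ballots needed for 1 in 100:", smallest_clean_sample(1000, 50, 1 / 100))
```

The first line of output should land close to the odds quoted in the text; the second shows that somewhat fewer than 100 error-free ballots already suffice for 100:1.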

## Sample Size

If we sample individual ballots without finding errors, our confidence level eventually rises quite quickly with additional ballots; in our example above, as our check count goes from 80 to 100 ballots, our odds against a bad result rise from 72:1 to 223:1. With existing election-count practices, however, sampling individual ballots isn’t always practical. We may need to check bigger samples, perhaps a precinct at a time. In that case, the required sample size for a given confidence level increases substantially, but the principle remains the same. The table below shows the chance of finding at least one misinterpreted ballot when 500 ballots are checked in each of three ways.

| Misinterpreted ballots by precinct | Random precinct of 500 | 10 random batches of 50 | Simple random sample of 500 |
|---|---|---|---|
| 10 in every precinct | 100% | 100% | 99.996% |
| 10 in 98 precincts, 20 in 1 precinct | 99% | ≈100% | 99.996% |
| 20 in 50 precincts | 50% | 99.9% | 99.996% |
| 250 in 4 precincts | 4% | 33.6% | 99.996% |
| 500 in 2 precincts | 2% | 18.4% | 99.996% |
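
The figures in the table can be reproduced with a short calculation. The layout used here is inferred from the numbers rather than stated in the text, so treat it as an assumption: 100 precincts of 500 ballots each, every precinct split into 10 batches of 50, 1000 misinterpreted ballots in total, and the errors within an affected precinct spread across all of its batches.

```python
def miss_probability(total, bad, draws):
    """Chance that `draws` units drawn without replacement all avoid the `bad` ones."""
    if draws > total - bad:
        return 0.0
    p = 1.0
    for i in range(draws):
        p *= (total - bad - i) / (total - i)
    return p

# Assumed layout (inferred, not stated in the text).
PRECINCTS, BATCHES, BALLOTS, BAD_BALLOTS = 100, 1000, 50_000, 1000

# Row label -> number of precincts containing at least one misinterpreted ballot.
rows = {
    "10 in every precinct": 100,
    "10 in 98 precincts, 20 in 1 precinct": 99,
    "20 in 50 precincts": 50,
    "250 in 4 precincts": 4,
    "500 in 2 precincts": 2,
}

def detection_chances(bad_precincts):
    """Chance of seeing at least one error under each of the three sampling schemes."""
    # Assumption: every batch in an affected precinct holds at least one error.
    bad_batches = 10 * bad_precincts
    one_precinct = 1 - miss_probability(PRECINCTS, bad_precincts, 1)
    ten_batches = 1 - miss_probability(BATCHES, bad_batches, 10)
    ballots_500 = 1 - miss_probability(BALLOTS, BAD_BALLOTS, 500)
    return one_precinct, ten_batches, ballots_500

for label, bad_precincts in rows.items():
    print(label, ["{:.3%}".format(c) for c in detection_chances(bad_precincts)])
```

The last column never changes because a simple random sample of ballots does not care how the errors are clustered; the batch-based schemes do, which is why their detection chance collapses when the errors are concentrated in a few precincts.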

## STV and Risk-Limiting Audits

Auditing STV elections presents challenges not found in auditing conventional plurality-winner elections. First, it’s not practical (or meaningful) to count a subset of an STV election (recounting a sample precinct, for example). Second, STV requires a broader definition of “margin of election”, because the STV counting process makes a series of intermediate decisions with their own margins (such as defeating low-vote candidates) that potentially affect the outcome of the count.

This means that when we sample an STV election, we want to sample and check individual ballots, checking them against a ballot file that is used as the input to the actual count.

Sampling individual ballots (as opposed to larger batches) is desirable for another reason when auditing an STV (including IRV) election. Not only is the sample size smaller, but it’s practical to check the entire ballot (all rankings) at once. The manual ballot check procedure is separated from the count itself.

Happily, as we saw in the table above, single-ballot sampling is also desirable when performing risk-limiting audits of simple majority elections, because the number of ballots that must be manually checked to arrive at a given confidence level is much smaller than the number required when sampling by batch.

STV (including IRV) introduces another factor into audits: the margin of election in an STV election is not simply the winning margin at the last stage of the count, but includes all the intermediate decisions that lead to the final result, such as the decision to defeat low-vote-count candidates in cases where defeating a different candidate could lead to a different election outcome. It’s worth noting that, for auditing purposes, nothing is lost by using a lower bound of the margin of election instead of an exact value, except that the sample size may be somewhat larger.
