# Plotting the Curve of COVID-19

“If you look at the rates of new cases that are being diagnosed, we’re on an exponential curve,” NIH Director Francis Collins told Peter Wehner for an article in *The Atlantic.*

What does this mean, “we’re on an exponential curve”, and why does it matter?

First, it means that the function that best describes how the number of cases grows from day to day is an exponential function of the form:

*y = a∙b^x*

Where:

- *y* is the number of cases
- *a* and *b* are both just constants, like 7 or 0.2983, found by some fancy mathematical process so that the resulting plotted curve is as close to the actual data points as possible
- *x* is the number of days from when you started counting
- the caret character "^" indicates a superscript, because this platform does not support actual superscripts. For example, you would read "b^2" as "b squared".
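The "fancy mathematical process" is usually some form of least-squares fit. The sketch below shows one common approach, assumed here rather than taken from any particular tool: take the logarithm of the case counts, fit a straight line to that, and convert back. The function name `fit_exponential` and the input values are made up for illustration; the values are generated from a known curve, not real case counts.

```python
import math

def fit_exponential(xs, ys):
    """Least-squares fit of y = a * b**x via a straight-line fit to ln(y)."""
    n = len(xs)
    log_ys = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log_y = sum(log_ys) / n
    # Ordinary least squares for ln(y) = intercept + slope * x
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_log_y) for x, ly in zip(xs, log_ys))
    slope = sxy / sxx
    intercept = mean_log_y - slope * mean_x
    # Convert back: a = e**intercept, b = e**slope
    return math.exp(intercept), math.exp(slope)

# Made-up, noise-free values generated from a = 100 and b = 1.5
days = list(range(1, 8))
cases = [100 * 1.5 ** d for d in days]

a, b = fit_exponential(days, cases)
print(round(a, 2), round(b, 2))
```

Because the made-up data lies exactly on an exponential curve, the fit should recover the constants it was generated from. Real data is noisier, so the recovered constants are only a best approximation.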

For example, suppose the number of reported cases in the US from March 5 to March 11 looks like the first three columns in this table:

We can do the math, or get some software such as Excel to do it for us, to calculate:

*a* = 173.39 and *b* = 1.334172

This is called “fitting” a curve to the data, and if we plug these two constants into the formula above, we get the fourth column of the table. For example:

(173.39) × (1.334172)^1 = 231 and (173.39) × (1.334172)^2 = 309, etc.
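The same arithmetic can be sketched in a few lines of Python. The constants are the ones from the fit above; the function name `exponential_model` and the assumption that day 1 is the first day of the seven-day range are mine.

```python
a = 173.39
b = 1.334172

def exponential_model(x, a, b):
    """Evaluate y = a * b**x for day number x."""
    return a * b ** x

# Days 1 through 7, assuming day 1 is the first day counted
fitted = [round(exponential_model(day, a, b)) for day in range(1, 8)]
print(fitted[:2])  # → [231, 309]
```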

When we plot these, they look like this:

You may notice that the blue dots representing the real reported number of cases do not fall exactly on the dotted line drawn from our calculated exponential curve. That’s because real data is almost always noisy, which means it doesn’t follow any simple mathematical function exactly. This could be due to any number of factors, such as delays in testing or imperfect results.

What we can do, however, is calculate a number which indicates how closely our calculated line matches the original data points. This is called the “coefficient of determination” and is often written as *R^2* or *r^2*. When *R^2* = 1.0, the match is perfect, and when *R^2* = 0, there is no match at all. The Wikipedia article on the topic provides two different methods to calculate *R^2*, and Excel appears to use two other methods, so there is no real consensus. The trick is simply to use the same method when comparing different curves fit to the same data.

The *R^2* for the fit of an exponential function against the number of COVID-19 cases in the US between March 5 and March 11 is 0.994 or 0.998, depending on which Wikipedia method you use. Excel reports *R^2* = 0.996, by whatever way it calculates it.
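As a rough illustration, here is one common definition of *R^2*: one minus the ratio of the squared errors of the fit to the squared spread of the data around its average. This is only a sketch of that one definition; it is not necessarily what Excel does, and the numbers below are made up to show the two extremes.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

# Made-up observations, not real case counts
observed = [10, 20, 30, 40]

# A curve that passes through every point exactly
perfect = r_squared(observed, observed)

# A flat line at the average of the data explains none of the variation
mean_only = r_squared(observed, [25, 25, 25, 25])

print(perfect, mean_only)
```

The first call gives 1.0 and the second gives 0.0, matching the "perfect match" and "no match at all" cases described above.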

In any case, an exponential fit to the data has implications, and an important one is the rate at which the values increase. In an exponential function, the ratio between any two adjacent values is constant, and the ratio between any pair of values with the same spacing is constant. In our example, the ratio between any adjacent pair is 1.33, between every second pair is 1.78, and between every third pair is 2.37. Thus, the number of cases increases by 1.78 times every second day and by 2.37 times, more than double, every third day. That’s very scary for a disease, especially one as deadly as COVID-19.
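These ratios are just powers of *b*, which a quick calculation confirms. The doubling-time line is an extra step not in the text above: it solves *b^x* = 2 for *x* to estimate how many days the fitted curve takes to double.

```python
import math

b = 1.334172  # daily growth factor from the exponential fit

# Ratio between values 1, 2, and 3 days apart is b**1, b**2, b**3
ratios = [round(b ** k, 2) for k in (1, 2, 3)]
print(ratios)  # → [1.33, 1.78, 2.37]

# Days needed for the fitted curve to double: solve b**x = 2
doubling_time = math.log(2) / math.log(b)
print(round(doubling_time, 1))
```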

Not all data is well approximated by an exponential function, however. Consider, for example, the number of cases by day in Wisconsin from March 24 to March 29:

If we fit an exponential curve to these data points, the numbers don’t look unreasonable, but *R^2* = 0.987, and the graph looks like this:

You can clearly see that the line is above the first data point, below the middle data points, and is back above the last data point. It is curving upwards more than the actual underlying data.

In this case, we can look to fit other curves. For example, we can try a “linear fit”, that is to find a straight line of the form:

*y = m∙x + b*

Where:

- *y* is the number of cases (as before)
- *m* and *b* are both just constants, like 7 or 0.2983, found by some fancy mathematical process so that the resulting plotted line is as close to the actual data points as possible (as before)
- *x* is the number of days into the outbreak (as before)

We can do the math, or get some software such as Excel to do it for us, to calculate:

*m* = 131.97 and *b* = 320.27

If we plug these two constants into the formula above, we get the fourth column of the table. For example:

(131.97) × (1) + (320.27) = 452 and (131.97) × (2) + (320.27) = 584, etc.
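Here is the same calculation as a short Python sketch, using the two constants from the linear fit. The function name `linear_model` is mine, and I am assuming day 1 is March 24, the first day in the range.

```python
m = 131.97
b = 320.27

def linear_model(x, m, b):
    """Evaluate y = m * x + b for day number x."""
    return m * x + b

# Days 1 through 6, assuming day 1 is the first day counted
fitted = [round(linear_model(day, m, b)) for day in range(1, 7)]
print(fitted[:2])  # → [452, 584]

# Each day adds the same amount, m, to the previous day's value
daily_increase = linear_model(2, m, b) - linear_model(1, m, b)
print(round(daily_increase, 2))
```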

With a linear fit, *R^2* = 0.999 and the graph looks like this:

This looks much better, and the implication of a linear fit is that the number of cases increases by a fixed amount each day, 132 in this instance. That’s still scary for a disease, especially one as deadly as COVID-19, but not nearly as bad as exponential growth.

One more aspect of exponential functions and graphing is that they appear as straight lines when graphed on a logarithmic scale, which means the values on the graph's vertical axis increase by a constant factor rather than a constant amount. Instead of 100, 200, 300, etc. they are 100, 1000, 10000, etc.
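To see why, take the logarithm of the exponential curve: log(*a∙b^x*) = log(*a*) + *x*∙log(*b*), which is a straight line in *x*. The sketch below checks this numerically using the constants from the first exponential fit: on a log scale, each day moves the curve up by the same step, log10(*b*).

```python
import math

a = 173.39
b = 1.334172

# Position of each fitted value on a base-10 logarithmic axis
log_values = [math.log10(a * b ** day) for day in range(1, 8)]

# Vertical distance between consecutive days on that axis
steps = [later - earlier for earlier, later in zip(log_values, log_values[1:])]

# Every step should equal log10(b), so the points lie on a straight line
print([round(s, 4) for s in steps])
print(round(math.log10(b), 4))
```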

Below are the same data points, and the exponential curve fit to them, from the first graph, but now plotted against a logarithmic vertical axis:

So, how are we looking over the long haul?

The US looks bad, but we are finally increasing at a slower rate than the best fit exponential curve. Instead, a 5th order polynomial has a fit of *R^2* = 0.9999:

It is even clearer when plotting cases per 1,000,000 population:

Here also is the case data plotted against a logarithmic scale. You can see the exponential fit appears as a straight line.

The state of Wisconsin, on the other hand, is far below the best fit exponential curve, and a straight line is a much better fit for the last 7 days.

Finally, Milwaukee County is also doing much better than an exponential curve and is even increasing more slowly than the recent linear growth.

Perhaps all the physical distancing we are practicing is finally doing us all some good.