
Methodological notes

Coronavirus SARS-CoV-2 "The little Coronavirus"

The methodology is based on a direct comparison of the curves of daily infection cases. At first sight, the spread of the infection in different countries looks incomparable. Because the infection grows exponentially, no connection between the daily cases curves of different countries can be seen just by looking at them. The main problem is that exponential growth depends very strongly on the "starting position", i.e. how many people were infected at the start of the tracked period. If there was 1 infected person in one country and 30 infected people in a second country at the same time, the daily cases of the second country will be 30 times higher than those of the first country every day, even if both countries take completely identical measures to stop the spread of the epidemic. Therefore, if we want to compare how effectively countries manage the epidemic, we need to remove the effect of the starting position from the data.
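The effect of the starting position can be illustrated with a small Python sketch. It assumes idealized, purely exponential growth with the same daily growth factor in both countries. The function name and the growth factor of 1.2 are illustrative assumptions, not values taken from real data.

```python
def simulate_daily_cases(starting_cases, daily_growth_factor, days):
    """Daily cases under idealized exponential growth (an assumption)."""
    return [starting_cases * daily_growth_factor ** day for day in range(days)]


# Two hypothetical countries with identical spread but different starts.
country_a = simulate_daily_cases(1, 1.2, 10)
country_b = simulate_daily_cases(30, 1.2, 10)

# Ratio of the second country to the first one on each day.
ratios = [b / a for a, b in zip(country_a, country_b)]
```

The ratio stays at 30 on every day, although both simulated countries spread the infection at exactly the same rate.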

For this purpose, these comparison methods are used:

  1. Normalization of the daily cases
  2. The graph of the epidemic spread speed
  3. The daily ratio graph of two curves
  4. The graph of the sum of daily cases

Normalization of the daily cases

The time series of daily cases are scaled so that they match in absolute value. At the moment, the 95th percentile method is implemented: the series are scaled so that their 95th percentiles (the value found 95% of the way through the values when sorted by size) match. It would also be possible to use the maximum of the daily cases instead of the 95th percentile, but this would cause trouble for countries such as France, which on some days reported a large number of past cases, far more than actually occurred on that day. The 95th percentile ignores such local peaks and is therefore more robust against errors in case counting. Even this scaling may occasionally cause trouble, but it works well in most cases.
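A minimal Python sketch of this scaling follows. The nearest-rank percentile rule, the function names and the sample numbers are assumptions made for illustration. The rule actually used for the graphs may differ in detail.

```python
def percentile_95(values):
    """Nearest-rank 95th percentile (an assumed rule, see the text above)."""
    if not values:
        raise ValueError("at least one value is required")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integers
    return ordered[rank - 1]


def normalize_daily_cases(daily_cases):
    """Scale a series so that its 95th percentile equals 1."""
    reference = percentile_95(daily_cases)
    return [value / reference for value in daily_cases]


# Made-up series; the second one has a one-off batch of back-reported cases.
clean_series = [10, 12, 15, 18, 22, 26, 30, 35, 40, 46,
                52, 58, 64, 70, 75, 80, 84, 87, 89, 90]
spiky_series = list(clean_series)
spiky_series[5] = 900

normalized_clean = normalize_daily_cases(clean_series)
normalized_spiky = normalize_daily_cases(spiky_series)
```

In this sketch the spike of 900 does not set the scale: the two normalized series stay close to each other on all ordinary days, whereas scaling by the maximum would shrink the second series by roughly a factor of ten.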

The following graph shows an example of scaling the daily cases of the Czech Republic and the United Kingdom:

The graph of the epidemic spread speed

For comparing the spread, it is useful to define a parameter that we will call the spread speed of the epidemic. We define it as:

            spreadSpeed = ln(dailyCasesToday/dailyCasesYesterday)

This number is the key value describing how fast the epidemic is spreading in a country. The ratio of today's cases to yesterday's cases is related to the reproduction number of the virus. In fact, the ratio corresponds to the reproduction number of a theoretical epidemic in which every case is infected exactly one day after its primary contact was infected. In reality, of course, most infections happen later than the first day, so the ratio and the reproduction number do not match. They are still related, however, and the connection between them is never arbitrary. The following always holds: if the ratio is greater than 1 in the long term, the reproduction number is also greater than 1; if the ratio is equal to 1 in the long term, the reproduction number is also equal to 1; and if the ratio is smaller than 1 in the long term, the reproduction number is also smaller than 1. I expect (without a mathematical proof) that, at least approximately, the reproduction number is some (fractional) power of the ratio, where the exponent depends on the characteristics of the epidemic. I do not insist that one value is a power of the other, though; the relation between the two parameters may be more complicated.

In addition, the logarithm of the ratio of today's daily cases to yesterday's daily cases is taken. The logarithm does the following:

Because daily cases are subject to significant statistical fluctuations, it makes sense to average them over a couple of days. In the graphs, a cumulative moving average over 3 days is typically used.
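The definition of the spread speed and the averaging step can be sketched in Python as follows. The trailing form of the moving average, the handling of zero counts and all names are assumptions made for this illustration. The sample numbers are made up.

```python
import math


def moving_average(values, window=3):
    """Trailing mean over the last `window` values (shorter at the start)."""
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


def spread_speed(daily_cases):
    """ln(today / yesterday) for each pair of consecutive days.

    Returns None where the value is undefined (a zero or negative count).
    """
    speeds = []
    for yesterday, today in zip(daily_cases, daily_cases[1:]):
        if yesterday > 0 and today > 0:
            speeds.append(math.log(today / yesterday))
        else:
            speeds.append(None)
    return speeds


# Made-up daily cases that double every day.
daily_cases = [10, 20, 40, 80, 160]
raw_speed = spread_speed(daily_cases)
smoothed_speed = spread_speed(moving_average(daily_cases, window=3))
```

For a series that doubles every day, the raw spread speed is ln 2 on each day. Multiplying the whole series by a constant leaves the result unchanged.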

The spread speed is a parameter that does not depend on the "starting position". It is influenced only by factors related to how fast the epidemic is spreading in the country. Moreover, if we know the "starting position" and the spread speed, we can reconstruct the progress of the whole epidemic.
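The reconstruction mentioned above follows directly from the definition: since spreadSpeed = ln(today/yesterday), each day's value is the previous day's value multiplied by exp(spreadSpeed). A minimal Python sketch, with a hypothetical function name and made-up numbers:

```python
import math


def reconstruct_daily_cases(starting_cases, speeds):
    """Rebuild daily cases from the first day's value and the spread speeds."""
    cases = [starting_cases]
    for speed in speeds:
        cases.append(cases[-1] * math.exp(speed))
    return cases


# Made-up observed series and its spread speed (unsmoothed).
observed = [12, 15, 21, 20, 26]
speeds = [math.log(today / yesterday)
          for yesterday, today in zip(observed, observed[1:])]

rebuilt = reconstruct_daily_cases(observed[0], speeds)
```

Up to floating-point rounding, `rebuilt` equals `observed`. Note that this round trip is exact only for the unsmoothed spread speed; after averaging, the reconstruction reproduces the averaged series instead.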

The following graph shows different variants of the spread speed for the Czech Republic. The variants differ in the window length of the cumulative moving average. Three curves are drawn in the graph: no averaging (blue points), 3-day average (orange line), and 10-day average (green line).

The daily ratio graph of two curves

By the daily ratio graph of two curves we mean a graph in which the value for each day is the ratio of the two curves on that day. The purpose of such a graph is to check whether a systematic deviation is developing when the spread speed curves look very similar. In this graph there is always one reference country, which is not itself shown in the graph, and all other countries are compared to it. In general, it could be said:

A relative worsening or improvement needs to be interpreted carefully. There may be several reasons for it. For example, a mere time shift between the epidemic waves would produce such observations. The methodology of data collection may also cause such improvements or deteriorations. It may be said that the only predictive value lies in knowing whether the curve is constant. If the curve is rising or falling, a deeper analysis is necessary to find the root causes.

The ratio graph may be calculated not only from the daily cases curves but also from the total cases curves. Interestingly, for the same country both ratio graphs tend to show the same trends. Most probably, this is related to the exponential character of the epidemic spread. In any case, the daily cases ratio graph has the greater predictive value.

Because the predictive value of this graph depends only on how the curve rises or falls, while the absolute values are unimportant, we may rescale the curve so that its average is exactly 1. A logarithm may also be applied, which removes the non-linearities. The growth may then look more intuitive.
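The ratio graph with both optional steps (rescaling to a mean of 1 and taking the logarithm) can be sketched in Python as follows. All names and numbers are illustrative assumptions, and the sketch assumes that every value of both curves is positive.

```python
import math


def ratio_curve(compared, reference):
    """Day-by-day ratio of the compared country to the reference country."""
    return [c / r for c, r in zip(compared, reference)]


def normalize_to_mean_one(curve):
    """Rescale a curve so that its arithmetic mean is exactly 1."""
    mean = sum(curve) / len(curve)
    return [value / mean for value in curve]


def log_ratio_curve(compared, reference):
    """Ratio curve, rescaled to mean 1, then logarithmed."""
    normalized = normalize_to_mean_one(ratio_curve(compared, reference))
    return [math.log(value) for value in normalized]


# Made-up daily cases for a reference country and two compared countries.
reference = [100, 120, 150, 180, 200]
same_trend = [300, 360, 450, 540, 600]      # always 3x the reference
getting_worse = [100, 132, 180, 234, 280]   # grows faster than the reference

flat = log_ratio_curve(same_trend, reference)
rising = log_ratio_curve(getting_worse, reference)
```

Here `flat` is zero on every day, because the compared country differs from the reference only by a constant factor, while `rising` increases from day to day.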

Example: The ratio graphs of daily cases and total cases follow, with the Czech Republic as the reference country and Germany as the compared country. Both graphs are normalized (scaled to a mean value of 1) and the logarithm is applied.

The graph of the sum of daily cases

If we want to emphasize a local anomaly that occurs in multiple countries, it may make sense to sum the daily cases of all the countries and then treat the cluster of countries as if it were a single big country. The epidemic spread speed for the whole cluster may also be calculated from the summed daily cases curve.
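A minimal Python sketch of the summation and of the spread speed of the resulting cluster follows. The function name, the country labels and the numbers are made up for illustration. The sketch assumes that all series cover the same days.

```python
import math


def sum_daily_cases(series_by_country):
    """Sum the daily cases of several countries day by day."""
    series = list(series_by_country.values())
    if len({len(s) for s in series}) != 1:
        raise ValueError("need at least one series, all covering the same days")
    return [sum(day_values) for day_values in zip(*series)]


# Made-up daily cases of three hypothetical countries with a dip on day 3.
cluster = {
    "A": [10, 12, 9, 14],
    "B": [100, 110, 95, 130],
    "C": [5, 6, 4, 8],
}

summed = sum_daily_cases(cluster)

# Spread speed of the cluster, treated as a single big country.
cluster_speed = [math.log(today / yesterday)
                 for yesterday, today in zip(summed, summed[1:])]
```

Note that the sum is dominated by the countries with the highest case counts (country "B" in this sketch), so an anomaly in a small country contributes little to the summed curve.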

Example: Effect of Friday the 13th in the countries of western Europe (here the sum graph of the countries Belgium, Finland, Iceland, Italy, Germany, the Netherlands, Norway, Portugal, Austria, Spain, Sweden, Switzerland):
