The methodology is based on a direct comparison between the curves of daily infection cases. At first sight, the spread of the infection in
different countries looks incomparable. Because the infection grows exponentially, no connection between the daily cases curves of different
countries is visible when they are simply plotted side by side. The main problem is that exponential growth depends very strongly on the
"starting position", i.e. how many people were infected at the start of the tracked period. For example, if there was 1 infected person in one
country and 30 infected people in a second country at the same time, the daily cases of the second country will be 30 times higher than those of
the first country every day, even though both countries took completely identical measures to stop the spread of the epidemic. Therefore, if we
want to compare how effectively countries manage the epidemic, we need to remove the effect of the starting position from the data.
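The effect of the starting position can be illustrated with a minimal sketch. The function name `simulate_daily_cases`, the growth factor 1.2 and the 30-day period are assumptions chosen only for illustration; they are not taken from any real data.

```python
def simulate_daily_cases(initial_cases, daily_growth_factor, days):
    """Daily cases under pure exponential growth from a given starting position."""
    return [initial_cases * daily_growth_factor ** day for day in range(days)]


# Hypothetical numbers: identical growth, different starting positions.
country_a = simulate_daily_cases(initial_cases=1, daily_growth_factor=1.2, days=30)
country_b = simulate_daily_cases(initial_cases=30, daily_growth_factor=1.2, days=30)

# The ratio between the two curves stays at (approximately) 30 on every day,
# although both countries spread the infection at exactly the same pace.
ratios = [b / a for a, b in zip(country_a, country_b)]
print(min(ratios), max(ratios))
```

Plotted on a linear scale, the two curves would look very different, although the only difference between them is the starting position.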
For this purpose, these comparison methods are used:
- Normalization of the daily cases
- The graph of the epidemic spread speed
- The daily ratio graph of two curves
- The graph of the sum of daily cases
Normalization of the daily cases
The time series of daily cases are scaled so that they match in absolute value. At the moment, the 95th percentile method is implemented, i.e.
the curves are scaled so that their 95th percentiles (the value found at the 95% position when the values are sorted by size) match. It would
also be possible to use the maximum value of daily cases instead of the 95th percentile, but this would cause trouble with countries like France,
which on some days reported a large number of past cases, far more than the cases that really occurred on that day. The 95th percentile ignores
such local peaks and is therefore more robust against errors in case counting. Even this scaling may cause trouble sometimes, but it works well
in most cases.
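The scaling described above could be sketched as follows. This is only an illustration: the nearest-rank definition of the percentile, the function names and the sample numbers are assumptions, and the actual implementation may compute the percentile differently (for example with interpolation).

```python
def percentile_95(values):
    """Return the value at the 95% position of the sorted values (nearest rank)."""
    if not values:
        raise ValueError("at least one value is required")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def normalize_daily_cases(daily_cases, target=1.0):
    """Scale a series so that its 95th percentile equals `target`.

    Assumes the 95th percentile is greater than zero.
    """
    scale = target / percentile_95(daily_cases)
    return [value * scale for value in daily_cases]


# Hypothetical data: 19 ordinary days plus one day with a batch of late reports.
ordinary_days = list(range(100, 119))
country_a = ordinary_days + [5000]
country_b = [3 * value for value in ordinary_days] + [400]

scaled_a = normalize_daily_cases(country_a)
scaled_b = normalize_daily_cases(country_b)

# The percentile ignores the one-day spike, the maximum does not.
print(percentile_95(country_a), max(country_a))
```

In this made-up example the ordinary days of both countries end up on the same scale, while scaling by the maximum would have been dominated by the single spike in `country_a`.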
The following graph shows an example of scaling the daily cases of the Czech Republic and the United Kingdom:
The graph of the epidemic spread speed
To compare the spread, it is useful to define a parameter which we will call the spread speed of the epidemic.
We define it as:
spreadSpeed = ln(dailyCasesToday/dailyCasesYesterday)
This number is the key value that tells how fast the epidemic is spreading in the country. The ratio of cases today to cases yesterday is related
to the reproduction number of the virus. In fact, the ratio corresponds to the reproduction number of a theoretical epidemic in which every case
is infected exactly one day after its primary contact was infected. In reality, of course, most infections happen later than the first day, so
the ratio and the reproduction number do not match. But they are still related, and the connection between the two parameters is never arbitrary.
The following always holds: if the ratio is (in the long term) greater than 1, the reproduction number is also greater than 1. If the ratio is
(in the long term) equal to 1, the reproduction number is also equal to 1. And if the ratio is (in the long term) smaller than 1, the
reproduction number is also smaller than 1. I expect (but without a mathematical proof) that, at least approximately, the reproduction number is
some (fractional) power of the ratio, where the value of the exponent depends on the characteristics of the epidemic. But I do not insist that
one value is a power of the other; the relation between the two parameters may be more complicated.
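The relation between the ratio and the reproduction number can be explored in a toy model. The sketch below assumes that every case infects a fixed number of people exactly `delay_days` days after its own infection. This fixed delay is a strong simplification introduced here only for illustration, and the function names are hypothetical. In this special case the reproduction number equals the long-term daily ratio raised to the power `delay_days`; it says nothing about how the two values relate in a real epidemic.

```python
import math


def simulate_fixed_delay(reproduction_number, delay_days, days):
    """Toy epidemic: each case infects `reproduction_number` people exactly
    `delay_days` days after its own infection (an idealized assumption)."""
    cases = [1.0] * delay_days  # one seed case per day for the first generation
    for day in range(delay_days, days):
        cases.append(reproduction_number * cases[day - delay_days])
    return cases


def long_term_daily_ratio(cases):
    """Geometric mean of the day-to-day ratios over the whole series."""
    total_log_ratio = math.log(cases[-1] / cases[0])
    return math.exp(total_log_ratio / (len(cases) - 1))


# With a one-day delay the daily ratio equals the reproduction number.
one_day = long_term_daily_ratio(simulate_fixed_delay(1.5, delay_days=1, days=21))

# With a four-day delay the ratio is smaller, but ratio ** 4 recovers it.
# (days - 1 is a multiple of delay_days, so the relation is exact here.)
four_days = long_term_daily_ratio(simulate_fixed_delay(1.5, delay_days=4, days=41))
print(one_day, four_days, four_days ** 4)
```

The thresholds behave as stated in the text: in this model a reproduction number above, equal to or below 1 gives a long-term ratio above, equal to or below 1.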
In addition, the logarithm of the ratio of today's and yesterday's daily cases is taken. The logarithm does the following:
- It "linearizes" the values, i.e. it moves from non-intuitive exponential numbers to numbers which are much more readable. Mathematically, it
  then makes sense to perform linear operations on the values, such as averaging, and to ask about weekly averages etc. Without the logarithm,
  such operations would make no sense.
- It moves the threshold value from 1 for the ratio to 0 for the spread speed. If the spread speed is greater than 0, the daily cases grow; if
  the spread speed is exactly 0, the daily cases stay constant; if the spread speed is lower than 0, the daily cases fall.
- There is a symmetry between positive and negative numbers. If the spread speed rises to some value on one day and falls to the equal negative
  value on the next day, the two changes cancel out. The sum of the daily spread speeds over some time period corresponds to the spread speed
  for the whole period.
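The properties listed above can be checked with a short sketch. The function name `spread_speed` and the sample numbers are made up for illustration, and the code assumes that all daily case counts are strictly positive (a day with zero cases would make the ratio undefined).

```python
import math


def spread_speed(daily_cases):
    """ln(cases today / cases yesterday) for each pair of consecutive days."""
    return [math.log(today / yesterday)
            for yesterday, today in zip(daily_cases, daily_cases[1:])]


daily_cases = [100, 120, 90, 135, 150]  # hypothetical values
speeds = spread_speed(daily_cases)

# The sum of the daily spread speeds equals the spread speed of the whole period.
period_speed = math.log(daily_cases[-1] / daily_cases[0])
print(sum(speeds), period_speed)

# Symmetry: doubling on one day and halving on the next day cancel out.
print(sum(spread_speed([100, 200, 100])))
```

Without the logarithm the plain ratios 2 and 0.5 would average to 1.25, which would wrongly suggest growth; the logarithms average to zero.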
Because daily cases are subject to significant statistical fluctuations, it makes sense to average the daily cases over a couple of days. In the
graphs, a 3-day moving average is typically used.
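A minimal sketch of this smoothing step follows. It assumes a trailing window (each value is the mean of the current day and the preceding days); the text does not say whether the window is trailing or centered, so this choice, the function names and the sample data are assumptions.

```python
import math


def moving_average(values, window=3):
    """Trailing moving average; the first `window - 1` days produce no value."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [sum(values[end - window:end]) / window
            for end in range(window, len(values) + 1)]


def smoothed_spread_speed(daily_cases, window=3):
    """Spread speed computed from the averaged daily cases."""
    smoothed = moving_average(daily_cases, window)
    return [math.log(today / yesterday)
            for yesterday, today in zip(smoothed, smoothed[1:])]


noisy_cases = [100, 140, 90, 150, 110, 170, 120]  # hypothetical values
raw = smoothed_spread_speed(noisy_cases, window=1)       # no averaging
smooth = smoothed_spread_speed(noisy_cases, window=3)    # 3-day average
print(max(abs(s) for s in raw), max(abs(s) for s in smooth))
```

With `window=1` the function returns the unsmoothed spread speed, so the same code can produce all variants with different window lengths.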
The spread speed is a parameter which does not depend on the "starting position". It is influenced only by factors that determine how fast the
epidemic is spreading in the country. Moreover, if we know the "starting position" and the spread speed, we are able to reconstruct the progress
of the whole epidemic.
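The reconstruction mentioned above follows directly from the definition: each day's cases are the previous day's cases multiplied by the exponential of the spread speed. A minimal sketch, with a hypothetical function name and sample data:

```python
import math


def reconstruct_daily_cases(starting_cases, spread_speeds):
    """Rebuild the daily cases from the starting position and the spread speeds."""
    cases = [starting_cases]
    for speed in spread_speeds:
        cases.append(cases[-1] * math.exp(speed))
    return cases


original = [100, 120, 90, 135, 150]  # hypothetical values
speeds = [math.log(today / yesterday)
          for yesterday, today in zip(original, original[1:])]

reconstructed = reconstruct_daily_cases(original[0], speeds)
print(reconstructed)
```

Using the same `speeds` with a different `starting_cases` gives a curve of the same shape at a different scale, which is exactly the effect that the spread speed removes.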
The following graph shows different variants of the spread speed for the Czech Republic. The variants differ in the window length of the moving
average. The graph contains 3 curves: no averaging (blue points), 3-day average (orange line) and 10-day average (green line).
The daily ratio graph of two curves
By the daily ratio graph of two curves we mean a graph in which the value for each day is the ratio of the two curves on that day. The purpose of
such a graph is to check that no systematic deviation occurs when the spread speed curves look very similar. In this graph there is always one
reference country, which is not shown in the graph itself, and all other countries are compared to this reference country. In general, it can be
said:
- If the values are constant for some period, the situation in the compared country and in the reference country is developing identically. The
  epidemic is spreading at the same speed in both countries.
- If the values are growing, the situation in the compared country is getting worse relative to the situation in the reference country.
- If the values are falling, the situation in the compared country is getting better relative to the situation in the reference country.
Getting relatively worse or better needs to be interpreted carefully. There may be several reasons for it. For example, a mere time shift between
the epidemic waves of the two countries would cause such observations. Even the methodology of data collection may cause such apparent
improvements or deteriorations. It can be said that the graph has predictive value only when the curve of values is constant. If the curve is
growing or falling, a deeper analysis is necessary to find the root causes of the situation.
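The three cases from the list above can be reproduced with synthetic data. In this sketch the function name `daily_ratio` and the growth factors are assumptions for illustration, and both series are assumed to cover the same dates.

```python
def daily_ratio(compared, reference):
    """Ratio compared / reference for each day; both series cover the same dates."""
    if len(compared) != len(reference):
        raise ValueError("both series must have the same length")
    return [c / r for c, r in zip(compared, reference)]


# Hypothetical curves: the reference grows by 10% per day.
reference = [100 * 1.10 ** day for day in range(10)]
same_speed = [3 * value for value in reference]        # same speed, 3x the cases
faster = [100 * 1.15 ** day for day in range(10)]      # spreads faster

constant_graph = daily_ratio(same_speed, reference)    # stays at about 3
growing_graph = daily_ratio(faster, reference)         # grows every day
print(constant_graph[0], constant_graph[-1])
print(growing_graph[0], growing_graph[-1])
```

Note that the constant graph stays flat although the compared country has three times as many cases, because only the trend of the ratio matters.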
The ratio graph may be calculated not only from the daily cases curves but also from the total cases curves. It is an interesting observation
that, for the same country, both ratio graphs tend to show the same trends. Most probably, this is related to the exponential character of the
epidemic spread. Nevertheless, the daily cases ratio graph has greater predictive value.
Because the predictive value of this graph depends only on how the curve grows or falls, while the absolute values are unimportant, we may
rescale the curve so that its average is exactly 1. A logarithm may also be applied, which removes the non-linearities. The growth may then look
more intuitive.
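The rescaling and the logarithm could be applied as in the following sketch, which also derives the total cases from the daily cases. The function name `normalized_log_ratio` and the sample numbers are hypothetical; the order of operations (scale to a mean of 1 first, then take the logarithm) is an assumption based on the description above.

```python
import math
from itertools import accumulate


def normalized_log_ratio(compared, reference):
    """Ratio graph scaled to a mean of 1, followed by the natural logarithm."""
    ratios = [c / r for c, r in zip(compared, reference)]
    mean_ratio = sum(ratios) / len(ratios)
    return [math.log(ratio / mean_ratio) for ratio in ratios]


# Hypothetical daily cases of a reference country and a compared country.
daily_reference = [50, 60, 72, 86, 104, 124]
daily_compared = [20, 26, 34, 44, 57, 74]

# Total cases are the running sums of the daily cases.
total_reference = list(accumulate(daily_reference))
total_compared = list(accumulate(daily_compared))

daily_graph = normalized_log_ratio(daily_compared, daily_reference)
total_graph = normalized_log_ratio(total_compared, total_reference)
print(daily_graph)
print(total_graph)
```

In this made-up example both graphs rise over the period, the daily cases graph more steeply than the total cases graph.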
Example: Ratio graphs of daily cases and total cases follow, with the Czech Republic as the reference country and Germany as the compared country.
Both graphs are normalized (scaled to a mean value of 1) and the logarithm is applied.
The graph of the sum of daily cases
If we want to emphasize some local anomaly which occurs in multiple countries, it may make sense to sum up the daily cases of all countries and
then treat the cluster of countries as if it were a single big country. The epidemic spread speed for the whole country cluster may also be
calculated from the summed daily cases curve.
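Summing the countries into one cluster can be sketched as follows. The country labels and numbers are placeholders, the function names are hypothetical, and the series are assumed to be aligned on the same dates.

```python
import math


def sum_daily_cases(cases_by_country):
    """Add up the daily cases of all countries, day by day."""
    series = list(cases_by_country.values())
    if len({len(s) for s in series}) > 1:
        raise ValueError("all series must cover the same days")
    return [sum(day_values) for day_values in zip(*series)]


def spread_speed(daily_cases):
    """ln(cases today / cases yesterday) for each pair of consecutive days."""
    return [math.log(today / yesterday)
            for yesterday, today in zip(daily_cases, daily_cases[1:])]


# Placeholder data: the third day shows a jump in every country.
cases_by_country = {
    "A": [100, 110, 150, 125],
    "B": [40, 44, 60, 50],
    "C": [10, 11, 15, 12],
}

cluster = sum_daily_cases(cases_by_country)
cluster_speed = spread_speed(cluster)
print(cluster)
print(cluster_speed)
```

An anomaly shared by all countries shows up in the cluster curve, while fluctuations of individual small countries carry less weight in the sum.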
Example: Effect of Friday the 13th in the countries of western Europe (here the sum graph of the countries Belgium, Finland, Iceland, Italy,
Germany, Netherlands, Norway, Portugal, Austria, Spain, Sweden, Switzerland):