Covid-19 Mortality Divergence: Data and Conjecture
Why is this Article Written?
I am not a statistician, and a full statistical treatment of this subject is
beyond my capability. However, I believe I can be helpful by asking questions
that others can pick up and develop further. This article is therefore an
attempt to draw attention to issues regarding the Coronavirus episode that are
important yet hiding in plain sight. Its primary aim is to help our
understanding by fostering analysis and deeper thinking about the issues at
hand.
The Question: Why such a big divergence in mortality?
World-level Coronavirus data shows big divergences in the number of deaths
among countries. In the likes of Belgium, France, Italy, the Netherlands, Spain
and the United Kingdom, mortality among confirmed cases is 10% and upwards. At
the same time, Canada, Germany, South Korea, Switzerland, the United States and
Turkey report mortality of 4% and below. Data quality and the stage of the
outbreak in each country notwithstanding, the divergence is too great to
ignore. What could be causing this divergence?
Potential Answers: Focusing on available data with testing, cases and deaths
Data quality and the stage of the outbreak are both potentially important
factors, but given the size of the data and the length of the outbreak, a
sufficiently large sample might offset the analytical problems they cause.
A reasonable answer is that higher mortality is reported by countries that do
not test extensively. This is a tricky issue, as extensive testing is related
not only to the number of tests but also to how the tested sample is
distributed. While a high number of tests is more likely to indicate a wider
country-level distribution, it does not necessarily have to be so. In addition,
the number of tests does not equal the number of individuals tested. Indeed,
the total may include multiple tests for the same people, including repeated
tests for healthcare workers. Lacking any qualitative data to adjust for these
factors, I used the number of tests as the main indicator of coverage. Prior to
the data analysis, my hypothesis was that countries with higher mortality rates
would have (i) fewer tests per 1M population and (ii) more confirmed cases per
test. The first would imply relatively limited coverage, and the second a focus
on more symptomatic cases (and a potential overextension of the healthcare
system due to the high number of cases).
Another answer is the quality of the healthcare system, and yet another is the
effect of preventative measures on the overall progress of the epidemic in each
country. The answer to the higher mortality question is probably a composite of
these and other reasons (genetics, demography and such), but for the purposes
of this article I’ll cover the testing, cases and deaths angle to see if it
helps to tackle the question better.
Methodology
For France, Germany and Sweden, I took the test figures from
April 8 and extrapolated quite liberally to April 9. Given the short duration,
I believe my extrapolation should not cause a meaningful statistical
divergence.
I took a sample of 28 countries with a cumulative population of over 2.7
billion people. I left out China, as its testing data and its stage in the
epidemic were not compatible with most others in the sample.
I’ve used 4 metrics for the analysis: Population, Tests, Confirmed Cases, and
Deaths. From these main data points I’ve looked at the following derivatives:
· (a) Case / Test ratio: Is there a meaningful average among countries?
· (b) Deaths / Case ratio: There should be a closer relationship here than (a).
· (c) Deaths / Test ratio: Does this make any sense? If so, what sense?
· (d) Tests per 1M: This is a coverage check for the country.
· (e) Cases per 1M: This is an infection spread check for the country. Its relation to coverage could be meaningful.
· (f) Deaths per 1M: This is to compare countries on the population metric.
· (g) Tests / Population: To verify as a percentage how much of the population (given reservation in Potential Answers section) was covered.
· (h) Cases / Population: Same as (e) but to see spread as percentage point.
· Average, Median and Standard Deviations for these results.
· Ratio of Standard Deviation to Average to see which factor has a bigger divergence.
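The derivatives listed above can be sketched in a few lines of Python. This is only a minimal illustration, not the spreadsheet actually used for the article: the three countries and their figures are made up, the names `derive` and `summarize` are hypothetical, and the choice of population standard deviation is an assumption, since the article does not state which variant was used.

```python
from statistics import mean, median, pstdev

# Hypothetical rows: (country, population, tests, confirmed cases, deaths).
# These numbers are invented for illustration; they are NOT the article's data.
SAMPLE = [
    ("Country A", 10_000_000, 200_000, 10_000, 300),
    ("Country B", 50_000_000, 300_000, 60_000, 6_000),
    ("Country C", 5_000_000, 150_000, 3_000, 30),
]


def derive(population, tests, cases, deaths):
    """Return the derived ratios (a) to (h) for one country."""
    return {
        "case_per_test": cases / tests,              # (a)
        "death_per_case": deaths / cases,            # (b)
        "death_per_test": deaths / tests,            # (c)
        "tests_per_1m": tests / population * 1e6,    # (d)
        "cases_per_1m": cases / population * 1e6,    # (e)
        "deaths_per_1m": deaths / population * 1e6,  # (f)
        "tests_pct_pop": tests / population * 100,   # (g)
        "cases_pct_pop": cases / population * 100,   # (h)
    }


def summarize(values):
    """Average, median, standard deviation, and the ratio std / average."""
    avg = mean(values)
    sd = pstdev(values)  # assumption: population standard deviation
    return {"average": avg, "median": median(values),
            "std": sd, "std_over_avg": sd / avg}


# One dictionary of derived ratios per country.
derived = {name: derive(pop, t, c, d) for name, pop, t, c, d in SAMPLE}

# One summary per metric, computed across all countries in the sample.
summary = {
    metric: summarize([row[metric] for row in derived.values()])
    for metric in next(iter(derived.values()))
}
```

The `std_over_avg` entry is the ratio of Standard Deviation to Average from the last bullet. Because it is unit-free, it allows metrics measured on different scales to be compared for relative divergence.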
Results Table
Observation – 1: Case per Test Relationship
On average, 10.65% of tests yield a positive result, that is, a Covid-19 case.
However, the data differ widely among countries, from Bahrain’s 1.49% to
France’s 32.92%. More importantly, there is a clustering effect here. Once the
average is breached upwards, as it is with Switzerland, the figures for the
higher-ratio countries are noticeably higher. One explanation could be that the
countries above the average are testing patients with symptoms, who are more
likely to be infected. Let’s look at this when we examine whether the number of
tests these countries have done in relation to their population differs from
that of the lower Case / Test ratio countries (*Follow-Up Point 1*).
Observation – 2: Deaths per Case Relationship
On average, the mortality rate among confirmed cases is 3.97%. However, similar
to the Case / Test statistic, there is a wide difference among countries, and
cluster effects persist, with groups of countries scoring very low or very high
on this measure.
With the Case / Test ratio from Observation – 1 in hand, if Follow-Up Point 1
is valid then the countries with a higher Case / Test ratio should also be the
ones with a higher Death / Case ratio. This would support the view that these
are countries with healthcare systems under strain (along with possible other
factors that limit how widely they test).

Of the 10 countries with Case / Test ratios above the sample average, 7 also
have Death / Case ratios high enough to support the view that these countries
are under the strain of incoming patients (*Follow-Up Point 1*). But the 3
exceptions out of 10, namely Switzerland, Turkey and the US, are still worth
investigating. I would speculate that the Turkey and US results are due to
their relative lag behind the other countries in the sample. This lag could be
helping them with better treatment options. Switzerland, with its older
population and a Central European position that probably puts it on a similar
timeline, needs more explanation. Perhaps its data will converge, or perhaps it
has greater resources to prevent deaths. Furthermore, the Turkey and US data
over the next couple of weeks are worth observing on this metric.
Finally, please note that the ratio of Standard Deviation to Average on this
metric is greater than on the Case / Test ratio. As Observation – 3 shows, the
ratio of Standard Deviation to Average increases further on the Deaths / Test
metric.
Observation – 3: Deaths per Test Relationship
On average, deaths amount to 0.69% of tests made. Similar to the Case per Test
and Deaths per Case observations, there is a visible clustering effect here. As
this ratio is a derivative of the two variables covered in Observations 1 and
2, we would expect it to single out the same countries; let us check the
validity of this expectation with a table.
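To make the "derivative" relationship concrete: for any single country, Deaths / Test equals (Case / Test) × (Deaths / Case). The same is not true of the sample averages. Multiplying the reported averages gives 10.65% × 3.97% ≈ 0.42%, not the reported 0.69%. A gap in that direction is what one would see if countries with a high Case / Test ratio also tend to have a high Deaths / Case ratio, though this alone does not prove it. The sketch below shows both points on made-up figures; the numbers are hypothetical and are not the article's data.

```python
from statistics import mean

# Hypothetical (tests, cases, deaths) triples, for illustration only.
rows = [(200_000, 10_000, 300), (300_000, 60_000, 6_000), (150_000, 3_000, 30)]

case_per_test = [c / t for t, c, d in rows]
death_per_case = [d / c for t, c, d in rows]
death_per_test = [d / t for t, c, d in rows]

# Per country, the identity holds (up to floating point rounding).
identity_holds = all(
    abs(dt - ct * dc) < 1e-12
    for ct, dc, dt in zip(case_per_test, death_per_case, death_per_test)
)

# Across countries, the average of the products differs from the
# product of the averages when the two ratios move together.
mean_of_products = mean(death_per_test)
product_of_means = mean(case_per_test) * mean(death_per_case)
```

In this toy sample the country with the highest Case / Test ratio also has the highest Deaths / Case ratio, so `mean_of_products` comes out larger than `product_of_means`.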
The Deaths per Test ratio seems to point to the countries that have a
relatively bigger problem dealing with this pandemic. Note that Switzerland,
Turkey and the US drop out of this list, just as they did from the Death / Case
list. The countries on this list are the ones that have trouble containing
their death figures.
Please note that the analysis is based on the sample as of April 9. Spain, for
example, would possibly be on this list, but its data was not readily available
in the sources I used.
Finally, note that the ratio of Standard Deviation to Average increases further
on this metric. I would speculate that the increase in this ratio from Case /
Test to Death / Case to Death / Test shows the divergence of epidemic
progression in different countries. While the difference is observable in the
case confirmations, it is amplified in how the discovered cases fare
afterwards. For further analysis, there is a follow-up point here as to whether
some countries really are doing a lot better in dealing with coronavirus.
Observation – 4: Coverage Ratios
Coverage Ratios deal with how much of the population was
tested and what were the results per 1M of population or as a percentage of the
total population (whichever method is easier to look at).
On average, 9,440 tests per 1 million population had been conducted as of April
9 in our sample set. This corresponds to testing only 0.94% of the total
population. However, given multiple tests per person, the actual share of the
population tested would be below this figure.
On average, 750 cases have been confirmed for each 1 million of population,
meaning that about 0.07% of the population is confirmed to be infected.
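The step from a per-1M figure to a percentage is a division by 10,000. As a small sketch (the function name is hypothetical):

```python
def per_million_to_percent(per_million: float) -> float:
    """Convert a per-1M-population figure to a percentage of the population."""
    return per_million / 1_000_000 * 100


tests_pct = per_million_to_percent(9_440)  # about 0.944
cases_pct = per_million_to_percent(750)    # about 0.075
```

The quoted per-1M averages are presumably rounded, so the last digit of the percentages can differ slightly from a direct conversion.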
The averages look pretty low, and the distribution is a bit different from the
other observations. Some countries, among them Bahrain, Norway, Estonia,
Switzerland, Germany, Italy and Austria, have done extensive testing. Most of
these countries, with the exception of Italy, have fared better in the number
of deaths. The same holds for the others at the top of the testing list. I
believe this is beyond coincidence and that there is a clear positive link
between the number of tests conducted (as a percentage of population) and the
success of a country in dealing with its epidemic.
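One way a reader could put a number on this link is a correlation coefficient between Tests per 1M and Deaths per 1M. The sketch below is not part of the original analysis. It computes a plain Pearson correlation on made-up figures, where a negative value would correspond to "more testing, fewer deaths". The data and the function name are hypothetical, and with only 28 countries a correlation would suggest an association rather than establish a cause.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-country figures, for illustration only.
tests_per_1m = [2_000, 5_000, 9_000, 15_000, 30_000]
deaths_per_1m = [40, 55, 20, 12, 6]

r = pearson(tests_per_1m, deaths_per_1m)  # negative for this toy sample
```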
There is a subset of countries that are more than 1/3 below the average in the
Tests per 1M ratio. These are Malaysia, Costa Rica, Ecuador, Japan, India and
Indonesia. All these countries have a below-average Cases per 1M ratio. This
reaffirms that cases are discovered only with adequate testing, once again
underscoring the importance of extensive testing. Note that of these
significantly below-average testers, Ecuador and Indonesia have high Case /
Test and Death / Case ratios, which hints at their inability to detect the
extent of the problem in their respective countries. On a contrarian note,
however, Costa Rica and Japan have done limited testing and are faring well.
There is another follow-up point here regarding those two.


Please recall that in Observations 1 and 2, I raised a follow-up point within
this article: one explanation could be that the countries above the average are
testing patients with symptoms, who are more likely to be infected. Let’s look
at this when we examine whether the number of tests these countries have done
in relation to their population differs from that of the lower Case / Test
ratio countries (*Follow-Up Point 1*).
I was basically asking whether the high Case per Test ratio
was due to lower testing coverage for those countries. Let’s tabulate the
answer with data from Observation-4.

Of the 10 countries that had a higher than average Case / Test ratio, 7 have a
lower than average Test / Population ratio. An observation that needs further
verification is that these 7 countries are dealing with incoming patients who
skew their positive test results to the upside. On one hand this is positive
for their future progress, but it can also hint at potential bottlenecks for
their healthcare systems. For the third time, this derivative analysis shows
the importance of increasing testing coverage. To this end, Turkey and the US
have increased their daily testing since the date of this data set, which in my
opinion is a step in the right direction. As for Italy and Belgium, which have
higher test coverage but have still faced tough times in dealing with their
cases, I raise another follow-up point for future researchers. They might have
suffered under the onslaught of rapidly increasing infections that needed
hospitalization, and their testing might not have caught up in time. The answer
is not in the data covered by this article.
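The cross-check behind the "7 out of 10" count can be sketched as two filters against the sample averages. The four countries and their ratios below are made up for illustration and are not the article's data; the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical per-country ratios, for illustration only.
countries = {
    "Country A": {"case_per_test": 0.25, "tests_pct_pop": 0.40},
    "Country B": {"case_per_test": 0.18, "tests_pct_pop": 1.60},
    "Country C": {"case_per_test": 0.03, "tests_pct_pop": 2.10},
    "Country D": {"case_per_test": 0.02, "tests_pct_pop": 0.30},
}

avg_case_per_test = mean(c["case_per_test"] for c in countries.values())
avg_tests_pct = mean(c["tests_pct_pop"] for c in countries.values())

# Step 1: countries whose Case / Test ratio is above the sample average.
high_positivity = [
    name for name, c in countries.items()
    if c["case_per_test"] > avg_case_per_test
]

# Step 2: of those, the ones whose Test / Population ratio is below average.
low_coverage_too = [
    name for name in high_positivity
    if countries[name]["tests_pct_pop"] < avg_tests_pct
]
```

In this toy sample two countries are above the average Case / Test ratio, and one of the two also has below-average coverage. The article's count of 7 out of 10 would come from applying the same two steps to the 28-country sample.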

Observation – 5: Those with the Best Ratios
This article focuses on the problems with the intention of
being helpful. While the data is evident in the tables and charts, we can also
learn from those countries that were more successful so far. To my surprise,
they come from all around the globe. I have defined the success metric as
follows: Those countries with below average deaths per 1 million of their
population and above average tests per 1 million of their population.
Ranked from fewest deaths upwards, the countries below have tested extensively
and prevented deaths. A cursory glance suggests all have high income levels,
but I leave it as another follow-up point to pinpoint similarities within this
group and comparisons with others in the data set.
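The success metric defined above (below-average deaths per 1M together with above-average tests per 1M, ranked by fewest deaths) can be expressed as a filter and a sort. The figures below are invented for illustration and do not reproduce the article's ranking; the names are hypothetical.

```python
from statistics import mean

# Hypothetical per-country figures, for illustration only.
countries = {
    "Country A": {"tests_per_1m": 30_000, "deaths_per_1m": 5},
    "Country B": {"tests_per_1m": 20_000, "deaths_per_1m": 2},
    "Country C": {"tests_per_1m": 15_000, "deaths_per_1m": 90},
    "Country D": {"tests_per_1m": 1_000, "deaths_per_1m": 3},
}

avg_tests = mean(c["tests_per_1m"] for c in countries.values())
avg_deaths = mean(c["deaths_per_1m"] for c in countries.values())

# Keep countries that test more than average AND have fewer deaths than
# average, then rank them from fewest deaths per 1M upwards.
successful = sorted(
    (
        name for name, c in countries.items()
        if c["deaths_per_1m"] < avg_deaths and c["tests_per_1m"] > avg_tests
    ),
    key=lambda name: countries[name]["deaths_per_1m"],
)
```

Note that "Country D" is excluded despite its low death figure, because its testing is below average. The metric deliberately does not count low deaths alongside limited testing as a success.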
Conclusion
It is my sincere wish that this article contributes to
thinking about the problem at hand.
I am 100% confident that we will manage to deal with the virus problem. This
cycle will peak in April, and the second half of May will feel somewhat better.
We will have to learn to live with coronavirus for some more time, and the
follow-up cycles will test us. To that end, the problem has less to do with the
virus than with our response to it. I hope that we learn from each other,
co-operate and improve our methods in dealing with it. Success is inevitable,
but the cost of success will be determined by our methods.
April 11, 2020