Sunday, April 12, 2020

Covid-19 Mortality Divergence: Data and Conjecture



Why Was This Article Written?
I am not a statistician, and a full statistical treatment of this topic is beyond my capability. However, I believe I can be helpful by asking questions that others can pick up on and develop further. As such, this article is an attempt to draw attention to important issues regarding the Coronavirus episode that are out in the open yet have not been examined closely. Its primary aim is to help our understanding by fostering analysis and deeper thinking about the issues at hand.

The Question: Why such a big divergence in mortality?
Coronavirus world-level data shows big divergences in the number of deaths among countries. In countries such as Belgium, France, Italy, the Netherlands, Spain and the United Kingdom, mortality among confirmed cases is 10% or higher. At the same time, Canada, Germany, South Korea, Switzerland, the United States and Turkey report mortality of 4% or below. Data quality and the stage of the outbreak in each country notwithstanding, the divergence is too great to ignore. What could be causing it?

Potential Answers: Focusing on available data with testing, cases and deaths
Data quality and the stage of the outbreak are both potentially important factors, but given the size of the data and the length of the outbreak, a sufficiently large sample might offset the analytical problems they cause.

A reasonable answer is that higher mortality is reported by countries that do not test extensively. This is a tricky issue, as extensive testing is related not only to numbers but also to sample distribution. While a high number of tests is more likely to indicate wider country-level distribution, it does not necessarily have to be so. In addition, the number of tests is not the number of individuals tested. Indeed, the total test count might include multiple tests for the same people, including repeated tests for healthcare workers. Lacking any qualitative data to adjust for these factors, I used the number of tests as the main indicator of coverage. Prior to the data analysis, my hypothesis was that countries with higher mortality rates would have (i) fewer tests per 1M population and (ii) more confirmed cases per test. The first would imply relatively limited coverage; the second would indicate a focus on more symptomatic cases (and a potentially overextended healthcare system due to a high number of cases).

Another possible answer is the quality of the healthcare system, and yet another is the effect of preventive measures on the overall progress of the epidemic in each country. The answer to the higher mortality question is probably a composite of these and other factors (genetics, demography and so on), but for the purposes of this article, I'll cover the testing, cases and deaths angle to see if it helps to tackle the question better.


Methodology
I wanted the work to be as timely as possible, but the April 9 figures were the most up to date I could get with a large enough sample. To this end, I extensively used data from Our World in Data https://ourworldindata.org/coronavirus along with Statista https://www.statista.com/statistics/1028731/covid19-tests-select-countries-worldwide/, and Worldometer for population figures https://www.worldometers.info/world-population/population-by-country/.
For France, Germany and Sweden, I took the test figures from April 8 and extrapolated them quite liberally to April 9. Given the short interval, I believe the extrapolation should not cause a meaningful statistical divergence.
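The extrapolation method is not spelled out here. One simple possibility, shown purely as an assumption, is to carry the most recent daily increase forward by one day. The function name and the figures below are hypothetical and are not the article's data.

```python
def extrapolate_one_day(prev_total: float, latest_total: float) -> float:
    """Project a cumulative count one day ahead, assuming the most recent
    daily increase repeats (a hypothetical linear assumption)."""
    daily_increase = latest_total - prev_total
    return latest_total + daily_increase


# Hypothetical cumulative test counts for April 7 and April 8 (not real data).
april_7 = 100_000
april_8 = 110_000
print(extrapolate_one_day(april_7, april_8))  # 120000
```

A linear step is the most conservative choice for a one-day gap; if testing was ramping up quickly, it would tend to underestimate the April 9 total.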
I took a sample of 28 countries with a cumulative population of over 2.7 billion people. I left out China, as its testing data and its stage in the epidemic were not comparable with most others in the sample.
I’ve used 4 metrics for the analysis: Population, Tests, Confirmed Cases and Deaths. From these main data points I’ve looked at the following derivatives:
· (a) Case / Test ratio: Is there a meaningful average among countries?
· (b) Deaths / Case ratio: There should be a closer relationship here than in (a).
· (c) Deaths / Test ratio: Does this make any sense? If so, what sense?
· (d) Tests per 1M: This is a coverage check for the country.
· (e) Cases per 1M: This is an infection spread check for the country. Its relation to coverage could be meaningful.
· (f) Deaths per 1M: This is to compare countries on a population basis.
· (g) Tests / Population: To verify, as a percentage, how much of the population was covered (subject to the caveat in the Potential Answers section).
· (h) Cases / Population: Same as (e), but showing spread as a percentage.
· Average, Median and Standard Deviation for these results.
· Ratio of Standard Deviation to Average, to see which metric has the bigger divergence.
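For readers who want to reproduce the derived metrics, here is a minimal sketch of how (a) to (h) and the summary statistics could be computed. The class and function names are my own, and two choices are assumptions the article does not state: that "average" means the unweighted mean across countries, and that the sample (n − 1) standard deviation is used.

```python
import statistics
from dataclasses import dataclass


@dataclass
class CountryData:
    name: str
    population: int
    tests: int
    cases: int
    deaths: int


def derived_metrics(c: CountryData) -> dict[str, float]:
    """Compute the derived ratios (a) to (h) for one country."""
    per_million = 1_000_000 / c.population
    return {
        "case_per_test": c.cases / c.tests,                # (a)
        "deaths_per_case": c.deaths / c.cases,             # (b)
        "deaths_per_test": c.deaths / c.tests,             # (c)
        "tests_per_1m": c.tests * per_million,             # (d)
        "cases_per_1m": c.cases * per_million,             # (e)
        "deaths_per_1m": c.deaths * per_million,           # (f)
        "tests_per_population": c.tests / c.population,    # (g)
        "cases_per_population": c.cases / c.population,    # (h)
    }


def summarize(values: list[float]) -> dict[str, float]:
    """Average, median, standard deviation and the SD-to-average ratio."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return {
        "average": mean,
        "median": statistics.median(values),
        "std_dev": sd,
        "sd_to_average": sd / mean,  # a.k.a. coefficient of variation
    }


# Hypothetical countries, for illustration only (not the article's data).
sample = [
    CountryData("Country A", 10_000_000, 100_000, 5_000, 200),
    CountryData("Country B", 5_000_000, 20_000, 4_000, 400),
]
metrics = [derived_metrics(c) for c in sample]
print(summarize([m["case_per_test"] for m in metrics]))
```

The SD-to-average ratio is scale-free, which is what makes it possible to compare dispersion across metrics of very different sizes, such as Case / Test and Deaths / Test.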


Results Table






Observation – 1: Case per Test Relationship
On average, 10.65% of tests yield a positive result, that is, a confirmed Covid-19 case. However, the figures differ widely among countries, from Bahrain’s 1.49% to France’s 32.92%. More importantly, there is a clustering effect here: once the average is crossed, starting with Switzerland, the figures for the higher-ratio countries are noticeably higher. One explanation could be that these above-average countries are testing patients with symptoms, who are more likely to be infected. Let’s look at this when we examine whether these countries differ from the lower Case / Test ratio countries in the number of tests they have done relative to their population (*Follow-Up Point 1*).
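One informal way to check a clustering claim like this is to sort the ratios and look for the largest jump between neighbouring values: a big gap separates a low group from a high group. This is a simple heuristic of my own choosing, not a formal clustering method, and the values below are hypothetical.

```python
def split_at_largest_gap(ratios: dict[str, float]) -> tuple[list[str], list[str]]:
    """Sort countries by ratio and split them where the gap between
    consecutive values is largest (a simple heuristic, not k-means)."""
    if len(ratios) < 2:
        raise ValueError("need at least two countries")
    ordered = sorted(ratios, key=ratios.get)
    gaps = [ratios[ordered[i + 1]] - ratios[ordered[i]] for i in range(len(ordered) - 1)]
    # Index of the largest gap; everything after it forms the high group.
    cut = max(range(len(gaps)), key=gaps.__getitem__) + 1
    return ordered[:cut], ordered[cut:]


# Hypothetical Case / Test ratios, for illustration only.
example = {"A": 0.02, "B": 0.04, "C": 0.05, "D": 0.18, "E": 0.25}
low, high = split_at_largest_gap(example)
print(low, high)  # ['A', 'B', 'C'] ['D', 'E']
```

If the largest gap sits near the sample average, that would support the description of a distinct high-ratio cluster rather than a smooth spread.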




 Observation – 2: Deaths per Case Relationship
On average, the mortality rate among confirmed cases is 3.97%. However, as with the Case / Test statistic, there is a wide difference among countries, and the cluster effect persists, with groups of countries scoring very low or very high on this measure.

With the Case / Test ratio from Observation 1 in hand, if Follow-Up Point 1 is valid, then the countries with a higher Case / Test ratio should be the ones with a higher Death / Case ratio. This would support the view that these are countries with healthcare systems under strain (along with possible other factors that limit how widely they test).




Of the 10 countries with Case / Test ratios above the sample average, 7 also have above-average Death / Case ratios, which supports the view that these countries are under the strain of incoming patients (*Follow-Up Point 1*). But the divergence of the remaining 3, Switzerland, Turkey and the US, is still worth investigating. I would speculate that the Turkey and US figures reflect their lag behind the other countries in the sample, and this lag could be giving them access to better treatment options. Switzerland, with its older population and a Central European position that probably puts it on a similar timeline to its neighbours, needs more explanation. Perhaps its data will converge, or perhaps it has greater resources to prevent deaths. Furthermore, the Turkey and US data on this metric are worth watching over the next couple of weeks.
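The "7 out of 10" count is a simple cross-tabulation. A minimal sketch of it follows, assuming "sample average" means the unweighted mean of the per-country ratios (rather than, say, total cases divided by total tests). The function names and numbers are hypothetical.

```python
def above_average(values: dict[str, float]) -> set[str]:
    """Countries whose value exceeds the unweighted sample average."""
    avg = sum(values.values()) / len(values)
    return {name for name, v in values.items() if v > avg}


def cross_check(
    case_per_test: dict[str, float], deaths_per_case: dict[str, float]
) -> tuple[set[str], set[str]]:
    """Split the above-average Case / Test countries into those that are also
    above average on Deaths / Case and those that diverge."""
    high_ct = above_average(case_per_test)
    high_dc = above_average(deaths_per_case)
    return high_ct & high_dc, high_ct - high_dc


# Hypothetical ratios, for illustration only.
case_per_test = {"A": 0.30, "B": 0.20, "C": 0.05, "D": 0.04}
deaths_per_case = {"A": 0.12, "B": 0.02, "C": 0.01, "D": 0.03}
both, diverging = cross_check(case_per_test, deaths_per_case)
print(both, diverging)  # {'A'} {'B'}
```

In this toy example, B plays the role that Switzerland, Turkey and the US play in the article: high on Case / Test but not on Death / Case.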

Finally, please note that the ratio of Standard Deviation to Average is greater for this metric than for the Case / Test ratio. As Observation 3 shows, this ratio increases further for the Deaths / Test metric.

 



Observation – 3: Deaths per Test Relationship
On average, deaths amount to 0.69% of tests conducted. As with the Case per Test and Deaths per Case observations, there is a visible clustering effect here. Because this metric combines the two variables covered in Observations 1 and 2, we would expect countries that score high on both of those to score high here as well. Let us check the validity of this expectation with a table.
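The expectation rests on a simple identity: for any single country with at least one confirmed case, the cases cancel out. Across countries, however, averages do not multiply the same way, because the average of a product equals the product of the averages plus a covariance term. Below, $x_i$ is country $i$'s Case / Test ratio, $y_i$ its Deaths / Case ratio, and $n$ the number of countries (my own notation):

$$\frac{\text{Deaths}}{\text{Test}} = \frac{\text{Cases}}{\text{Test}} \times \frac{\text{Deaths}}{\text{Case}}$$

$$\frac{1}{n}\sum_{i=1}^{n} x_i y_i = \bar{x}\,\bar{y} + \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

If the 10.65%, 3.97% and 0.69% averages are all unweighted means over the same countries (an assumption, as the method is not stated), then 10.65% × 3.97% ≈ 0.42% falls short of 0.69%. The difference would be consistent with a positive covariance term, that is, with the same countries tending to score high on both ratios.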





The Deaths per Test ratio seems to point to countries that have a relatively bigger problem dealing with this pandemic. Note that Switzerland, Turkey and the US drop off this list, just as they did from the Death / Case list. The countries on this list are the ones having trouble containing their death figures.

Please note that the analysis is based on the sample as of April 9. Spain, for example, would possibly be on this list, but its data was not readily available in the sources I used.

Finally, note that the ratio of Standard Deviation to Average increases further on this metric. I would speculate that the rising value of this ratio from Case / Test to Death / Case to Death / Test reflects the divergence of epidemic progression in different countries. While the difference is already observable in case confirmations, it is amplified in how the discovered cases fare afterwards. For further analysis, there is a follow-up point here as to whether some countries are genuinely doing a lot better in dealing with coronavirus.

 




Observation – 4: Coverage Ratios
Coverage Ratios deal with how much of the population was tested and what were the results per 1M of population or as a percentage of the total population (whichever method is easier to look at).
On average, 9,440 tests per 1 million population had been conducted as of April 9 in our sample set. This corresponds to testing only 0.94% of the total population. However, given that some people are tested more than once, the actual share of the population tested would be below this figure.

On average, 750 cases have been confirmed per 1 million population, meaning that about 0.075% of the populations concerned have been confirmed as infected.
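Converting a per-1M figure into a share of the population simply means dividing by 1,000,000 (equivalently, dividing by 10,000 gives the percentage directly). Applied to the two averages above:

$$\frac{9{,}440}{1{,}000{,}000} = 0.944\% \approx 0.94\%, \qquad \frac{750}{1{,}000{,}000} = 0.075\%$$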

The averages look pretty low, and the distribution is a bit different from the other observations. Some countries, among them Bahrain, Norway, Estonia, Switzerland, Germany, Italy and Austria, have done extensive testing. With the exception of Italy, most of these countries have fared better in terms of deaths, and the same holds for the others near the top of the testing list. I believe this is beyond coincidence and that there is a clear positive link between the number of tests conducted (as a percentage of population) and a country's success in dealing with its epidemic.

There is a subset of countries that are more than 1/3 below the average in Tests per 1M: Malaysia, Costa Rica, Ecuador, Japan, India and Indonesia. All of these countries have below-average Cases per 1M ratios. This reaffirms that cases are discovered only with adequate testing, once again underscoring the importance of extensive testing. Note that among these significantly below-average testers, Ecuador and Indonesia have comparatively high Case / Test and Death / Case ratios, which hints at their inability to detect the full extent of the problem in their respective countries. However, on a contrarian note, Costa Rica and Japan have done limited testing and are nonetheless faring well. There is another follow-up point here regarding those two.
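"More than 1/3 below the average" means a value below two thirds of the average. A minimal sketch of that filter follows; the function name, its `fraction` parameter and the numbers are hypothetical, and the average is assumed to be the unweighted mean across countries.

```python
def well_below_average(values: dict[str, float], fraction: float = 1 / 3) -> list[str]:
    """Countries whose value is more than `fraction` below the sample average,
    i.e. value < (1 - fraction) * average, returned in alphabetical order."""
    avg = sum(values.values()) / len(values)
    threshold = (1 - fraction) * avg
    return sorted(name for name, v in values.items() if v < threshold)


# Hypothetical Tests per 1M figures, for illustration only.
tests_per_1m = {"A": 20_000, "B": 9_000, "C": 5_000, "D": 2_000}
print(well_below_average(tests_per_1m))  # ['C', 'D']
```

Here the average is 9,000, so the cut-off is 6,000 tests per 1M.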





Please recall that in Observations 1 and 2, I raised Follow-Up Point 1: one explanation could be that the countries above the average are testing patients with symptoms, who are more likely to be infected, and we would look at this by examining whether these countries differ from the lower Case / Test ratio countries in the number of tests they have done relative to their population.
I was basically asking whether the high Case per Test ratio was due to lower testing coverage for those countries. Let’s tabulate the answer with data from Observation-4.


 

Of the 10 countries that had a higher than average Case / Test ratio, 7 have a lower than average Test / Population ratio. A hypothesis that needs further verification is that these 7 countries are dealing with an influx of patients that skews their positive test results upward. On the one hand, this is positive for their future progress, but it can also hint at potential bottlenecks in their healthcare systems. For the third time, this derivative analysis shows the importance of increasing testing coverage. To this end, since the date of this data set, Turkey and the US have increased their daily testing, which in my opinion is a step in the right direction. As for Italy and Belgium, which have higher test coverage but have still faced tough times in dealing with their cases, I raise another follow-up point for future researchers: they might have suffered under the onslaught of rapidly increasing infections that needed hospitalization, and their testing might not have caught up in time. The answer is not in the data covered by this article.
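Follow-Up Point 1 asked whether there is a correlation, and the 7-of-10 count is only a cross-tabulation. A complementary check, sketched here as an assumption rather than anything done in this article, is a Pearson correlation coefficient between Tests per 1M and Case / Test across all countries; a clearly negative value would support the idea that lower coverage goes with higher positivity. The helper names and data are hypothetical.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def coverage_vs_positivity(
    tests_per_1m: dict[str, float], case_per_test: dict[str, float]
) -> float:
    """Correlate coverage with positivity over countries present in both."""
    names = sorted(tests_per_1m.keys() & case_per_test.keys())
    return pearson([tests_per_1m[n] for n in names], [case_per_test[n] for n in names])


# Hypothetical figures, for illustration only.
tests = {"A": 20_000, "B": 15_000, "C": 5_000, "D": 2_000}
positivity = {"A": 0.03, "B": 0.05, "C": 0.15, "D": 0.25}
print(round(coverage_vs_positivity(tests, positivity), 3))
```

With only 28 countries and skewed data, a rank-based correlation may be more robust than Pearson; either way, correlation alone would not settle causation.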


Observation – 5: Those with the Best Ratios
This article focuses on the problems with the intention of being helpful. While the data is evident in the tables and charts, we can also learn from the countries that have been more successful so far. To my surprise, they come from all around the globe. I have defined the success metric as follows: countries with below-average deaths per 1 million population and above-average tests per 1 million population.

Ranked in order of fewest deaths per 1M, the countries below have tested extensively and prevented deaths. A cursory glance suggests that all have high income levels, but I leave it to another follow-up point to pinpoint similarities within this group and compare them with others in the data set.
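The success metric can be expressed as a two-condition filter followed by a sort on deaths per 1M. A minimal sketch, assuming unweighted averages across countries and that both mappings cover the same countries; the function name and figures are hypothetical.

```python
def best_performers(
    tests_per_1m: dict[str, float], deaths_per_1m: dict[str, float]
) -> list[str]:
    """Countries with above-average testing and below-average deaths per 1M,
    ranked from fewest to most deaths per 1M."""
    avg_tests = sum(tests_per_1m.values()) / len(tests_per_1m)
    avg_deaths = sum(deaths_per_1m.values()) / len(deaths_per_1m)
    winners = [
        name
        for name in tests_per_1m
        if tests_per_1m[name] > avg_tests and deaths_per_1m[name] < avg_deaths
    ]
    return sorted(winners, key=lambda name: deaths_per_1m[name])


# Hypothetical figures, for illustration only.
tests = {"A": 30_000, "B": 25_000, "C": 4_000, "D": 1_000}
deaths = {"A": 10, "B": 5, "C": 80, "D": 1}
print(best_performers(tests, deaths))  # ['B', 'A']
```

Note that the hypothetical country D, with very few deaths but little testing, is excluded by design, since the metric requires both conditions.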

 



Conclusion
It is my sincere wish that this article contributes to thinking about the problem at hand.

I am 100% confident that we will manage to deal with the virus problem. This cycle will peak in April, and the second half of May will feel somewhat better.

We will have to learn to live with coronavirus for some more time, and the follow-up cycles will test us. To that end, the problem has less to do with the virus than with our response to it. I hope that we learn from each other, co-operate and improve our methods in dealing with it. Success is inevitable, but the cost of success will be determined by our methods.

April 11, 2020
