Charles Simkins says that data dissemination is essential for informed policy debate
A request for more detailed information about the spread of the Covid-19 epidemic
21 April 2020
INTRODUCTION
We have been watching the updates on COVID-19 released by the Minister of Health since 17 March 2020. Apart from provincial breakdowns of infections, the data contained in them can be reported in a single page (see the Annexure). We think that the level of reporting has become inadequate for an assessment of what we do and do not know about the extent and spread of the epidemic. Critical policy choices depend on such an assessment. This brief makes the case for a more extensive reporting system – a database which can be updated each time new information becomes available[1].
DOES THE PUBLIC NEED TO KNOW?
But, it may be objected, does the public really need more information and is the cost of providing it justified? Can its assessment not be left to the National Command Council, advised by the COVID-19 ministerial advisory committee? Ought the risk of misinterpretation by an untutored public be avoided? The answers in a democracy are yes, no and no. Public policy must be able to withstand critical scrutiny, and scrutiny is not possible if information is hidden. Open debate should be capable of winnowing out poor judgement. And justification engenders consent, in ways the police and the army never can.
THE INTERPRETATION OF DATA
Drawing inferences from data depends on the methods of data collection. Three are relevant here.
1. Most of the data in the Annexure has been collected from self-chosen presentation of individuals to health care providers – general practitioners, private hospitals, government clinics, public hospitals. Individuals may present with symptoms sufficiently serious to require medical attention, or because they have been identified as at risk through contact tracing, or because they are anxious that they may have been exposed to infection. The decision to test will reflect norms of good practice and, particularly in the public sector, specific testing guidelines. The number of infections identified in this way will always be an underestimate of all infections. Some of the infected will not have symptoms severe enough for them to seek medical help. Some will find the health care system impossible to access. And good practice norms and testing guidelines do not mean that all those infected will be tested.
It would be useful to have information on tests conducted and test results for tests initiated in the private sector and the public sector separately. It may be that the proportions of positive tests differ between the sectors, and that these proportions are evolving differently. Since people obtaining services from the public sector are likely to be poorer than those using the private sector, the disaggregated data would be evidence about the impact of income on the progression of the epidemic. Moreover, the existence of differential proportions means that a changing mix of testing between the public and private sectors would have an impact on the discovery of infections.
2. Pro-active mass screening and testing of populations believed to be particularly vulnerable has begun. There is no information about the selection of sites where this testing is taking place or will take place. There is also no information about screening protocols, criteria for whether to refer screened persons to health providers, or mechanisms for maximizing the proportion of referred persons reporting to providers. We do not know what information will be reported by screeners and providers, or how this information will be assembled, processed and reported. Mass screening and testing are intended to identify infection hot spots, but we do not know how such areas will be identified and delineated. All these are programme design issues which will affect the interpretation of results from the programme.
3. In addition to the effort to find hot spots, information will be needed to determine which restrictions on economic activity should be lifted. How to do this is the subject of debate globally, but each country will have to find a strategy which suits its circumstances. Priority will have to be given to those activities most essential for economic and social functioning. The information base will be a combination of general epidemiological data, including information from possible new sentinel sites, and work place measures to ensure the safety of workers. Consideration will need to be given to whether and how information from work places should be reported.
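The point made under the first method, that a changing mix of public and private testing affects the discovery of infections, can be illustrated with a minimal sketch. The proportions of positive tests and the test volumes below are hypothetical, chosen only to show the mechanism; they are not taken from any released data.

```python
def overall_positive_share(tests_private, tests_public, rate_private, rate_public):
    """Share of all tests that are positive, given test counts and
    the proportion of positive tests in each sector."""
    positives = tests_private * rate_private + tests_public * rate_public
    return positives / (tests_private + tests_public)


# Hypothetical proportions of positive tests, held fixed in both scenarios.
RATE_PRIVATE = 0.04
RATE_PUBLIC = 0.02

# The same hypothetical total of 10 000 tests, split differently between sectors.
mostly_private = overall_positive_share(8000, 2000, RATE_PRIVATE, RATE_PUBLIC)
mostly_public = overall_positive_share(3000, 7000, RATE_PRIVATE, RATE_PUBLIC)

print(f"Mostly private testing: {mostly_private:.1%} positive")
print(f"Mostly public testing:  {mostly_public:.1%} positive")
```

Under these assumed figures the overall proportion of positive tests changes although neither sector's own proportion has changed. This is why test numbers and results reported separately by sector would be easier to interpret than a single total.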
A REVISED GEOGRAPHICAL GRID
At present, information is being reported on a provincial basis. This grid will prove too coarse. Metros should be reported on separately and a way found to divide the rest of the country up, possibly according to district municipalities.
The table below indicates the distribution of reported infections across provinces on 15 April. The question arises: is this a reasonable reflection of the spread of the epidemic, or are large swathes of the country terra incognita? A more refined geographical grid would throw more light on the situation.
Provincial distribution of infections (15 April 2020)

| Province | Infections | Population (thousands) | Infections (per million) |
| Western Cape | 657 | 6844 | 96 |
| Gauteng | 930 | 15176 | 61 |
| KwaZulu-Natal | 519 | 11289 | 46 |
| Free State | 97 | 2887 | 34 |
| Eastern Cape | 199 | 6712 | 30 |
| Northern Cape | 16 | 1264 | 13 |
| North West | 23 | 4027 | 6 |
| Mpumalanga | 22 | 4592 | 5 |
| Limpopo | 25 | 5983 | 4 |
| Unallocated | 18 | | |
| Total | 2506 | 58774 | 43 |
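The last column of the table can be reproduced from the first two. The sketch below divides infections by population in thousands and multiplies by 1000 to obtain infections per million. The variable and function names are illustrative only.

```python
# Reported infections and population (thousands) per province, from the table.
provinces = {
    "Western Cape": (657, 6844),
    "Gauteng": (930, 15176),
    "KwaZulu-Natal": (519, 11289),
    "Free State": (97, 2887),
    "Eastern Cape": (199, 6712),
    "Northern Cape": (16, 1264),
    "North West": (23, 4027),
    "Mpumalanga": (22, 4592),
    "Limpopo": (25, 5983),
}


def per_million(infections, population_thousands):
    """Infections per million people, with population given in thousands."""
    return infections / population_thousands * 1000


rates = {name: round(per_million(i, p)) for name, (i, p) in provinces.items()}

# The national figure uses the total of 2506, which includes 18 unallocated cases.
national = round(per_million(2506, 58774))

for name, rate in rates.items():
    print(f"{name:15s} {rate:3d}")
print(f"{'Total':15s} {national:3d}")
```

The same calculation could be applied to metros and district municipalities if infections were reported on that finer grid.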
CONCLUSION: DETECTIVE WORK, HYPOTHESIS TESTING AND MODEL CONFRONTATION
Formal statistical hypothesis testing can only be used on data from a properly constructed sample, in which design probabilities are known. However, the information we have, and may have in the short run, is not collected on this basis. So, for now, assessment must be based on detective work and relevant epidemiological understanding.
Models of the epidemic have been built round the world and in South Africa. There are limits to models: experience with COVID-19 is limited, models of country epidemics do not always say the same thing, and even short-term projections can prove way off the mark. Models are best used to bring data into a rational relationship, in the process exposing gaps in knowledge. They should be confronted with one another as part of a critical and constructive process to strengthen insights. In particular, the models being used by government should be exposed to scrutiny.
The government is almost certainly right that the worst of the epidemic is yet to come, and it is probable that the lockdown has flattened the curve somewhat. But the question on everyone’s mind is how the epidemic will evolve and how developments will shape the policy options we have. Data dissemination is essential for informed policy debate.
By Charles Simkins, Head of Research, HSF, 21 April 2020
ANNEXURE
Publicly released COVID-19 data
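| Date | Cumulative infections detected | Deaths (new) | Deaths (cumulative) | Recoveries | Tests (total) | Tests (private laboratories) | Tests (public laboratories) |
| 17 March | 85 | | | | | | |
| 18 March | 116 | | | | | | |
| 19 March | 150 | | | | | | |
| 20 March | 202 | | | | | | |
| 21 March | 240 | | | | | | |
| 22 March | 274 | | | | | | |
| 23 March | 402 | | | | 12815 | 10803 | 2012 |
| 24 March | 554 | | | | | | |
| 25 March | 709 | | | | | | |
| 26 March | 927 | | | | | | |
| 27 March | 1170 | | | | 28537 | | |
| 28 March | 1187 | | | | | | |
| 29 March | 1280 | 1 | 2 | | | | |
| 30 March | 1307 | | | | | | |
| 31 March | 1353 | 2 | 5 | | | | |
| 1 April | 1380 | | | | | | |
| 2 April | 1462 | | 5 | | | | |
| 3 April | 1505 | 2 | 7 | | | | |
| 4 April | 1585 | | 9 | | 53937 | | |
| 5 April | 1655 | 2 | 11 | | 56783 | | |
| 6 April | 1686 | 1 | 12 | | 58098 | | |
| 7 April | 1749 | | | | | | |
| 8 April | 1845 | | | | 63776 | | |
| 9 April | 1934 | | | | | | |
| 10 April | 2003 | | 24 | 410 | 73028 | | |
| 11 April | 2028 | 1 | 25 | | 75053 | | |
| 12 April | 2173 | | | | 80085 | * | * |
| 13 April | 2272 | | | | | | |
| 14 April | 2415 | | | | 87022 | | |
| 15 April | 2506 | 7 | | | 90515 | | |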
* On 12 April, it was reported that, of 5032 tests conducted in the past day, 3192 were done in public laboratories
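As a minimal sketch of what can be derived from these figures, the Python below computes daily new infections from the cumulative series for 10 to 15 April and splits the tests reported on 12 April between public and private laboratories. All numbers come from the Annexure and its note; the variable names are illustrative only.

```python
# Cumulative infections detected, 10 to 15 April, from the Annexure.
dates = ["10 Apr", "11 Apr", "12 Apr", "13 Apr", "14 Apr", "15 Apr"]
cumulative_infections = [2003, 2028, 2173, 2272, 2415, 2506]

# Daily new infections are the differences between consecutive cumulative totals.
daily_new = {
    date: today - yesterday
    for date, yesterday, today in zip(
        dates[1:], cumulative_infections, cumulative_infections[1:]
    )
}

# Tests conducted in the day to 12 April: difference in cumulative test totals.
tests_in_day = 80085 - 75053

# Of these, 3192 were reported as done in public laboratories.
public_tests = 3192
private_tests = tests_in_day - public_tests
public_share = public_tests / tests_in_day

print(daily_new)
print(f"Tests in the day to 12 April: {tests_in_day}")
print(f"Private laboratories: {private_tests}, public share: {public_share:.1%}")
```

The difference in cumulative test totals between 11 and 12 April matches the 5032 tests mentioned in the note, which suggests the released figures are internally consistent on that date.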
[1] An example of such a database is the Johns Hopkins University COVID-19 dashboard, widely used to follow the spread of the epidemic across the world.