# The inputs and outputs of the `CovidSim` model

This is WIP. Know something not documented here? Please add it and open a PR!

- Table of contents
  - [The geography](#the-geography)
  - [Main command-line arguments](#main-command-line-arguments)
  - [Input files](#input-files)
    - [Parameters](#parameters)
    - [Parameter files](#parameter-files)
    - [Population density file](#population-density-file)
      - [How population density files are produced](#how-population-density-files-are-produced)
    - [School files](#school-files)
  - [Output files](#output-files)
  - [R summary visualisations](#r-summary-visualisations)

## The geography

`CovidSim` simulates disease spread in a geographical region, which in principle
can be at any scale, but in practice is a region or country.

In consequence, the model must be told the geography of a region, such as its
population density, plus other specific information. This information is
specified as a mixture of parameters and input population density files.

## Main command-line arguments

A typical run specifies:

1. Files that contain simulation parameters (the `/A`, `/P` and `/PP` options)
2. A population density file for the country we're simulating (the `/D` option)
3. The path prefix of the output files that summarise the results of the simulation (the `/O` option).

```shell
CovidSim
    /O:OutputFilesPrefix
    /P:ParameterFile
    [/NR:NumberOfRealisations]
    [/A:AdminParamFile]
    [/AP:AirTravelFile]
    [/c:NumThreads]
    [/C:PlaceCloseIndepThresh]
    [/CLP[1-6]:ParamOverrideNumber]
    [/d:RegionalDemographyFile]
    [/D:PopulationDensityFile]
    [/I:InterventionFile]
    [/KO:KernelOffsetScale]
    [/KP:KernelPowerScale]
    [/L:NetworkFileToLoad]
    [/LS:SnapshotLoadFile]
    [/M:OutputDensityFile]
    [/PP:PreParameterFile]
    [/R:R0scaling]
    [/s:SchoolFile]
    [/S:NetworkFileToSave]
    [/T:PreControlClusterIdCaseThreshold]
    SetupSeed1 SetupSeed2 RunSeed1 RunSeed2
```

Required arguments:

- `/O` - Output file path prefix for simulation data collection. Output file
  names have the `.xls` extension, but the files contain tab-separated (`tsv`)
  text rather than spreadsheet data.
  - Example: `/O:./output/NoInt_R0=1`
- `/P` - Intervention parameters for a specific run.
  - Example: `/P:./data/param_files/p_NoInt.txt`
- `SetupSeed1 SetupSeed2` - Random number generator seeds used when initialising
  the model, including creating the network file (large positive integers).
- `RunSeed1 RunSeed2` - Random number generator seeds used when running the model.
  These can be varied to do multiple runs with the same network file
  (large positive integers).

Optional arguments:

- `/NR` - Specifies the number of simulation realisations (independent runs with the same
  parameters) to run at once and average over in the output files.
- `/A` - [Administrative division](./glossary.md#Administrative\ Division) parameter file
  - Example: `/A:./data/admin_units/United_Kingdom_admin.txt`
- `/AP` - Air travel data for a specific geography (unused currently)
- `/BM` - Specifies the output bitmap format, given as `/BM:format`. Valid
  choices are `BMP` or (when available) `PNG`. The default is `PNG` if
  available, otherwise `BMP`.
- `/c` - Number of parallel threads to use (only used if compiled with OpenMP)
  - Example: `/c:32`
- `/C` - Sets the `P.PlaceCloseIndepThresh` parameter.
- `/CLP[1-6]` - Special parameters that interact with wildcards `#1`, `#2`, etc.
  in the intervention parameter file (and less often the pre-parameter file).
  Wildcard `#n` is replaced by the value of `CLPn`. This is useful to vary parts
  of parameter files at run-time (e.g. to undertake sensitivity analysis)
  without needing to generate entirely new parameter files.
  - Examples: `/CLP1:100000` & `/CLP2:0`
- `/d` - Regional demography file to use.
- `/D` - Population density file for a specific geography (e.g. a country). Can
  be loaded from either the original textual format or a binary format from
  a previous run that used the `/M` option.
  - Examples: `/D:./data/populations/wpop_eur.txt` & `/D:./US_LS2018.bin`
- `/I` - Intervention file. Can be specified more than once.
- `/KO` - Scales the `P.MoveKernelScale` parameter.
- `/KP` - Scales the `P.MoveKernelShape` parameter.
- `/L` - Load a network file saved from a previous run that specified `/S`.
  - Example: `/L:./network_file.bin`
- `/LS` - Load a snapshot file saved by the `/SS` option.
  - Example: `/LS:./snapshot.bin`
- `/M` - Output a population density file to disk
  - Example: `/M:./US_LS2018.bin`
- `/PP` - Transmission and calibration parameter file for a specific run
  - Example: `/PP:./data/param_files/preUS_R0=2.0.txt`
- `/R` - Specifies the basic reproduction number [R0](./glossary.md#R0) as a
  multiplier of 2. The value is read into `P.R0scaling`, which scales the R0
  parameter specified in the parameter file. This is useful when repeating
  simulations that vary *only* `R0`. For COVID-19, 1.4 to 1.6 is suitable.
  - Example: `/R:1.6`
- `/s` - School information for a specific geography (currently only used for US).
  - Example: `/s:./data/populations/USschools.txt`
- `/S` - Save, as a side-effect of the run, a
  [network file](./model-glossary.md#Network-file) that assigns
  [people](./model-glossary.md#People) to [places](./model-glossary.md#Places).
  For efficiency, it may then be re-used (via `/L`) for subsequent runs with
  different input parameters for the same geography. ***Note***: this file is
  non-portable.
  - Example: `/S:./network_file.bin`
- `/SS` - Specifies the interval and file for saving a snapshot when
  running a simulation. The first argument is the number of `P.TimeStep`s that
  should elapse before saving. The second argument is the file to which the
  snapshot is saved.
  - Example: `/SS:100,./snapshot.bin`
- `/T` - Sets the `P.PreControlClusterIdCaseThreshold` parameter.

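When launching many runs from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of `CovidSim` itself: the helper name `build_covidsim_args` and the seed values are illustrative assumptions, and the executable name is assumed to be `CovidSim` as in the synopsis above. Options are passed as `(name, value)` pairs so that an option such as `/I` can be repeated.

```python
def build_covidsim_args(output_prefix, param_file, seeds, options=(), exe="CovidSim"):
    """Return an argv list for one run (hypothetical helper, not a CovidSim API).

    seeds   -- (SetupSeed1, SetupSeed2, RunSeed1, RunSeed2)
    options -- iterable of (name, value) pairs, e.g. ("PP", "pre.txt");
               names are given without the leading slash.
    """
    if len(seeds) != 4:
        raise ValueError("expected four seeds: two setup seeds and two run seeds")
    # Required options first, then optional ones, then the four seeds.
    args = [exe, f"/O:{output_prefix}", f"/P:{param_file}"]
    args.extend(f"/{name}:{value}" for name, value in options)
    args.extend(str(seed) for seed in seeds)
    return args


argv = build_covidsim_args(
    "./output/NoInt_R0=1",
    "./data/param_files/p_NoInt.txt",
    seeds=(98798150, 729101, 17389101, 4797132),  # illustrative values only
    options=[
        ("PP", "./data/param_files/preUS_R0=2.0.txt"),
        ("D", "./US_LS2018.bin"),
        ("R", 1.6),
        ("c", 32),
    ],
)
print(" ".join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)`, which avoids shell quoting problems with paths such as `NoInt_R0=1`.
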
## Input files

The main input files are parameter files and population density files
(for specific geographies).

### Parameters

There are a very large number of parameters to `CovidSim`. This repo is
undergoing active development and rationalisation. The parameters are currently
not self-documenting.

Parameter values are read in from parameter files by the function `ReadParams`,
which matches a parameter description string to the corresponding variable in the
source code. The only way to determine the precise meaning of a specific
parameter is to read the code.

### Parameter files

The parameters are specified in admin, pre-parameter and intervention parameter
files. All three kinds of file have the same format.

Admin and pre-parameter files contain parameters whose values are *common* to a
series of runs (i.e. defining geographies and transmission parameters).
Intervention parameter files group intervention parameters whose values are more
likely to *differ* between a series of runs.

The format is a sequence of:

```
[Description of Parameter]
value
```

If you see further numbers below those that follow the parameter description,
disregard them. The simulation uses only the numbers immediately below the
parameter description.

An example parameter file is `./data/param_files/p_NoInt.txt`.

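To inspect a parameter file outside the model, the format above can be read with a few lines of Python. This is a rough sketch and not a port of `ReadParams`: the function name `parse_params` and the second sample parameter are made up for illustration, and it assumes that "immediately below" means the first non-blank line after the description.

```python
def parse_params(text):
    """Map each '[Description]' to the tokens on the first non-blank line below it."""
    params = {}
    pending = None  # description still waiting for its value line
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            pending = line[1:-1]
        elif pending is not None and line:
            params[pending] = line.split()
            pending = None  # any further lines of numbers are disregarded
    return params


sample = """\
[Do Severity Analysis]
1

[Hypothetical vector parameter]
0.5 0.25 0.25
9 9 9
"""

params = parse_params(sample)
print(params)
```

Values are kept as strings because the expected type of each parameter is only known from the source code.
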
### Population density file

A geography-specific file used to assign people to cells. It can be in the
original textual format or in the binary format written by the `/M` option.
Currently these files are generated and provided by Imperial College.

An example population density file is `./data/populations/wpop_eur.txt`.

The information contained in this file includes:

| longitude | latitude | number of people | country code | admin unit code |
|-:|-:|-:|-:|-:|
| -156.68333 | 71.325 | 30 | 46 | 4602017 |
| -156.76666 | 71.3 | 1 | 46 | 4602017 |
| ... | ... | ... | ... | ... |

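As a rough illustration of how such rows might be consumed, the sketch below parses the two example rows from the table. It assumes the textual format holds one record per line with whitespace-separated fields in the column order shown; the actual delimiter is not documented here, and the names `DensityRow` and `parse_density_line` are hypothetical.

```python
from typing import NamedTuple


class DensityRow(NamedTuple):
    longitude: float
    latitude: float
    people: int
    country_code: int
    admin_unit_code: int


def parse_density_line(line):
    """Parse one record, assuming five whitespace-separated fields."""
    lon, lat, people, country, admin = line.split()
    return DensityRow(float(lon), float(lat), int(people), int(country), int(admin))


lines = [
    "-156.68333 71.325 30 46 4602017",
    "-156.76666 71.3 1 46 4602017",
]
rows = [parse_density_line(line) for line in lines]

# Total population per admin unit code.
population = {}
for row in rows:
    population[row.admin_unit_code] = population.get(row.admin_unit_code, 0) + row.people
print(population)
```
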
#### How population density files are produced

Physical geography data: each geography has a shape file (`.shp`) of polygons
and meta-data (`.dbf`) with GPS coordinates. Admin units are a set of polygons.

Human geography data specifies where people live on the same scale as
`CovidSim`'s [microcell](./model-glossary.md#Microcells) (1/120th of a degree).

Imperial College combines the physical and human data to calculate population
densities per polygon. This process produces the population density file.

A companion to the population density file is a meta-file that maps admin unit
codes to string descriptions (e.g., codes to US state names).

### School files

The first line of a school file has (1 + 2`n`) integer values, where `n` is
the number of school types. The values are:

- Index `0`: The number of types of schools. E.g. a geography might have two
  school place types (primary and secondary).
- Index 1 + 2`i`: The total number of schools of type `i`
- Index 2 + 2`i`: The number of age bands in schools of type `i`

E.g., if a geography has 2 school types then the first line of the school file
might be:

`2 100 3 50 4`

representing 2 school types, with 100 of type 0 (which has 3 age bands) and 50
of type 1 (which has 4 age bands).

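The index arithmetic above can be checked with a short sketch. The function name `parse_school_header` is an illustrative assumption, and the values are assumed to be whitespace-separated as in the example line.

```python
def parse_school_header(line):
    """Return a list of (number_of_schools, number_of_age_bands), one per school type."""
    values = [int(v) for v in line.split()]
    n_types = values[0]  # index 0: number of school types
    if len(values) != 1 + 2 * n_types:
        raise ValueError(
            f"expected {1 + 2 * n_types} values for {n_types} school types, got {len(values)}"
        )
    # Index 1 + 2i holds the school count and index 2 + 2i the age band count of type i.
    return [(values[1 + 2 * i], values[2 + 2 * i]) for i in range(n_types)]


header = parse_school_header("2 100 3 50 4")
for school_type, (count, bands) in enumerate(header):
    print(f"type {school_type}: {count} schools, {bands} age bands")
```
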
The remainder of the file has a row per school. E.g.:

| longitude | latitude | place type index | # people in the school | # people in age band 1 | # people in age band 2 | ... | # people in age band n |
|-:|-:|-:|-:|-:|-:|-:|-:|
| -156.68333 | 71.325 | 0 | 80 | 30 | 46 | ... | 4 |
| -123.32 | 70.35 | 0 | 32 | 23 | 3 | ... | 6 |
| ... | ... | ... | ... | ... | ... | ... | ... |

The place type index for schools is `0`.

## Output files

Simulation output files are produced by each run.
Switches in parameter files can control the precise nature of the outputs
(e.g., at country level, or at admin unit level, or both). For example, if a
parameter file contains

```
[Do Severity Analysis]
1
```

then `severity.xls` is generated.

A run is extinct if the disease dies out; otherwise the run is non-extinct.

Outputs can be averaged over all extinct (`avE` suffix) and non-extinct
(`avNE` suffix) runs. Currently, we are simulating large epidemics that
essentially become deterministic, and therefore we pay most attention to the
`avNE` (average of non-extinct realisations) files.

Below is an incomplete specification of the output file formats.

### `name.avNE.xls`

Contains time-stamped (e.g., daily) statistics for the simulation over the whole country.

| column | meaning |
| ------------- |-------------:|
| t | sample time, as specified in the pre-parameter file by the sampling timestep; generally the day in 2020 (t=1 is 1 January) |
| S | total number of susceptibles in the population |
| L | total number of latently infected people in the population |
| I | total number of infectious people in the population |
| R | total number of recovered people in the population |
| D | total number of deaths in the population |
| incI | incidence of infections at that timestep |
| incR | incidence of recoveries |
| incFC | incidence of false cases, i.e. false positives |
| incC | incidence of cases |
| incDC | incidence of detected cases |
| incTC | incidence of treated cases |
| incH | incidence of hospitalisations; this can probably be ignored, as it was written specifically for the Ebola model and we're using a different approach here |
| cumT | cumulative number of treated cases |
| cumTmax | the maximum number of cumulative treated cases from the runs being averaged over |
| cumTP | cumulative number of privately treated cases |
| cumV | cumulative number of vaccinations |
| cumVmax | the maximum number of cumulative vaccinations from the runs being averaged over |
| Extinct | Is the run extinct or not? |
| rmsRad | root mean square radius of infections from seed point |
| maxRad | maximum radius of an infection from the seed point |
| v* | a sequence of columns containing the variance of the above quantities in the same order (excluding the time step) |
| value 1 | Number of non-extinct runs |
| value 2 | Number of extinct runs |
| value 3 | R0 in households |
| value 4 | R0 in places |
| value 5 | R0 of spatial transmission |
| value 6 | Mean peak height |
| value 7 | Variance of peak height |
| value 8 | Mean peak time |
| value 9 | Variance of peak time |

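Because the `.xls` output files hold tab-separated text, they can be read with Python's standard `csv` module rather than a spreadsheet library. The sketch below uses a tiny made-up sample with only a few of the columns listed above; the numbers are illustrative, not model output, and the helper name `read_timeseries` is an assumption. Real files contain more columns (including the `value` entries), so they may need extra handling.

```python
import csv
import io


def read_timeseries(handle, columns):
    """Read the named columns of a tab-separated output file as lists of floats."""
    reader = csv.DictReader(handle, delimiter="\t")
    series = {name: [] for name in columns}
    for row in reader:
        for name in columns:
            series[name].append(float(row[name]))
    return series


# Made-up sample with a header row and two time steps.
sample = "t\tS\tL\tI\tR\tD\n1\t1000\t0\t1\t0\t0\n2\t998\t2\t1\t0\t0\n"

series = read_timeseries(io.StringIO(sample), ["t", "I"])
print(series)
```

For a file on disk, pass `open(path, newline="")` in place of the `io.StringIO` object, where `path` is the `/O` prefix followed by the suffix of the file of interest.
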
### `name.avNE.adunit.xls`

Contains time-stamped statistics per [admin unit](./model-glossary.md#Admin-unit)
(hopefully with headers matching the codes in a population index file).

| column | meaning |
| ------------- |-------------:|
| t | time |
| I(admincode) ... | Incidence of infection in each admin unit (the number of columns equals the number of admin units used) |
| C(admincode) ... | Incidence of cases in each admin unit |
| DC(admincode) ... | Incidence of detected cases in each admin unit |
| T(admincode) ... | Incidence of treated cases in each admin unit |
| value ... | A sequence of column values of the population of each admin unit |

### `name.avNE.age.xls`

| column | meaning |
| ------------- |-------------:|
| t | time |
| I(age band) ... | incidence of cases in each age band |
| C(age band) ... | incidence of critical cases in each age band |
| D(age band) ... | incidence of deaths in each age band |

### `name.avNE.severity.xls`

Contains statistics on the [prevalence](./model-glossary.md#Prevalence) of the
infection.

| column | meaning |
| ------------- |-------------:|
| t | time |
| PropSchClosed | proportion of schools closed |
| PropSocDist | unknown |
| [mild](./model-glossary.md#Mild) | total number of mild cases at time t |
| [ILI](./model-glossary.md#ILI) | total number of influenza-like illness cases at time t (assumed to represent GP demand) |
| [SARI](./model-glossary.md#SARI) | total number of severe acute respiratory illness cases at time t (assumed to represent hospital demand) |
| [Crit](./model-glossary.md#Crit) | total number of critical cases (assumed to represent ICU demand) |
| [CritRecov](./model-glossary.md#CritRecovery) | total number of critical cases who are well enough to be out of ICU but still need a hospital bed |
| incMild | incidence of mild cases |
| incILI | incidence of ILI cases |
| incSARI | incidence of SARI cases |
| incCrit | incidence of critical cases |
| incCritRecov | incidence of critical cases still in hospital but no longer requiring ICU |
| incDeath | incidence of death |
| cumMild | cumulative number of mild cases |
| cumILI | cumulative number of ILI cases |
| cumSARI | cumulative number of SARI cases |
| cumCrit | cumulative number of critical cases |
| cumCritRecov | cumulative number of critical cases still in hospital but no longer requiring ICU |
| v* | a sequence of columns containing the variance of the above quantities in the same order (excluding PropSchClosed and PropSocDist) |

### `name.avNE.severity.adunit.xls`

As per `name.avNE.severity.xls`, excluding PropSchClosed and PropSocDist, and
with each quantity listed for each admin unit in turn.

<!--
### `name.avNE.adunitVar.xls`

### `name.avNE.controls.xls`

### `name.adunit.xls`

### `name.avNE.country.xls`

### `name.avNE.household.xls`

### `name.avNE.inftype.xls`

### `name.avNE.R0.xls`

### `name.avNE.severity.xls`

### `name.severity.adunit.xls`

### `name.severity.xls`

### `name.xls`
-->

## R summary visualisations

Some [R scripts](../Rscripts) provide basic visualisations of model runs.

If R is installed and the output files of model runs have been created in a
folder, they can be visualised using the commands

```shell
Rscript Rscripts/PlotsSpatial.R [folder-where-the-data-is]
Rscript Rscripts/CompareScenarios.R [folder-where-the-data-is]
```

This will create `.png` files visualising the data in a new subfolder called `Plots`.