Sensitivity and identifiability analysis of COVID-19 pandemic models

The paper presents the results of sensitivity-based identifiability analysis of the COVID-19 pandemic spread models in the Novosibirsk region using the systems of differential equations and mass balance law. The algorithm is built on the sensitivity matrix analysis using the methods of differential and linear algebra. It allows one to determine the parameters that are the least and most sensitive to data changes to build a regularization for solving an identification problem of the most accurate pandemic spread scenarios in the region. The performed analysis has demonstrated that the virus contagiousness is identifiable from the number of daily confirmed, critical and recovery cases. On the other hand, the predicted proportion of the admitted patients who require a ventilator and the mortality rate are determined much less consistently. It has been shown that building a more realistic forecast requires adding additional information about the process such as the number of daily hospital admissions. In our study, the problems of parameter identification using additional information about the number of daily confirmed, critical and mortality cases in the region were reduced to minimizing the corresponding misfit functions. The minimization problem was solved through the differential evolution method that is widely applied for stochastic global optimization. It has been demonstrated that a more general COVID-19 spread compartmental model consisting of seven ordinary differential equations describes the main trend of the spread and is sensitive to the peaks of confirmed cases but does not qualitatively describe small statistical datasets such as the number of daily critical cases or mortality that can lead to errors in forecasting. A more detailed agent-oriented model has been able to capture statistical data with additional noise to build scenarios of COVID-19 spread in the region.


Introduction
Many mathematical models in biology (epidemiology, im munology, pharmacokinetics, systems biology), medicine (tomography), physics and chemistry (meteorology, chemical kinetics), as well as sociology are described by systems of differential equations, whether they be ordinary (Kermack, McKendrick, 1927), partial (Habtemariam et al., 2008), or stochastic differential ones (Lee et al., 2020). Coefficients in these equations characterize specific features of simulated processes under given conditions. To build an adequate mathe matical model, one needs to refine the coefficients of the equa tions based on the known parameters of the process and any additional information available about it. For example, when considering epidemiological problems, the parameters such as the infection transmission rate in the region; the critical case rates depending on comorbidities, age, and other demo graphic factors; the proportion of asymptomatic carriers/latent infection cases, etc., are unknown or approximately derived based on statistical data. These parameters are often sensitive to the measurements prone to errors (rounding, instrument, and human factor errors), which leads to unstable solutions of parameter identification problems.
Identifiability analysis of the differential equation systems modeling biological, medical, and physical processes is an im portant step to undertake before developing computational algorithms (Bellu et al., 2007;Raue et al., 2010Raue et al., , 2014Miao et al., 2011;Kabanikhin et al., 2016;Voropaeva, Tsgoev, 2019). A classification of identifiability types distinguishing between structural identifiability, practical identifiability, and sensiti vity analysis is presented in (Krivorotko et al., 2020a). The authors also analyze the systems of ordinary differential equa tions (ODE) describing epidemiological and immunological processes in terms of practical identifiability and parameter sensitivity to measurement errors.
A detailed review of methods and case studies of structural identifiability analysis in biological problems described by ODE systems may be found in (Miao et al., 2011;Kabanikhin et al., 2016). The model structure being as follows: The analysis based on this model makes it possible to verify the uniqueness of the solution q of the parameter identification problem and the initial conditions of model x 0 (or their part) based on available measurements f ik , while also providing recommendations on adding new information and modify ing the conditions of the parameter identification problem to ensure the uniqueness of the solution.
In the present paper, the analysis of the semirelative sensi tivity of the mathematical models to describe epidemiological and social processes is presented. This approach, proposed in (Adams et al., 2004) for analyzing ODE systems, shows the degree of parameter sensitivity to measurements and identifies lacking/excessive measurements based on a cer tain re ference set of parameters for solving the stated pa rameter identification problem. Two mathematical models of the spread of the new coronavirus infection caused by the SARSCoV2 virus described by ODE systems are used as examples. A regula rization algorithm for numerical solu tion of the parameter identification problem is developed for SEIR compartmen tal model and agentbased model using the statistical data from public sources. The modeling results and the scenario of COVID19 spread in the Novosibirsk region are presented.

Parameter sensitivity analysis in systems of ordinary differential equations
Sensitivity analysis is used for identifiability assessment of the unknown parameters of the model represented by ODE system (1) before developing a numerical solution algorithm Вавиловский журнал генетики и селекции / Vavilov Journal of Genetics and Breeding • 2021 • 25 • 1 Sensitivity and identifiability analysis of COVID-19 pandemic models for the parameter identification problem. These methods do not require real experimental data, but the number of measure ments and their time may be a necessity. Sensitivity analysis for a mathematical model is performed with regard to a set of nominal parameters q*, whose values are taken from the literature or statistical data available.
Sensitivity analysis methods are based on a sensitivity matrix. Assume that t 1 ≤ t 2 ≤ … ≤ t K are the fixed times of measurements f ik . Then, the sensitivity matrix coefficients for parameter vector q* are calculated as: where, f i , i = 1, …, M , is the i th entry of the measurement function vector, and q j , j = 1, …, L, is the j th entry of the parameter vector. Thus, the sensitivity matrix is determined as follows: .
The sensitivity matrix is calculated using the conventional sensitivity function: When the first equation from (1) is differentiated with re spect to q j , each vector function s q j should satisfy the Cauchy problem as follows: So, s q j (t) is obtained by numerically solving the Cauchy problem. First, the assessment is performed for the parameters q, to which the model's solution is most sensitive. These para meters, in turn, are defined by calculating semirelative sen sitivity. Here, sensitivity is considered a time function on the interval of interest. To obtain a general measure of parameter sensitivity of the solution, a time norm (over space L 2 ) is de rived for each state/parameter combination and the ob tained scalar quantities are ranked to identify the most sensitive pa rameters. The lower the value ∂ f i (t) ∂ q k q* k 2 , the less the effect of q k on f i . This general measure will be referred to as semi relative sensitivity. The orthogonal method is then used for sensitivity analysis. The idea of the method suggested in (Yao et al., 2003) is to investigate linear dependencies of the columns of sensitivity matrix S. In such a way it will be possible to assess parameter sensitivity to the input data and parameter interdependence at the same time.

Sensitivity analysis of COVID-19 spread models
The feature of currently developed COVID19 spread models is that they analyze the behavior of asymptomatic cases and the effect of the incubation period on the epidemiological situation in the regions. Several opensource suites (Gomez et al., 2020;Tuomisto et al., 2020;Wolfram, 2020) (Kermack, McKendrick, 1927). Agents are distributed in time de pending on the assigned transition probabilities between groups such as infection probability, virus contagiousness, mortality rate, etc. 2. Agent-based approach (bottomup modeling) is based on studying the interactions between individuals and their effect on global parameters (e. g. virus contagiousness, mortality rate, severe case probability, etc.). Agentbased models are characterized by random graphs whose arc lengths describe the probabilities of transition to different agent states. The parameters of transition between groups or agent states are often unknown or broadly defined. For instance, the incubation period of the disease according to WHO data varies from 2 to 14 days, which complicates the analysis of the model and the building of adequate disease spread scenarios.
Let us consider two breakdowns of the population of a par ticular age (e. g., ages 20-29) into groups. The transition into different agent states in the course of the disease caused by the SARSCoV2 virus is presented in Fig. 1. These models do not take into account such factors as sex segregation, an nual birth and mortality rates (since the modeling interval of less than a year is analyzed), vaccination, passenger traffic, and comorbidities, which affect the probabilities of transition to different agent states. Our goal here is to demonstrate the correlation between dependences of similar parameters on the same measurements and recommend what parameters can be determined consistently and based on what measurements.
The ODE system (1) describing COVID19 spread in the population is divided into 10 groups (Kerr et al., 2020) based on the mass balance law and is expressed as follows:  (Kerr et al., 2020) and (b) the SEIR-HCD model (Unlu et al., 2020).

Contagious infected carries (I )
Hospital admission cases (H )

Asymptomatic carries (A)
Contagious infected carries (I ) with the initial conditions: Model (4) characterizes a class of agent states for an age group within the agentbased model (see Fig. 1, a).
The equation system for the SEIRHCD model, where the population is divided into 7 groups (Krivorotko et al., 2020b;Unlu et al., 2020), is composed in a similar fashion: with the initial conditions: Here, S (t) is a susceptible agent at time t, E(t) -a nonconta gious infected (not transmitting the virus), I (t) -a contagious infected (transmitting the virus), A(t) -an asymptomatic case, Sym(t) -a symptomatic case, H (t) -a severe case, C (t) -a critical case (requiring a ventilator), M (t) -a mild case, R(t) -a recovered case, and D (t) -a mortality case. The averaged parameters of models (4) and (5) 8.9 · 10 6 α 7.6 · 10 14 D(t) E(0) 6.7 · 10 -14 α 6.07 · 10 -6 Note that the coefficients t -1 inc , t -1 inf , t -1 hosp , t -1 crit , τ -1 , τ -1 1 , τ -1 2 at the respective agent states in models (4) and (5) describe the delay in transition between states (Likhoshvai et al., 2004). Consider the following equation (5): where coefficient t -1 inc (in the linear approximation) indicates the delay of t inc days before the transition from noncontagious infected group E(t) to contagious infected group I (t), and coef ficient -t -1 inf -that the agent stays in the contagious infected group for the infection period of t inf days.
Mathematical model 1 (see the diagram in Fig. 1, a). As sume that additional information on recoveries and deaths on fixed days is available for the mathematical model (4): Here, R k is the number of recovered agents on day k , D k is the number of diseaserelated deaths on day k. The model analyzes the semirelative sensitivity of two un known parameters, i. e. contagiousness α and initial quan tity of asymptomatic cases E(0) in the model (4) to the measure ments (6). It will allow us to determine the possibility of con sistent identification of the unknown parameters based on the data available for an adequate representation of epidemio logical situation in the region. The sensitivities of parameters (q k ) = (α, E(0)), k = 1, 2, to measurements ( f i ) = (R, D), i = 1, 2, and sorted in descending order are presented in Table 2. The lower the value the less the effect of q k on f i . Figure 2 demonstrates how sensitive function ∂ f i (t) ∂ q k q* k changes in time depending on the parameter. Thus, parameters α and E(0) within the model (4) are less sensitive to vari able D (t) and are therefore not identifiable by the mortality data alone. On the other hand, these parameters are sensitive to function R (t), and, as a result, are recovered more consistently based on the recovery data. Mathematical model 2 (see the diagram in Fig. 1, b). Let us now investigate the SEIRHCD mathematical model (5). Assume that additional information on diagnoses, critical cases, and mortality on fixed days is available:

Parameter Е(0) and data R(t)
Parameter Е(0) and data D(t) where b(t) [0, 1] is the proportion of asymptomatic carriers in the diagnoses, f k -the daily number of diagnoses on day k, C k -the number of critical cases on day k. Parameters q = (α E , α I , β, ε HC , μ, E 0 ) T ℝ 6 are considered unknown. To analyze the semirelative sensitivity of the pa rameter vector q to measurements (7) within the mathemati cal model (5), we derive ∂ f i (t) ∂ q k q* k , ( f i ) = (I, C, D), i = 1, 2, 3, and analyze the values ∂ f i (t) ∂ q k q* k 2 ( Table 3). Consistency of identifying parameters β, ε HC , and μ as the result of solving the inverse problem barely depends on the available measure ments of the number of infected carriers I (t); however, it is not the case for the more sensitive coefficients α E , α I , E 0 . Figure 3 shows how sensitive function ∂ f i (t) ∂ q k q* k changes in time depending on the parameter. The more the parameter changes in time, the higher its sensitivity to the measurements analyzed, and the more consistently it is identified. In Fig. 4, the results of parameter sensitivity analysis for the model (5) at various iterations of the orthogonal algorithm are presented (see the description of the algorithm in (Krivorotko et al., 2020a)). Iterations of the orthogonal algorithm, whose total number is one less than the dimension of the unknown parameter vector (i. e. the number of columns in the sensitivity matrix), are plotted along the horizontal axis, and the norms of perpendiculars for the obtained transformations of sensitivity matrices -along the vertical axis. It was shown that the con tagion between the asymptomatic and susceptible groups α E , the contagion between the infected and susceptible groups α I linked with virus contagiousness and social factors, and the initial quantity of infected carriers and the agents in incubation period E 0 turned out to be the more identifiable parameters. The ranking of the parameters obtained via sensitivity analysis   for the model (5) from the most sensitive to the least sensitive is as follows: α E , E 0 , α I , ε HC , μ, β. As a result of identifiability analysis, a conclusion can be made that model parameters α E , E 0 , and α I are the least sensi tive to data variations (errors), i. e. are more identifiable. In other words, these parameters are identified more consistently as a result of solving the inverse problem (5), (7). In turn, pa rameters ε HC , μ, and β are the most sensitive to measurement errors, i. e., less identifiable (and have the lowest values of the norms of perpendiculars in the sensitivity matrix). Hence, the regularization algorithm should be developed to ensure the consistent identification of sensitive parameters.

Mathematical modeling of COVID-19 spread in the Novosibirsk region
To build a COVID19 spread model for the Novosibirsk re gion, the following publicly reported data were used: (a) Number of people tested (including the number of diag noses f and proportion of asymptomatic carriers b(t)), recovered cases (R), and COVID19 related deaths (D); (b) Duration of incubation period t inc , latent period τ, infec tion period t inf , hospitalization period t hosp , and duration of ventilation t crit ; (c) Recovery time for mild τ 1 and severe τ 2 cases; (d) Demographic profiles (population size and its age distri bution in the region); (e) Average household size (2.6 people) in the Russian Fede ration in 2019, according to UN data (https://population. un.org/Household/#/countries/840). Additional information was regularly obtained from the following websites: • Ministry of Health of the Novosibirsk region: https://zdrav. nso.ru/ (d). • Federal State Statistics Service of the Novosibirsk region: https://novosibstat.gks.ru/folder/31729 (c). • Stopcoronavirus website: https://стопкоронавирус.рф (a). • World Health Organization: https://www.who.int (b).
The modeling was performed taking into account the mea sures to contain the COVID19 spread (Table 4).
Solutions of inverse problems (4), (6) and (5), (7) were reduced to misfit function minimization (Kabanikhin, 2008): Here, s is the statistics used for data comparison (cumulative diagnoses, critical cases, and mortality), w s -the weight co efficient, c i, s d , c i, s m -data (with subscript d ) and model values (with subscript m), T -the modeling interval in days, q -the unknown parameter vector: q 1 = (β, E 0 ) T for the inverse problem (4), (6) and q 2 = (α E (t), α I (t), β, ε HC , μ, E 0 ) T for the inverse problem (5), (7). The absolute norm for computational experiments was set as follows: {c t, s d } was the normalization item; and the standard deviation was  Minimization of misfit function J (q) was implemented us ing the differential evolution method from the SciPy.Optimize Python library. The general algorithm of global minimum search was as follows: 1. Creation of the initial generation {q i } B, i = 1… N..

Creation of a new generation:
• Mutation: For all q i B three random vectors were selected as fol • Crossover: trial vector u was calculated as follows: 3. Selection: The results of COVID19 spread modeling in the Novosi birsk region with the forecast up to December 10, 2020, are presented in Fig. 5. The model was built using the agentbased approach relying on the investigation of interactions between individuals and their effect on global parameters. The model ing was performed using Covasim, a simulator for develop ing stochastic agentbased models. A detailed discussion of the model structure may be found in (Kerr et al., 2020). We also used the statistical data on diagnoses and deaths from March 12 to October 23, 2020. The following misfit function was minimized taking into account the identifiability analysis results for the model (4), (6): Here, f i d , f i m are cumulative diagnoses, and D i d , D i m are cumu lative deaths.
Modeling results f i m and statistics f i d of cumulative and daily diagnoses are presented in Fig. 5, a, b. Modeling results D i m and statistics D i d of cumulative COVID19 related deaths in the Novosibirsk region are presented in Fig. 5, c. Note that the second wave of the epidemic may be observed in the No vosibirsk region in midSeptember in both the statistical data and modeling results. Its growth will be insignificant (i. e. it will not exceed 215 new daily diagnoses by midDecember, 2020) due to the introduction of stricter containment measures from October 28.
The inverse problem (5), (7) was reduced to the minimiza tion of the following misfit function (Krivorotko et al., 2020b): Infection rate parameters α E (t) and α I (t) linked to virus contagiousness and varying in time were represented as piecewise constant functions depending on the interventions (see Table 4).
Based on the identifiability analysis results for the model (5), (7), more rigid restrictions were imposed for poorly identifiable parameters (see Table 1). The result of solving the inverse problem (5), (7) for the SEIRHCD model for the period from April 15, 2020, to October 3, 2020, is presented in Fig. 6. Note that although the rough mathematical model (with ODE system of 7 equations) captures the general trend based on the number of diagnoses (the peak of confirmed cases in the region, see Fig. 6, а), it is still unable to qualitatively describe highly variable statistics (critical cases requiring a ventilator, see Fig. 6, b). Nonsmooth solutions in Fig. 6 result from the use of the Yandex selfisolation index characterized by weekly seasonality. In this case, smoothing would undermine the use of the tool. A more detailed analysis of modeling and fore casting of COVID19 spread in the Moscow and Novosibirsk regions is presented in (Krivorotko et al., 2020b). This case requires the agentbased approach capable of detailed descrip tion of small statistical datasets.

Conclusions
In the present study, sensitivitybased identifiability analysis has been performed for the COVID19 pandemic spread models based on systems of differential equations. The algo rithm is based on the analysis of the sensitivity matrix using the differential and linear algebra apparatus, which shows the degree of dependence of the unknown model parameters on the given measurements.
The analysis has shown that the virus contagiousness is consistently identified based on the number of new daily diag noses, critical cases, and recoveries. On the other hand, the predicted proportion of admitted critical state patients requir ing a ventilator and the mortality rate are identified much less consistently. It has been demonstrated that developing a more realistic forecast will require additional information about the process such as the number of daily hospital admissions.
The identifiable parameters refinement problems have been reduced to the minimization of the respective misfit func tions describing the proximity of the modeling data to the statistics of the diagnoses, critical cases, and mortality in the Novosibirsk region. The use of absolute and quadratic norms as measures of deviation between the data and the modeling results in the minimization procedures has not yielded any significant differences in terms of analyzing the modeling results. It has been shown that a rough compartmental model of seven ODEs describes the general trend of the coronavirus infection propagation, as it is sensitive to peaks of confirmed cases; however, it is unable to qualitatively describe smaller statistics (daily numbers of critical cases t k and deaths), which may lead to improper conclusions. A more detailed mathemati cal model using the agentbased approach, where a class of agent states is expressed by a system of ten ODEs, will make it possible to describe noisy statistical datasets in more detail and build adequate scenarios of COVID19 pandemic spread.