Modelling and Calibration of Data Intensive Systems

Lead CI: Philip Pollett

A characteristic feature of data intensive systems such as the brain, social networks, communications networks, and climate and ecosystems, is the very large number of interacting agents with various types of contact structure. A new set of mathematical tools is needed to gain sufficient insight into the time evolution of these systems for monitoring and control. Of major importance to Australia is the need for more accurate assessments of the spread of invasive diseases and pests, both for health service provision and for improving management strategies for native and commercial animal populations. For example, approximately $17 million was allocated in 2012-13 to continue the program of eradication of the Red Imported Fire Ant, one of several similar programs designed to protect Australia's biodiversity and to limit the spread of diseases and pests. However, these programs can be effective only if the dynamics of the populations in question are well understood.

Whilst mathematical models are used widely to gain insight into population dynamics, they have not been able to account properly for local population dynamics, individual variation, spatial structure and differing migration patterns. Nor have these models been able to account simultaneously for behaviour emerging at differing scales in time and in space; capturing these features represents a major challenge. Models for the spread of infection (ideas, rumours, invasive pests, et cetera) on such contact networks are emerging, and, stimulated by the HIV pandemic, much effort has been devoted to modelling epidemic spread in heterogeneous environments and at different time scales. However, the present set of models cannot deal with host-parasite infections and demographically structured ecological networks; it is the nature and type of the data (rather than merely its massive volume) that requires models in which there is no limit on the number of types of individual. Furthermore, the standard laws of ``mass action'' are not useful in this new context, because understanding the limiting deterministic behaviour requires the solution of an abstract Cauchy problem, rather than of a (finite) system of differential equations.
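
To make the distinction concrete, the following contrast is purely illustrative (the notation is generic, not that of any particular model in this project). With a finite number $k$ of types, the scaled population densities $x_1,\dots,x_k$ obey a finite system of mass-action differential equations,
\[
  \frac{\mathrm{d}x_i}{\mathrm{d}t} = F_i(x_1,\dots,x_k), \qquad i = 1,\dots,k,
\]
whereas with an unlimited number of types the natural limit object is a function- or measure-valued state $u(t)$ satisfying an abstract Cauchy problem of the form
\[
  \frac{\mathrm{d}u}{\mathrm{d}t}(t) = A\,u(t) + F(u(t)), \qquad u(0) = u_0,
\]
where $A$ is an (in general unbounded) linear operator capturing, for example, movement between types, and $F$ collects the nonlinear interaction terms. Existence and regularity of solutions must then be established with semigroup methods: when $A$ generates a strongly continuous semigroup and $F$ vanishes, the solution is simply $u(t) = \mathrm{e}^{tA}u_0$, but in general no such closed form is available.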

With this comes the challenge of calibrating these new models to data. Suitable inferential methods, which necessarily avoid evaluating the likelihood, are Approximate Bayesian Computation and Indirect Inference (which, in the present context, would use a space-scaled diffusion approximation as a reference model). However, these methods rely heavily on the ability to simulate the model efficiently, and simulating large state-space models can be computationally demanding. Recently developed simulation methods show particular promise, especially the tau-leaping algorithm, which exploits multiple time scales; it was initially proposed as a non-exact but computationally efficient means of simulating chemical reactions, but appears to be particularly well suited to population process models. Another approach is to replace ``fast'' variables with the trajectories of their limiting deterministic counterparts (while this seems like an obvious multi-scaling approach, I have not seen it in the literature).
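
As a purely illustrative sketch of the simulation step (written in Python; the SIR model, rate constants and fixed leap size below are assumptions made for concreteness, not components of the proposed models), tau-leaping advances the process in leaps of length tau, drawing the number of events of each type within a leap from a Poisson distribution whose mean is the current event rate multiplied by tau:

import numpy as np

rng = np.random.default_rng(1)

def tau_leap_sir(S, I, R, beta=0.3, gamma=0.1, N=1000, tau=0.1, t_end=100.0):
    """Advance a stochastic SIR population process in fixed leaps of length tau.

    Within each leap, the numbers of infection and recovery events are drawn
    from Poisson distributions with means (rate * tau), instead of simulating
    every individual event exactly as Gillespie's algorithm would.
    """
    t, path = 0.0, [(0.0, S, I, R)]
    while t < t_end and I > 0:
        infect_rate = beta * S * I / N      # mass-action infection rate
        recover_rate = gamma * I            # per-capita recovery rate
        n_inf = rng.poisson(infect_rate * tau)
        n_rec = rng.poisson(recover_rate * tau)
        n_inf = min(n_inf, S)               # guard against negative counts
        n_rec = min(n_rec, I + n_inf)
        S, I, R = S - n_inf, I + n_inf - n_rec, R + n_rec
        t += tau
        path.append((t, S, I, R))
    return path

path = tau_leap_sir(S=990, I=10, R=0)
print(path[-1])   # final time and compartment counts

Exact event-by-event (Gillespie) simulation of the same process requires one update per event, so for large populations the Poisson-leap approximation can reduce the computational cost substantially, at the price of a controllable discretisation error; this trade-off is what makes tau-leaping attractive when simulation must be repeated many times within Approximate Bayesian Computation.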