Decision tree machine learning applied to bovine tuberculosis risk factors to aid disease control decision making

Romero, M P and Chang, Y M and Brunton, L A and Parry, J and Prosser, A and Upton, P and Rees, E and Tearne, O and Arnold, M and Stevens, K and Drewe, J A (2019) Decision tree machine learning applied to bovine tuberculosis risk factors to aid disease control decision making. Preventive Veterinary Medicine.

12478_Decision-tree-machine-learning-applied-to-bovine-tuberculosis-risk-factors-to-aid-disease-control-decision-making_Accepted.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Identifying and understanding the risk factors for endemic bovine tuberculosis (TB) in cattle herds is critical for the control of this disease. Exploratory machine learning techniques can uncover complex non-linear relationships and interactions within disease causation webs, and can enhance our knowledge of TB risk factors and how they are interrelated. Classification tree analysis was used to reveal associations between predictors of TB in England overall and in each of the three surveillance risk areas (High Risk, Edge, and Low Risk) in 2016, and to identify the highest-risk herds. The main classifying predictor for farms in England overall was the TB prevalence in the 100 nearest cattle herds. In the High Risk and Edge areas it was the number of slaughterhouse destinations, and in the Low Risk area it was the number of cattle tested in surveillance tests. The time since the last confirmed incident was resolved was the most frequent classifier in the trees: resolution within the previous two years led to the highest-risk group of herds in the High Risk and Low Risk areas. At least two different slaughterhouse destinations led to the highest-risk group of herds in England overall, whereas in the Edge area the highest-risk group resulted from a combination of no contiguous low-risk neighbours (i.e. within a 1 km radius) and a minimum proportion of 6–23-month-old cattle in November. Exceeding a threshold prevalence in the 100 nearest neighbouring herds increased the risk in all areas, although the threshold value was specific to each area. Having low-risk contiguous neighbours reduced the risk in the Edge and High Risk areas, whereas having high-risk contiguous neighbours increased the risk in England overall and in the Edge area specifically. The best classification tree models informed multivariable binomial logistic regression models in each area, adding statistical inference outputs. The two approaches showed similar predictive performance, although there were some disparities in what constituted high-risk predictors.
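Each node of a classification tree is created by searching for the cut-point on a predictor that best separates high-risk from low-risk herds. Findings such as "at least two different slaughterhouse destinations" are cut-points of this kind. The sketch below is a minimal illustration that assumes a CART-style Gini impurity criterion on one numeric predictor. The abstract does not state which software or split criterion the study used. The function names `gini` and `best_threshold` are hypothetical, and the toy data are invented for illustration rather than taken from the study.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of binary labels (hypothetical coding: 1 = TB breakdown, 0 = none)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_threshold(x: List[float], y: List[int]) -> Tuple[Optional[float], float]:
    """Find the cut t that minimises the size-weighted Gini impurity of the
    two child nodes x < t and x >= t (assumed CART-style criterion).

    Returns (None, parent impurity) if no cut improves on the unsplit node.
    """
    n = len(y)
    best_t, best_imp = None, gini(y)
    values = sorted(set(x))
    # Candidate cuts lie midway between consecutive distinct predictor values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / n
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t, best_imp


if __name__ == "__main__":
    # Invented toy data: number of slaughterhouse destinations per herd and TB status.
    destinations = [0, 1, 1, 2, 3, 4]
    tb_status = [0, 0, 0, 1, 1, 1]
    print(best_threshold(destinations, tb_status))  # (1.5, 0.0)
```

On this toy input the best cut is 1.5, which corresponds to the rule "two or more destinations". A full tree applies the same search recursively to every candidate predictor within each child node and stops according to pruning or minimum-node-size rules. This sketch omits those rules.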
Decision tree machine learning approaches can identify risk factors from webs of causation: information which may then be used to inform decision making for disease control purposes.