Author Morvan, Marie ♦ Devijver, Emilie ♦ Giacofci, Madison ♦ Monbet, Valérie
Source Hyper Articles en Ligne (HAL)
Content type Text
File Format PDF
Language English
Subject Keyword math ♦ Mathematics [math]/Statistics [math.ST]
Abstract In many medical problems, it is common to face heterogeneous data with unknown patient profiles, which makes it difficult to build a good diagnosis model. In this paper, our aim is to build a suitable and interpretable diagnosis tool to predict Non-Alcoholic Steatohepatitis (NASH), taking into account the structure and the dimension of the spectrometric data. To this end, we introduce a penalized mixture of logistic regressions model that allows the prediction of a binary response. Parameter estimation is done using the EM algorithm. In the presence of a high number of covariates, estimation of the full covariance matrix and interpretation of the regression coefficients are not trivial. To highlight the covariates relevant for prediction and the links between them, we apply a penalization to the covariance matrix and the regression coefficients. The estimated model depends on regularization parameters that adjust the strength of the penalization. Automatic selection tools are used to choose the best model, namely with respect to the AIC. A simulation study is performed to evaluate the proposed method, and the application to the NASH data set is presented. This model leads to better prediction performance than competing methods and provides useful tools to better understand the data.
Educational Use Research
Learning Resource Type Article