Abstract
Scores, also known as indexes, play an important role in disciplines such as medicine and economics, where they quantify complex phenomena by combining elementary indicators, or features. Existing methods for score construction mostly rely on predefined aggregation techniques and often require trial and error to achieve desirable properties. This thesis builds upon previous work that introduced a data-driven approach to score generation. That approach relies on Multi-Objective Symbolic Regression (MOSR) to output mathematical formulas, which may be non-linear. However, its handling of formula coefficients is limited in two respects, uncertainty awareness and the integration of prior and new knowledge, both of which are essential in advanced application contexts such as federated and continual learning.
To address these issues, we propose a framework that combines Symbolic Regression with Bayesian Inference. We start from the models generated by the MOSR evolutionary algorithm and identify the constants within them. We then parametrize the models by symbolically replacing these constants with a vector of parameters. The parametrized function serves as the expected value of a Bayesian Regression model with Gaussian noise. We test several Markov Chain Monte Carlo (MCMC) sampling algorithms and chain lengths to approximate the posterior distribution.
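The pipeline described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the example formula `1.8 * x / (0.5 + x)` stands in for a MOSR output, the priors (Gaussian priors centred on the original constants, and a Gaussian prior on log sigma) are arbitrary choices, and a plain random-walk Metropolis sampler stands in for whichever MCMC algorithms the thesis actually evaluates. The helper names `parametrize`, `log_posterior`, and `metropolis` are hypothetical.

```python
import ast
import math
import random


def parametrize(formula):
    """Replace each float literal in `formula` with theta[i].

    Returns the compiled parametrized expression and the original constants.
    Assumes Python 3.9+ (subscript slices are plain expressions in the AST).
    """
    constants = []

    class LiftConstants(ast.NodeTransformer):
        def visit_Constant(self, node):
            if isinstance(node.value, float):
                constants.append(node.value)
                new = ast.Subscript(
                    value=ast.Name(id="theta", ctx=ast.Load()),
                    slice=ast.Constant(value=len(constants) - 1),
                    ctx=ast.Load(),
                )
                return ast.copy_location(new, node)
            return node

    tree = LiftConstants().visit(ast.parse(formula, mode="eval"))
    ast.fix_missing_locations(tree)
    return compile(tree, "<formula>", "eval"), constants


def log_posterior(code, theta, log_sigma, xs, ys, prior_mean, prior_sd=10.0):
    """Unnormalized log posterior: Gaussian likelihood plus assumed priors."""
    sigma = math.exp(log_sigma)
    # Assumed priors: theta_i ~ N(original constant, prior_sd^2), log(sigma) ~ N(0, 2^2).
    lp = sum(-0.5 * ((t - m) / prior_sd) ** 2 for t, m in zip(theta, prior_mean))
    lp += -0.5 * (log_sigma / 2.0) ** 2
    for x, y in zip(xs, ys):
        try:
            # The parametrized formula gives the expected value of y.
            mu = eval(code, {"__builtins__": {}}, {"theta": theta, "x": x})
        except (ZeroDivisionError, OverflowError):
            return -math.inf
        lp += -math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2
    return lp


def metropolis(code, init, xs, ys, n_steps=4000, step=0.05, seed=0):
    """Random-walk Metropolis over (theta, log sigma); a stand-in sampler."""
    rng = random.Random(seed)
    state = list(init) + [0.0]  # last entry is log(sigma)
    current = log_posterior(code, state[:-1], state[-1], xs, ys, init)
    samples = []
    for _ in range(n_steps):
        proposal = [s + rng.gauss(0.0, step) for s in state]
        cand = log_posterior(code, proposal[:-1], proposal[-1], xs, ys, init)
        # Accept with probability min(1, exp(cand - current)).
        if math.log(1.0 - rng.random()) < cand - current:
            state, current = proposal, cand
        samples.append(state)
    return samples


# Synthetic data generated from the assumed formula with Gaussian noise.
data_rng = random.Random(1)
xs = [0.1 * i for i in range(1, 41)]
ys = [1.8 * x / (0.5 + x) + data_rng.gauss(0.0, 0.1) for x in xs]

code, init = parametrize("1.8 * x / (0.5 + x)")
samples = metropolis(code, init, xs, ys)
kept = samples[len(samples) // 2:]  # discard the first half as burn-in
for i in range(len(init)):
    draws = [s[i] for s in kept]
    print(f"theta[{i}]: posterior mean ~ {sum(draws) / len(draws):.3f}")
```

The spread of the retained draws, rather than a single point estimate, is what provides the uncertainty on each coefficient. In an incremental setting, one plausible design is to use a posterior summary from one batch of data as the prior for the next, although the sketch does not implement this.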
The results demonstrate the effectiveness of the proposed module, highlighting its ability to accurately capture uncertainty on unseen data. Furthermore, we explore how the framework can adapt to evolving information, including new data and knowledge within an incremental learning context.