

Improving parameter learning of Bayesian nets from incomplete data



 

Abstract

This paper addresses the estimation of the parameters of a Bayesian network from incomplete data. The task is usually tackled by running the Expectation-Maximization (EM) algorithm several times in order to obtain an estimate with high log-likelihood. We argue that choosing the maximum log-likelihood estimate (as well as the maximum penalized log-likelihood or the maximum a posteriori estimate) has severe drawbacks, because it is affected by both overfitting and model uncertainty. Two ideas are discussed to overcome these issues: a maximum entropy approach and a Bayesian model averaging approach. Both ideas can easily be applied on top of EM, while the entropy idea can also be implemented in a more sophisticated way, through a dedicated non-linear solver. An extensive set of experiments shows that these ideas produce significantly better estimates and inferences than the traditional and widely used maximum (penalized) log-likelihood and maximum a posteriori estimates. In particular, if EM is adopted as the optimization engine, the model averaging approach performs best; its performance is matched by the entropy approach when the latter is implemented with the non-linear solver. The results suggest that these ideas are immediately applicable (they are easy to implement and to integrate into currently available inference engines) and that they constitute a better way to learn Bayesian network parameters.
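The abstract contrasts three ways of turning several EM restarts into a single answer. The sketch below illustrates them for the simplest case of one categorical distribution. It is a minimal illustration under stated assumptions, not the authors' implementation:

- Each candidate is a hypothetical (score, distribution) pair produced by one EM run.
- The entropy rule keeps the most entropic candidate among those whose score lies within a user-chosen tolerance `tol` of the best.
- The averaging rule weights candidates by exp(score). This is one common BMA-style choice and may differ from the weighting used in the paper.

The dedicated non-linear solver variant of the entropy idea is not reproduced here.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

def select_max_score(cands):
    """Traditional rule: keep the single candidate with the highest score."""
    return max(cands, key=lambda c: c[0])[1]

def select_max_entropy(cands, tol):
    """Among candidates scoring within `tol` of the best, keep the most entropic one.
    Assumption: `tol` is a hypothetical user-chosen tolerance on the score."""
    best = max(s for s, _ in cands)
    near_best = [p for s, p in cands if s >= best - tol]
    return max(near_best, key=entropy)

def average_estimates(cands):
    """Mix the candidate distributions with weights proportional to exp(score).
    Assumption: exp(score) weighting; for a single categorical variable this equals
    averaging the inference P(X = k) over the candidates."""
    best = max(s for s, _ in cands)
    weights = [math.exp(s - best) for s, _ in cands]  # subtract max for numerical stability
    total = sum(weights)
    k = len(cands[0][1])
    return [sum(w * p[i] for w, (_, p) in zip(weights, cands)) / total for i in range(k)]

if __name__ == "__main__":
    # Hypothetical toy candidates (score, distribution); not results from the paper.
    cands = [
        (-100.0, [0.70, 0.20, 0.10]),
        (-100.5, [0.40, 0.35, 0.25]),
        (-104.0, [0.10, 0.10, 0.80]),
    ]
    print("max score:  ", select_max_score(cands))
    print("max entropy:", select_max_entropy(cands, tol=1.0))
    print("averaged:   ", [round(v, 3) for v in average_estimates(cands)])
```

Because scores are log-likelihoods, the exp(score) weights collapse onto the best run when score gaps are large. The toy scores above are chosen close together so that averaging visibly differs from selection.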

Introduction

This paper focuses on learning the parameters of a Bayesian network (BN) with known structure from incomplete samples, under the assumption that the data are missing at random (MAR). In this setting, the missing data make the log-likelihood (LL) function non-concave and multimodal. The most common approach to maximizing the LL in the presence of missing data is the Expectation-Maximization (EM) algorithm [4], which generally converges to a local maximum of the LL function. EM can easily be modified to maximize, instead of the LL, the posterior probability of the parameters given the data (MAP), as well as other penalized maximum-likelihood criteria [11, Sec 1.6]. Generally, maximizing the posterior rather than the LL yields smoother estimates that are less prone to overfitting [8]. In the following, we refer to the function to be maximized as the score. Although we focus on BN learning, the ideas are general and should also apply to other probabilistic graphical models with similar characteristics in terms of parameter learning.
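To make the EM procedure concrete, here is a minimal sketch for a hypothetical two-node network X -> Y with binary variables, where `None` marks a missing value (MAR assumed). The function names `joint`, `completions`, `log_likelihood` and `em` are illustrative and do not come from the paper.

- **E-step:** each incomplete record is spread over its possible completions in proportion to their probability under the current parameters.
- **M-step:** the expected counts are normalized. With `alpha = 0` this gives the ML update. With `alpha > 0` it gives the MAP update under a symmetric Beta(alpha+1, alpha+1) prior, a simplification of the Dirichlet priors usually used for BNs.

```python
import math
import random

def joint(theta, x, y):
    """P(X=x, Y=y) for the two-node network X -> Y (binary variables)."""
    px = theta["x"] if x == 1 else 1.0 - theta["x"]
    py1 = theta["y|x"][x]
    return px * (py1 if y == 1 else 1.0 - py1)

def completions(rec):
    """All full assignments (x, y) consistent with a record; None marks a missing value."""
    xs = [rec[0]] if rec[0] is not None else [0, 1]
    ys = [rec[1]] if rec[1] is not None else [0, 1]
    return [(x, y) for x in xs for y in ys]

def log_likelihood(theta, data):
    """Observed-data LL: each record's probability is summed over its completions."""
    return sum(math.log(sum(joint(theta, x, y) for x, y in completions(r))) for r in data)

def em(data, alpha=0.0, iters=200, seed=0):
    """EM from a random start; alpha = 0 maximizes the LL, alpha > 0 the posterior."""
    rng = random.Random(seed)
    theta = {"x": rng.uniform(0.05, 0.95), "y|x": [rng.uniform(0.05, 0.95) for _ in range(2)]}
    for _ in range(iters):
        # E-step: expected counts of X=x and of (X=x, Y=1) under the current parameters.
        n_x, n_xy1 = [0.0, 0.0], [0.0, 0.0]
        for r in data:
            comps = completions(r)
            weights = [joint(theta, x, y) for x, y in comps]
            z = sum(weights)
            for (x, y), w in zip(comps, weights):
                n_x[x] += w / z
                if y == 1:
                    n_xy1[x] += w / z
        # M-step: normalized expected counts, smoothed by alpha pseudo-counts (MAP).
        theta = {
            "x": (n_x[1] + alpha) / (len(data) + 2 * alpha),
            "y|x": [(n_xy1[x] + alpha) / (n_x[x] + 2 * alpha) for x in range(2)],
        }
    return theta

if __name__ == "__main__":
    # Hypothetical toy data set, not taken from the paper.
    data = [(1, 1), (1, None), (None, 1), (0, 0), (None, 0), (0, 1), (1, 1), (None, None)]
    for seed in range(3):
        t = em(data, seed=seed)
        print(seed, round(log_likelihood(t, data), 4), t)
```

Running `em` with different `seed` values and comparing `log_likelihood` mirrors the multiple-restart practice described above. In larger networks, distinct restarts may stop at different local maxima of the score.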

 


Authors: G. Corani and C. P. De Campos
Affiliation: IDSIA
Galleria 2, CH-6928 Manno (Lugano)
Switzerland
giorgio,cassio@idsia.ch
Year: 2011
