Structure learning in Bayesian networks and session analysis of people search within a professional social network

He, Ru
Major Professor
Jin Tian
Huaiqing Wu
Computer Science

Statistical learning refers to a set of methodologies for modeling and understanding data. In this thesis we present our work on two research problems in statistical learning.

The first research problem is learning Bayesian network structures from data. We present three approaches to this problem. The first is a dynamic programming algorithm that computes the exact posterior probability of any modular feature of a Bayesian network under any general structure prior. The second is a dynamic programming algorithm that obtains the k best Bayesian network structures, which are then used to compute the posterior probabilities of hypotheses of interest by Bayesian model averaging. The third consists of new algorithms that efficiently sample Bayesian network structures according to the exact structure posterior; the sampled structures are then used to construct estimators for the posterior of any feature. All three approaches use dynamic programming techniques to learn Bayesian network structures from data.
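To make the idea of dynamic programming over network structures concrete, the following is a minimal sketch of a well-known dynamic program over variable subsets that finds a single highest-scoring structure under a decomposable score. It is not the thesis's algorithm for modular features, k best structures, or sampling; the function names, the `local_score` interface, and the toy score are all assumptions made for illustration.

```python
from itertools import combinations


def best_network(variables, local_score):
    """Return (total_score, parents) for a highest-scoring DAG.

    `local_score(v, parent_set)` is a hypothetical decomposable score
    (higher is better); the score of a DAG is the sum of its local scores.
    """
    variables = list(variables)

    # Step 1: for every variable v and candidate set C, record the best
    # parent set of v among all subsets of C.
    best_parents = {}
    for v in variables:
        others = [u for u in variables if u != v]
        for size in range(len(others) + 1):
            for cand in combinations(others, size):
                cand = frozenset(cand)
                best = (local_score(v, cand), cand)
                for u in cand:
                    # Smaller candidate sets were filled in earlier.
                    sub = best_parents[(v, cand - {u})]
                    if sub[0] > best[0]:
                        best = sub
                best_parents[(v, cand)] = best

    # Step 2: the best network on a subset S has some sink v whose parents
    # come from S \ {v}; the rest is a best network on S \ {v}.
    best_net = {frozenset(): (0.0, {})}
    for size in range(1, len(variables) + 1):
        for subset in combinations(variables, size):
            subset = frozenset(subset)
            best = None
            for sink in subset:
                rest = subset - {sink}
                rest_score, rest_parents = best_net[rest]
                sink_score, sink_parents = best_parents[(sink, rest)]
                total = rest_score + sink_score
                if best is None or total > best[0]:
                    best = (total, {**rest_parents, sink: sink_parents})
            best_net[subset] = best

    return best_net[frozenset(variables)]


def toy_score(v, parents):
    # Hypothetical score: rewards the edges A -> B and B -> C and
    # charges 0.5 for every parent.
    bonus = {("B", "A"): 2.0, ("C", "B"): 3.0}
    return sum(bonus.get((v, p), 0.0) for p in parents) - 0.5 * len(parents)


score, parents = best_network(["A", "B", "C"], toy_score)
```

Both steps enumerate all subsets of the variables, so time and memory grow exponentially with the number of variables. A sketch of this kind is therefore only practical for small networks.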

The second research problem is session identification and analysis in the domain of people search within a professional social network. Our work on this problem is based on data from the LinkedIn social network. We propose two refinements that address the drawbacks of the content-based method, one of the two main session identification methods commonly used in real applications. We describe the rationale underlying these refinements and then show empirically that the content-based method equipped with them achieves excellent identification performance in this domain. Finally, we perform session analysis based on the refined content-based method and illustrate the profession-oriented nature of the domain.
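The abstract does not spell out the content-based method or the two refinements, so the following is only a generic sketch of what content-based session identification can look like: consecutive queries stay in the same session while their content overlaps enough. The Jaccard similarity on query terms, the threshold value, and the function names are assumptions for illustration, not the method evaluated in the thesis.

```python
def jaccard(query_a, query_b):
    """Jaccard similarity between the lowercase term sets of two queries."""
    terms_a = set(query_a.lower().split())
    terms_b = set(query_b.lower().split())
    if not terms_a and not terms_b:
        return 1.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def split_sessions(queries, threshold=0.2):
    """Group a user's chronologically ordered queries into sessions.

    A query joins the current session when its similarity to the previous
    query is at least `threshold` (a hypothetical cutoff); otherwise it
    starts a new session.
    """
    sessions = []
    for query in queries:
        if sessions and jaccard(sessions[-1][-1], query) >= threshold:
            sessions[-1].append(query)
        else:
            sessions.append([query])
    return sessions


# Hypothetical people-search queries issued by one user, in order.
example_queries = [
    "software engineer",
    "software engineer google",
    "data scientist",
    "senior data scientist",
]
example_sessions = split_sessions(example_queries)
```

With these example queries, the first two share terms and fall into one session, while the switch to "data scientist" shares no terms with the previous query and opens a second session.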

2014