De Brabanter, Kris
The Department of Statistics seeks to train students in the theory and methodology of statistics and statistical analysis, preparing them for entry-level work in business, industry, commerce, government, or academia.
History
The Department of Statistics was formed in 1948, emerging from the functions previously performed by the Statistical Laboratory. Originally housed in the College of Sciences and Humanities, the department came under the joint direction of that college and the College of Agriculture in 1971.
Dates of Existence
1948-present
Related Units
- College of Liberal Arts and Sciences (parent college)
- College of Agriculture and Life Sciences (parent college)
- Statistical Laboratory (predecessor)
Publications
Using the likelihood ratio in bloodstain pattern analysis
A data set of bloodstain patterns for teaching and research in bloodstain pattern analysis: Gunshot backspatters
This is a data set of blood spatter patterns scanned at high resolution, generated in controlled experiments. The spatter patterns were generated with a rifle or a handgun and different types of ammunition. The resulting atomized blood droplets travelled in the direction opposite to the bullet, generating a gunshot backspatter on a poster board target sheet. Fresh blood with anticoagulants was used; its hematocrit and temperature were measured. The main parameters of the study were the bullet shape, size, and speed, and the distance between the blood source and the target sheet. Several other parameters were explored less systematically. This new and original data set is suitable for training or research purposes in the forensic discipline of bloodstain pattern analysis.
New bandwidth selection criterion for Kernel PCA: Approach to dimensionality reduction and classification problems
Background: DNA microarrays are a potentially powerful technology for improving diagnostic classification, treatment selection, and prognostic assessment. The use of this technology to predict cancer outcome has a history of almost a decade. Disease class predictors can be designed for known disease cases and provide diagnostic confirmation or clarify abnormal cases. The main input to these class predictors is high-dimensional data with many variables and few observations. Dimensionality reduction of these feature sets significantly speeds up the prediction task. Feature selection and feature transformation methods are well-known preprocessing steps in the field of bioinformatics, and several prediction tools based on these techniques are available. Results: Studies show that a well-tuned Kernel PCA (KPCA) is an efficient preprocessing step for dimensionality reduction, but the available bandwidth selection method for KPCA was computationally expensive. In this paper, we propose a new data-driven bandwidth selection criterion for KPCA, which is related to least squares cross-validation for kernel density estimation. We propose a new prediction model that combines a well-tuned KPCA with a Least Squares Support Vector Machine (LS-SVM). We estimate the accuracy of the newly proposed model on 9 case studies. Then, we compare its performance (in terms of test-set Area Under the ROC Curve (AUC) and computational time) with other well-known techniques such as the whole data set + LS-SVM, PCA + LS-SVM, t-test + LS-SVM, Prediction Analysis of Microarrays (PAM), and the Least Absolute Shrinkage and Selection Operator (Lasso). Finally, we compare the performance of the proposed strategy against an existing KPCA parameter tuning algorithm by means of two additional case studies. Conclusion: We propose, evaluate, and compare several mathematical/statistical techniques that apply feature transformation/selection for subsequent classification, and consider their application in medical diagnostics. Both feature selection and feature transformation perform well on classification tasks. Because feature selection chooses features dynamically, it is hard to pin down which features are significant for the classifier that predicts the classes of future samples. Moreover, the proposed strategy enjoys a distinctive advantage in its lower time complexity.
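As a rough illustration of the kind of pipeline this abstract describes, the sketch below chains kernel PCA with a classifier and picks the RBF bandwidth by cross-validated AUC. The grid search is only a generic stand-in for the paper's data-driven, LSCV-related criterion, and ridge-regularized logistic regression stands in for LS-SVM; the synthetic data and grid values are assumptions made for the example.

```python
# Hypothetical sketch: KPCA for dimensionality reduction followed by a
# linear classifier, with the RBF bandwidth chosen by cross-validated AUC.
# The grid search below is a generic stand-in for the paper's data-driven
# criterion; logistic regression stands in for LS-SVM, which scikit-learn
# does not ship.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic "many variables, few observations" data (illustrative only).
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("kpca", KernelPCA(kernel="rbf", n_components=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])
# Candidate bandwidths: gamma = 1 / (2 h^2); the grid is an assumption.
grid = GridSearchCV(pipe,
                    {"kpca__gamma": np.logspace(-4, 0, 9)},
                    scoring="roc_auc", cv=5)
grid.fit(X_tr, y_tr)
print("selected gamma:", grid.best_params_["kpca__gamma"])
print("test AUC:", grid.score(X_te, y_te))
```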
A data set of bloodstain patterns for teaching and research in bloodstain pattern analysis: Impact beating spatters
This is a data set of 61 blood spatter patterns scanned at high resolution, generated by controlled impact events corresponding to forensic beating situations. The spatter patterns were produced with two test rigs that varied the geometry and speed of the impact of a solid object on a blood source, a pool of blood. The resulting atomized blood droplets travelled a set distance towards a poster board sheet, creating a blood spatter. Fresh swine blood was used; its hematocrit and temperature were measured. The main parameters of the study were the impact velocity and the distance between the blood source and the target sheet; several other parameters were explored less systematically. This new and original data set is suitable for training or research purposes in the forensic discipline of bloodstain pattern analysis.
Predicting breast cancer using an expression values weighted clinical classifier
Background: Clinical data such as patient history, laboratory analyses, and ultrasound parameters, which form the basis of day-to-day clinical decision support, are often used to guide the clinical management of cancer even when microarray data are available. Several data fusion techniques exist for integrating genomics or proteomics data, but only a few studies have created a single prediction model using both gene expression and clinical data, and these studies often remain inconclusive about whether prediction performance actually improves. To improve clinical management, these data should be fully exploited, which requires efficient algorithms to integrate the data sets and design a final classifier. Results: We compared and evaluated the proposed methods on five breast cancer case studies. Compared to an LS-SVM classifier on the individual data sets, to generalized eigenvalue decomposition (GEVD), and to kernel GEVD, the proposed weighted LS-SVM classifier offers good prediction performance, in terms of test area under the ROC curve (AUC), on all breast cancer case studies. Conclusions: A clinical classifier weighted with the microarray data set thus yields significantly improved diagnosis, prognosis, and prediction of response to therapy. The proposed model has been shown to be a promising mathematical framework for both data fusion and non-linear classification problems.
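One common way to realize the kind of weighted fusion this abstract describes is a convex combination of a clinical kernel and an expression kernel, plugged into the standard LS-SVM dual system. The sketch below follows that pattern; the fusion weight w, the bandwidths, and the random data are illustrative assumptions, and this is not necessarily the paper's exact weighting scheme.

```python
# Minimal sketch of kernel-level fusion of clinical and expression data,
# solved with the standard LS-SVM dual. All parameter values and the
# random data are assumptions for illustration.
import numpy as np

def rbf(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
n = 100
X_clin = rng.normal(size=(n, 8))       # clinical variables (synthetic)
X_expr = rng.normal(size=(n, 500))     # expression values (synthetic)
y = np.sign(rng.normal(size=n))        # labels in {-1, +1}

w, reg = 0.7, 1.0                      # fusion weight and regularization
K = w * rbf(X_clin, X_clin, 0.1) + (1 - w) * rbf(X_expr, X_expr, 0.001)

# LS-SVM dual: solve [[0, 1^T], [1, K + I/reg]] [b; alpha] = [0; y].
A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
              [np.ones((n, 1)), K + np.eye(n) / reg]])
sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
b, alpha = sol[0], sol[1:]
pred = np.sign(K @ alpha + b)          # in-sample decision values
print("training accuracy:", (pred == y).mean())
```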
Author’s Response to Commentary on: Liu Y, Attinger D, De Brabanter K. Automatic classification of bloodstain patterns caused by gunshot and blunt impact at various distances
Methodologies for Studying Human-Microclimate Interactions for Resilient, Smart City Decision-Making
Creating sustainable, resilient cities requires integrating an understanding of human behavior and decision-making about the built environment within an expanding range of spatial, political, and cultural contexts. Resilience, the ability to withstand and adapt to extreme or sudden stresses, emphasizes the importance of participation by a broad range of stakeholders in making decisions for the future. Smart cities leverage technology and data collected from the community and its stakeholders to inform and support these decisions. Energy usage in cities starts with people interacting with their environments, such as occupants interacting with the buildings in which they live and work. To support city stakeholders as they develop policies and incentives for improved, resilient energy utilization, researchers also need to consider microclimates and social dynamics in addition to building-occupant interactions. Sustainable design of the urban built environment therefore needs to expand beyond buildings to include near-building conditions, which requires investigating multiple scales and types of data to create new methodologies for design and decision-making processes. This paper presents a conceptual framework and an interdisciplinary research methodology that integrates models and data-driven science with community engagement practices to create partnerships between university researchers, city officials, and residents. Our research team, drawn from design, the natural sciences, data science, engineering, and the humanities, presents a first example of a transformative method of data collection, analysis, design, and decision-making that moves away from hierarchical relationships and utilizes the expertise of all stakeholders.
Nonparametric Regression via StatLSSVM
We present a new MATLAB toolbox for Windows and Linux for nonparametric regression estimation based on the statistical library for least squares support vector machines (StatLSSVM). The StatLSSVM toolbox is written so that only a few lines of code are necessary to perform standard nonparametric regression, regression with correlated errors, and robust regression. In addition, the construction of additive models and of pointwise or uniform confidence intervals is supported. A number of tuning criteria are available, including classical cross-validation, robust cross-validation, and cross-validation for correlated errors; minimization of these criteria requires no user interaction.
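StatLSSVM itself is a MATLAB toolbox, and its API is not reproduced here. As a language-neutral illustration of what it automates, the following Python sketch fits an LS-SVM regression with an RBF kernel and tunes the bandwidth and regularization by ordinary k-fold cross-validation; the grid values and synthetic data are assumptions, and the toolbox's actual tuning criteria (robust CV, CV for correlated errors) are richer than this plain CV loop.

```python
# Illustrative sketch (not StatLSSVM's API): LS-SVM regression with an
# RBF kernel, tuned by 5-fold cross-validation over a small assumed grid.
import numpy as np

def lssvm_fit(K, y, reg):
    """Solve the LS-SVM regression dual for (alpha, b)."""
    n = len(y)
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)), K + np.eye(n) / reg]])
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[1:], sol[0]

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 80))                 # synthetic inputs
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=80)

best = None
for h in [0.05, 0.1, 0.2, 0.4]:                    # candidate bandwidths
    for reg in [1.0, 10.0, 100.0]:                 # candidate regularization
        K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * h ** 2))
        err = 0.0
        for te in np.array_split(np.arange(80), 5):  # 5-fold CV error
            tr = np.setdiff1d(np.arange(80), te)
            a, b = lssvm_fit(K[np.ix_(tr, tr)], y[tr], reg)
            err += ((K[np.ix_(te, tr)] @ a + b - y[te]) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, h, reg)
print("selected bandwidth h =", best[1], "regularization =", best[2])
```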