A software framework for data dimensionality reduction: application to chemical crystallography

Samudrala, Sai
Balachandran, Prasanna
Zola, Jaroslaw
Rajan, Krishna
Ganapathysubramanian, Baskar

Materials science research has witnessed an increasing use of data mining techniques in establishing process‐structure‐property relationships. Significant advances in high‐throughput experiments and computational capability have resulted in the generation of huge amounts of data. Various statistical methods are currently employed to reduce the noise, redundancy, and dimensionality of the data to make analysis more tractable. Popular methods for reduction (such as principal component analysis) assume a linear relationship between the input and output variables. Recent developments in non‐linear reduction (neural networks, self‐organizing maps), though successful, have computational issues associated with convergence and scalability. Another significant barrier to the use of dimensionality reduction techniques in materials science is their lack of ease of use, owing to their complex mathematical formulations. This paper reviews various spectral‐based techniques that efficiently unravel linear and non‐linear structures in the data, which can subsequently be used to tractably investigate process‐structure‐property relationships. In addition, we describe techniques (based on graph‐theoretic analysis) to estimate the optimal dimensionality of the low‐dimensional parametric representation. We show how these techniques can be packaged into a modular, computationally scalable software framework with a graphical user interface, the Scalable Extensible Toolkit for Dimensionality Reduction (SETDiR). This interface helps to separate the mathematical and computational aspects from the materials science applications, thus significantly enhancing utility to the materials science community. The applicability of this framework in constructing reduced order models of complicated materials datasets is illustrated with an example dataset of apatites described in structural descriptor space. Cluster analysis of the low‐dimensional plots yielded interesting insights into the correlation between structural descriptors, such as ionic radius and covalence, and characteristic properties such as apatite stability. This information is crucial as it can promote the use of apatite materials as a potential host system for immobilizing toxic elements.
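
As a rough illustration of what a spectral technique does in the linear case, the sketch below performs principal component analysis by eigendecomposition of the sample covariance matrix and projects the data onto the leading eigenvectors. It is a minimal example under stated assumptions, not the SETDiR implementation: the function name `pca_spectral`, the synthetic dataset, and the choice of two retained dimensions are all hypothetical, and the dimensionality is chosen here by inspecting eigenvalues rather than by the graph‐theoretic estimate the paper describes.

```python
import numpy as np

def pca_spectral(X, d):
    """Project the rows of X (n_samples x n_features) onto the top-d principal axes.

    Hypothetical helper for illustration only; not part of SETDiR.
    """
    Xc = X - X.mean(axis=0)                # center each descriptor (column)
    C = (Xc.T @ Xc) / (X.shape[0] - 1)     # sample covariance matrix
    evals, evecs = np.linalg.eigh(C)       # eigh returns eigenvalues in ascending order
    order = np.argsort(evals)[::-1]        # reorder to descending
    evals, evecs = evals[order], evecs[:, order]
    return Xc @ evecs[:, :d], evals

# Synthetic stand-in for a descriptor table (assumption): 200 points lying
# close to a 2-D plane embedded in a 5-D descriptor space, plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

Y, evals = pca_spectral(X, d=2)

# Fraction of total variance captured by the two retained components.
explained = evals[:2].sum() / evals.sum()
print(Y.shape, round(float(explained), 4))
```

The low‐dimensional coordinates `Y` are what one would then plot and cluster. Non‐linear spectral methods follow the same eigendecomposition pattern but replace the covariance matrix with a matrix built from a neighborhood graph of the data.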


This article is published as Samudrala, Sai Kiranmayee, Prasanna Venkataraman Balachandran, Jaroslaw Zola, Krishna Rajan, and Baskar Ganapathysubramanian. "A software framework for data dimensionality reduction: application to chemical crystallography." Integrating Materials and Manufacturing Innovation 3, no. 1 (2014): 1-20. DOI: 10.1186/s40192-014-0017-5. Posted with permission.

Non‐linear dimensionality reduction, Process‐structure‐property, Apatites, Materials science, High‐throughput analysis