Statistical summary of protein structures
Every biological system contains proteins, and almost all biological activities require the participation or support of a specific set of proteins. Understanding protein function is therefore essential to research in all biological and medical fields. To fully understand the functions of proteins, however, it is critical to know their structures and related dynamic behaviours.
There is no unique way of modelling protein structure and dynamics. Experimental techniques are used to collect indirect structural data from which structures can be deduced. These techniques are costly, time-consuming, and limited to certain types or sizes of proteins, yet they remain the major source of structure determination. Theoretical approaches have also been developed for modelling protein structure and dynamics, including potential energy minimization, molecular dynamics simulation, and comparative modelling, and in practice these methods are often combined. All of them have limitations, however, and their modelling capabilities, even when combined, are not yet sufficient to meet the demand from applications for high-quality, high-volume modelling. New approaches and breakthroughs are actively sought.
In this work, we investigate a novel statistical approach to protein modelling. Instead of relying on physical experiments, we statistically analyse a whole spectrum of residue-level protein structural properties to better understand the physical and structural characteristics revealed in known structural data. This data-driven, knowledge-based analysis takes advantage of both the knowledge that can be extracted from the rich available data and the power of statistical methods. First, we develop statistical measures of protein residue-level structural properties, introduce a statistical framework for protein structural assessment, and formulate a novel set of residue-level statistical potentials for protein modelling and dynamics. Second, to allow researchers to access and manipulate a large set of statistical data on residue-level structural properties and to evaluate statistical potentials, we develop and release an open-source package in R with a user-friendly GUI that any public user can run in any R environment. Lastly, we integrate web pages and server-side programs into a one-step query workbench that makes it easy for a user to submit queries and retrieve results. The workbench is implemented in PHP, a popular and widely supported scripting language.
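To illustrate what a residue-level statistical potential can look like, the following is a minimal sketch of one common knowledge-based formulation, the inverse Boltzmann relation E = -kT ln(P_obs / P_ref). It is not necessarily the formulation developed in this work. The function name, the uniform reference distribution, the pseudocount, the bin width, and the sample distances are all assumptions made for illustration.

```python
import math
from collections import Counter


def statistical_potential(observed, bin_width=1.0, kT=1.0, pseudocount=1.0):
    """Map each distance bin to an energy via the inverse Boltzmann relation.

    `observed` is a list of residue-level measurements (for example,
    inter-residue distances in angstroms). The reference state is assumed
    to be uniform over the occupied range of bins, which is a simplification.
    """
    # Histogram the observations into integer bin indices.
    counts = Counter(int(d // bin_width) for d in observed)
    bins = range(min(counts), max(counts) + 1)
    n_bins = len(bins)

    # A pseudocount keeps empty bins from producing log(0).
    total = len(observed) + pseudocount * n_bins
    p_ref = 1.0 / n_bins  # assumed uniform reference distribution

    potential = {}
    for b in bins:
        p_obs = (counts.get(b, 0) + pseudocount) / total
        potential[b] = -kT * math.log(p_obs / p_ref)
    return potential


# Hypothetical distances; real input would come from known structures.
sample_distances = [3.8, 3.9, 3.7, 3.8, 5.5, 6.2]
potential = statistical_potential(sample_distances)
for bin_index in sorted(potential):
    print(bin_index, round(potential[bin_index], 3))
```

Bins that are observed more often than the reference predicts receive negative (favourable) energies, while rarely observed bins receive positive ones. In practice the choice of reference state strongly affects the resulting potential.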