SUNCAT Theory Seminar Series
Persistent homology advances interpretable machine learning for scientific applications
Machine learning for scientific applications, ranging from physics and materials science to biology, has emerged as a promising alternative to more time-consuming experiments and simulations. A key challenge with this approach is selecting features that enable universal and interpretable system representations across multiple prediction tasks. We develop and expand on techniques in computational topology (e.g. persistent homology) to build end-to-end ML models that automatically generate descriptors and form a rich representation of structure in scientific systems, for example nanoporous materials and protein structures. We show that these representations can also be combined with other methods, such as graph representation learning, to capture complementary information. We demonstrate the efficacy and power of our approaches on multiple scientific datasets by predicting a variety of targets relevant to sustainable energy, including gas adsorption and protein function. Our results show considerable improvement in both accuracy and transferability across targets compared to baseline models constructed from commonly used, manually curated features. A key advantage of our approach is interpretability. For example, in material structures, our approach has allowed us to identify the pore configurations that correlate most strongly with desirable targets (e.g. carbon capture performance), contributing to an atomic-level understanding of structure-property relationships for materials design.
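To make the central tool concrete: persistent homology tracks topological features (connected components, loops, voids) of a structure across a range of length scales, and the resulting birth/death pairs serve as ML descriptors. As a minimal sketch, not the speakers' actual pipeline, the following self-contained code computes the 0-dimensional persistence diagram of a point cloud under a Vietoris-Rips filtration, where each point is born at scale 0 and components die as they merge (the function name `h0_persistence` and the toy point cloud are illustrative assumptions):

```python
import math
from itertools import combinations

def h0_persistence(points):
    """0-dimensional persistence of a Vietoris-Rips filtration.

    Every point (component) is born at scale 0; a component dies at
    the scale where it merges into another. These death scales are the
    minimum-spanning-tree edge lengths (Kruskal's algorithm with
    union-find); one essential component never dies (death = inf).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length (the filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)  # a component dies at merge scale d
    # (birth, death) pairs: n bars, one of which is essential.
    return [(0.0, d) for d in deaths] + [(0.0, math.inf)]

# Toy example: two well-separated clusters. The one long-lived finite
# bar in the diagram reflects the gap between the clusters.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
diagram = h0_persistence(pts)
```

In practice such diagrams are vectorized (e.g. as persistence images or landscapes) before being fed to a regression or classification model; libraries such as GUDHI or ripser.py provide optimized implementations for higher homology dimensions.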