Research

Our group works at the intersection of computer science and biology. We develop computational methods rooted in machine learning, probabilistic modeling, and algorithmic design to address fundamental biological questions.

In recent years, we have also conducted analyses of clinical data through collaborative research with medical and dental departments.

Bioinformatics and Chemoinformatics

Protein Structure Prediction

Predicting the three-dimensional structure of a protein from its amino acid sequence has long been a central challenge in computational biology. Our group has addressed two key stages of template-based (homology) modeling: alignment generation and model quality assessment.

For alignment generation, we developed a machine-learning-based method that replaces fixed substitution matrices with a k-nearest neighbor model trained on structurally derived alignments, combined with dynamic programming. This approach improves alignment accuracy for remote homologs with low sequence identity, where conventional methods struggle. [Bioinformatics 2020]
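As a rough illustration of the idea, and not the published implementation, the sketch below scores each residue pair with a k-nearest neighbor vote over labelled examples and feeds that score into standard global dynamic programming. The per-residue feature tuples, the toy training set, the rescaling of the vote to [-1, 1], and the linear gap penalty are all assumptions made for this example.

```python
from math import dist

def knn_score(query, train, k=3):
    """Score a residue pair by the share of its k nearest training
    examples that were labelled 'structurally aligned' (label 1).
    The share is rescaled from [0, 1] to [-1, 1] (an assumption)."""
    nearest = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    p = sum(label for _, label in nearest) / len(nearest)
    return 2.0 * p - 1.0

def align(feats_a, feats_b, train, k=3, gap=-0.5):
    """Global alignment (Needleman-Wunsch) in which the substitution
    score comes from knn_score instead of a fixed matrix.
    feats_a / feats_b: one hypothetical feature tuple per residue."""
    n, m = len(feats_a), len(feats_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The k-NN query is the concatenation of both residues' features.
            s = knn_score(feats_a[i - 1] + feats_b[j - 1], train, k)
            options = [
                (score[i - 1][j - 1] + s, "diag"),
                (score[i - 1][j] + gap, "up"),
                (score[i][j - 1] + gap, "left"),
            ]
            # On ties, max() keeps the first option, so a match is preferred.
            score[i][j], move[i][j] = max(options, key=lambda o: o[0])
    # Trace back from the bottom-right corner to recover aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "up":
            i -= 1
        else:
            j -= 1
    return score[n][m], pairs[::-1]

# Toy training set: (concatenated features, 1 = aligned / 0 = not aligned).
toy_train = [((0.0, 0.0), 1), ((1.0, 1.0), 1), ((0.0, 1.0), 0), ((1.0, 0.0), 0)]
total, pairs = align([(0.0,), (1.0,), (0.0,)], [(0.0,), (0.0,)], toy_train, k=1)
```

In this toy case the middle residue of the first sequence has no similar partner, so the alignment skips it with a gap instead of forcing a poor match.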

For model quality assessment (MQA) — the step of selecting the best structural model from a pool of candidates — we proposed a method that uses molecular dynamics (MD) simulation to evaluate structural stability, extracting features such as RMSD, secondary structure persistence, and native contact retention. Remarkably, this approach achieves accuracy comparable to state-of-the-art deep learning methods without requiring any training data. [ACS Omega 2022] We also developed a dedicated benchmark dataset (HMDM) to enable more rigorous and practical evaluation of MQA methods. [Bioengineering 2022]
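Two of the stability features mentioned above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published pipeline: a trajectory is taken to be a list of frames, each frame a list of (x, y, z) coordinates with one representative atom per residue, frames are assumed to be already superposed on the first frame, and the 8 Å cutoff and minimum sequence separation of 3 are illustrative defaults. Secondary structure persistence is omitted because it needs a separate assignment tool.

```python
from math import dist, sqrt
from statistics import mean

def rmsd(frame, ref):
    """Root-mean-square deviation between two frames.
    Assumes the frames are already superposed (no fitting is done here)."""
    return sqrt(mean(dist(a, b) ** 2 for a, b in zip(frame, ref)))

def contacts(frame, cutoff=8.0, min_sep=3):
    """Residue pairs closer than `cutoff`, ignoring near neighbours
    along the chain (sequence separation below `min_sep`)."""
    n = len(frame)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_sep, n)
        if dist(frame[i], frame[j]) < cutoff
    }

def stability_features(trajectory, cutoff=8.0, min_sep=3):
    """Summarise how far a model drifts from its starting structure.
    trajectory[0] is the candidate model; later frames come from MD.
    Requires at least two frames."""
    ref = trajectory[0]
    native = contacts(ref, cutoff, min_sep)
    later = trajectory[1:]
    mean_rmsd = mean(rmsd(f, ref) for f in later)
    if native:
        retention = mean(
            len(native & contacts(f, cutoff, min_sep)) / len(native) for f in later
        )
    else:
        retention = 0.0
    return {"mean_rmsd": mean_rmsd, "contact_retention": retention}

# Toy trajectory: the last residue drifts away in the final frame.
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
drifted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
features = stability_features([start, start, drifted])
```

A model that keeps a low mean RMSD and retains most of its initial contacts during simulation would be ranked as more stable under this scheme.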

Retrosynthesis Prediction

Retrosynthesis — the process of working backwards from a target molecule to identify feasible synthetic routes — is a fundamental task in drug discovery and organic chemistry. We develop computational methods to automate and improve this process.

Our core approach is a template-free single-step retrosynthesis method that uses molecular substructure fingerprints to identify potential disconnection sites in a target molecule. For each proposed disconnection, reactant candidates are retrieved and ranked by structural similarity. This approach achieves 47.2% top-1 accuracy on the USPTO dataset, improving to 61.4% when the predicted reaction class is used to narrow the search. [JCIM 2021]
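The retrieval-and-ranking step can be pictured with the minimal sketch below. It is an illustration under assumptions rather than the published method: fingerprints are represented as plain sets of integer bit indices (in practice they would come from a cheminformatics toolkit), Tanimoto similarity stands in for the structural similarity measure, and the candidate names and library are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprints,
    each given as a set of 'on' bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def rank_candidates(fragment_fp, library, top_n=3):
    """Rank reactant candidates by similarity to the fragment produced
    by a proposed disconnection.
    library: dict mapping a candidate name to its fingerprint set."""
    scored = [(tanimoto(fragment_fp, fp), name) for name, fp in library.items()]
    # Highest similarity first; ties are broken alphabetically by name.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(name, round(score, 3)) for score, name in scored[:top_n]]

# Hypothetical fingerprints for one fragment and three candidate reactants.
fragment = {1, 2, 3, 4}
toy_library = {
    "cand_A": {1, 2, 3, 4, 5},
    "cand_B": {1, 2},
    "cand_C": {7, 8},
}
ranking = rank_candidates(fragment, toy_library)
```

Restricting `library` to candidates associated with a predicted reaction class would be one simple way to narrow the search before ranking.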

We have also worked on the data infrastructure underlying retrosynthesis research, consolidating fragmented open-source computer-assisted synthesis design (CASD) data into a unified, accessible database to support reproducible benchmarking across the field. [J. Cheminformatics 2026]

Medical Informatics and Dental Informatics

Halitosis Prediction

In collaboration with the Dental Department, we are developing a deep learning approach that predicts halitosis from tongue photographs, offering a low-cost alternative to traditional gas chromatography. We also apply explainable AI (XAI) techniques to identify which features of the tongue image — such as the color, thickness, and extent of tongue coating — drive the model’s prediction, providing clinically interpretable feedback.
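One simple way to see which image regions drive a prediction is occlusion sensitivity: mask one patch at a time and record how much the model's output drops. The sketch below is only an example of this family of techniques and is not necessarily the XAI method used in the project. The image is a plain 2D list of intensities, `predict` is a hypothetical stand-in for a trained model, and the patch size and fill value are illustrative.

```python
def occlusion_map(image, predict, patch=2, fill=0.0):
    """Return a heat map with the same shape as `image`.
    Each cell holds the drop in the model score when the patch
    containing that cell is replaced by `fill`.
    image: 2D list of floats; predict: callable(image) -> float."""
    base = predict(image)
    height, width = len(image), len(image[0])
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy, keep the input intact
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = base - predict(occluded)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat

def toy_model(image):
    """Hypothetical model that only looks at the top-left 2x2 corner."""
    return sum(image[r][c] for r in range(2) for c in range(2)) / 4.0

toy_image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_map(toy_image, toy_model)
```

Large values in the heat map mark regions the model relies on, which can then be compared with clinically meaningful areas such as the tongue coating.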