AI / ML in Chemistry & Biochemistry

Instructor:

Dan Gezelter
phone: 631-7595
office: 251 Nieuwland Science Hall
e-mail: gezelter@nd.edu

Lecture:

Wednesdays 12:50-1:40
332 DeBartolo Hall

Grade:

Quality of Discussion you lead: 50%
Contribution to other discussions: 50%

Overview:

This seminar will explore how artificial intelligence and machine learning (AI/ML) are changing modern chemistry. It is your opportunity to develop your skills at leading a scientific discussion with your peers. What makes for a good scientific conversation? How do you engage your audience? How do you encourage participation?

Below are 13 sample topics that we can discuss (and we can open it up to other topics if you find something interesting). Each of you will be responsible for leading a short (30-40 minute) discussion with the rest of the class on one of these topics. The order of assigned discussions has been chosen at random. Today, you may request a particular topic if it aligns with your research or professional interests. The day before your assigned presentation, you’ll be expected to email a brief (1 page) abstract to the rest of the class along with links to any articles or materials you’d like us to read in advance.

In each of these discussions, we’ll cover the AI/ML topic itself, how it bears on the work of lab chemists and biochemists, and what it implies for the future of our field.

Attendance at all class sessions is required unless you have an officially excused absence. Your grades for this course will be based on the quality of the discussion you lead (50%) as well as your contributions to the discussions on other topics (50%).

The course Canvas page will include reading assignments from your weekly discussion leaders, as well as a growing set of links to interesting essays, papers, videos, and online resources. These links should not be the only reading you do in preparing to lead a discussion; they are just there to get you started on your research.

To get us started, let's look at two interesting discussions:

Topics:

  1. AlphaFold & AI-Driven Protein Design (Nobel-winning breakthrough)
    DeepMind’s AlphaFold revolutionized protein structure prediction, folding millions of known proteins with high accuracy and solving a decades-old problem. Complementing this, David Baker’s lab used AI-based design (Rosetta) to create entirely novel proteins with applications in vaccines, sensors, and nanomaterials.
  2. Open Molecules 2025: Massive Quantum Chemistry Dataset
    Meta and several U.S. national labs released Open Molecules 2025, a dataset of over 100 million density-functional theory (DFT) calculations. Training on this much molecular data could let ML models approach quantum-level accuracy.
  3. ESM3-Designed Artificial Protein: esmGFP
    Using the protein language model ESM3, researchers simulated years of molecular evolution to design esmGFP, a novel fluorescent protein not found in nature.
  4. Spectroscopy with Machine Learning: “SpectraML”
    Integrating machine learning into spectroscopy has yielded a program called SpectraML, used for predicting molecular spectra from structures, and inversely inferring structures from spectral data. Does it work? How well does it work?
  5. AI-Powered Catalyst Discovery
    How are Large Language Models (LLMs) accelerating both homogeneous and heterogeneous catalyst discovery? What are the applications in sustainable energy, materials, and bioactive compounds? Does AI ‘understand’ what a catalyst does?
  6. ChemCrow: LLMs Augmented with Chemistry Tools
    The ChemCrow agent integrates an LLM with 18 expert chemical tools, autonomously planning syntheses. Does it work? How well does it work? Why does it work?
  7. AI for Reaction Prediction & Retrosynthesis (e.g., IBM RXN)
    Platforms like IBM RXN for Chemistry use deep learning to predict reaction outcomes and design retrosynthetic pathways, a potentially useful tool for synthetic and medicinal chemists.
  8. AI-Driven Materials Discovery (GNoME)
    Google DeepMind’s GNoME system predicted over 2 million new inorganic crystal structures, roughly 380,000 of which are estimated to be stable. A small subset has been tested via autonomous robotic synthesis.
  9. Machine Learning Interatomic Potentials (MLIPs)
    MLIPs replace traditional empirical force fields or expensive ab initio calculations (or DFT) with neural networks, Gaussian processes, or kernel methods trained on high-fidelity quantum data. How does this impact molecular simulations?
  10. AI for Teaching Chemistry
    This topic could include AI-powered virtual labs, LLMs as question-answering partners, or any of the myriad ways that AI is changing the face of chemical education (including by increasing the rate of cheating on papers and exams). What are the advantages and disadvantages of AI in chemical education?
  11. Crystal Structure Prediction with ML
    AI-enhanced crystal structure prediction (CSP) methods can predict possible inorganic crystal structures from composition alone and rank them by stability without exhaustive DFT calculations. ALIGNN (Atomistic Line Graph Neural Network) and CGCNN (Crystal Graph Convolutional Neural Network) are typical tools.
  12. AI-Driven Phase Diagram Prediction
    Graph neural networks predict phase stability across temperature/pressure/composition spaces, which can replace hundreds of expensive thermodynamic calculations.
  13. AI-Driven Kinetic Modeling
    Automated mechanism generation (e.g., for combustion or catalysis) combines reaction network prediction with rate constant estimation. One example is RMG (Reaction Mechanism Generator) paired with ML rate predictors.