Daniel J. Kowalski

Digital Chemistry | Data Science | Education

About Me

I am a PhD-qualified chemist with close to a decade of research and teaching experience and a strong interdisciplinary background. My synthetic work has ranged across the breadth of chemistry, from an internship in the pharmaceutical industry to work on academic projects in classical and organometallic coordination chemistry, nanochemistry, and abiogenesis. As a doctoral candidate, I focused on designing and validating digital tools to enhance chemical discovery, drawing from data science, machine learning, and robotics.

Alongside this, I have extensive experience teaching learners of all ages, from elementary/primary school pupils to undergraduates. Notably, I have spent two and a half years as a teacher of English as a foreign language across multiple schools in Japan.

Research Interests

My research interests cover the intersection of chemistry with data science, machine learning, and robotics. I seek to enable discovery by developing new methods, models, and ideas that expand how we search for and understand chemical systems.

  • Digital Discovery Strategies
    Developing machine learning methods for the exploration and exploitation of chemical search spaces, with a focus on fostering curiosity and serendipity.
  • Earth-Abundant d-Block Chemistry
    Exploring the structures, reactivity, and properties of systems containing first row transition metals.
  • Self-Driving Laboratories
    Assembling systems that integrate robotics and machine learning for closed-loop material discovery, and investigating the interactions between human and AI agents.
  • Deconvolution of Chemical Mixtures
    Applying machine learning to interpret analytical data and visualise meaningful relationships within mixtures with high compositional diversity.
  • Digitisation of Chemical Synthesis
    Creating interpretable and generalisable representations of synthetic data for machine learning.

Data Portfolio

This portfolio is intended to demonstrate skills across a variety of data science and machine learning techniques using Python. Each project is a toy problem that uses methods that are actively being applied to chemical research. However, as I only have access to a limited amount of computing power, the projects are meant to be simple and light-hearted. The focus is on demonstrating proficiency in a given skill, rather than creating ground-breaking original research.

Image showing the difference between true and predicted values for the test dataset

1 | Optimising a Neural Network for Solubility Prediction

This project uses a dataset of calculated log(P) values for a variety of organic molecules to train a long short-term memory recurrent neural network (LSTM RNN). SMILES strings are tokenised, trimmed, and padded before being used to train the model. Bayesian optimisation is used to tune the hyperparameters in order to achieve the best balance between the mean squared error of the training and test datasets. The model achieves reasonable accuracy, but tends toward larger errors when less common structural features are present (e.g. for hypervalent iodine compounds or some atypical phosphorus-containing species, such as deprotonated alkylphosphanes).
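
As a rough illustration of the tokenise, trim, and pad step, the sketch below encodes a few SMILES strings as fixed-length integer sequences using only the Python standard library. The token pattern, the maximum length of 8, the padding index 0, and all function names are assumptions made for this example rather than the code used in the project.

```python
import re

# Illustrative SMILES token pattern (an assumption): bracket atoms, two-letter
# halogens, two-digit ring closures, then any other single character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

PAD = 0  # index reserved for padding


def tokenise(smiles):
    """Split a SMILES string into tokens such as 'C', 'Cl', or '[N+]'."""
    return TOKEN_PATTERN.findall(smiles)


def build_vocab(token_lists):
    """Map each distinct token to an integer index, starting at 1."""
    tokens = sorted({tok for toks in token_lists for tok in toks})
    return {tok: i for i, tok in enumerate(tokens, start=1)}


def encode(smiles, vocab, max_len):
    """Tokenise, trim to max_len tokens, and right-pad with PAD."""
    ids = [vocab[tok] for tok in tokenise(smiles)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical inputs: ethanol, chlorobenzene, tetramethylammonium.
smiles_list = ["CCO", "c1ccccc1Cl", "C[N+](C)(C)C"]
vocab = build_vocab(tokenise(s) for s in smiles_list)
encoded = [encode(s, vocab, max_len=8) for s in smiles_list]
```

Every sequence in `encoded` has the same length, so the list can be stacked into a single array and passed to an embedding layer ahead of the LSTM. Tokenising with a pattern rather than character by character keeps multi-character atoms such as `Cl` and `[N+]` intact.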

Four AI-generated images of birds

2 | Creating Bird Images using Generative AI

This project uses a Generative Adversarial Network (GAN) to generate images of birds. A dataset of over 9,000 bird images is first prepared by standardising the size and resolution and converting to greyscale. This dataset is then used to train two networks - an image generator and a real/fake image discriminator - in tandem. The generator takes noise as an input and attempts to trick the discriminator, while the discriminator tries to determine whether images are from the original dataset (real) or generated (fake). While the generator is unable to create images that are truly lifelike, it is able to produce images that are clearly attempting to mimic birds after around 300-400 training epochs. This could potentially be further optimised with additional computing power, but is a solid proof of concept for my purposes.
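
The opposing objectives of the two networks can be sketched with plain binary cross-entropy, as below. This assumes the commonly used non-saturating generator loss; the text above does not say which loss the project uses, and the function names and discriminator outputs here are hypothetical.

```python
import math


def bce(prediction, target):
    """Binary cross-entropy for one probability and a 0/1 target."""
    eps = 1e-12  # guards against log(0)
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """Mean BCE with real images labelled 1 and generated images labelled 0."""
    losses = [bce(p, 1) for p in d_real] + [bce(p, 0) for p in d_fake]
    return sum(losses) / len(losses)


def generator_loss(d_fake):
    """Mean BCE with generated images labelled 1: low when D is fooled."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs: probability that each image is real.
d_real = [0.9, 0.8]  # scores for images from the dataset
d_fake = [0.2, 0.1]  # scores for images from the generator

d_loss = discriminator_loss(d_real, d_fake)
g_loss = generator_loss(d_fake)
```

With these scores the discriminator is winning, so `d_loss` is small and `g_loss` is large. In a training loop the two losses would be minimised in alternating steps, each updating only its own network.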

Technical Publications

Automated Library Generation and Serendipity Quantification Enables Diverse Discovery in Coordination Chemistry
Kowalski, MacGregor, Long, Bell, Cronin
J. Am. Chem. Soc. 2023, 145(4), p. 2332

An Autonomous Chemical Robot Discovers the Rules of Inorganic Coordination Chemistry without Prior Knowledge
Porwol, Kowalski, Henson, Long, Bell, Cronin
Angew. Chem. Int. Ed. 2020, 59(28), p. 11256