A Journey Through the Molecular World: Representing Molecules in Python

Michelle Yi (Yulle)
5 min readApr 26, 2023
Photo by Terry Vlisidis on Unsplash

Introduction

Ever wondered how you could represent molecules in Python?

In this post, we’ll take a journey through the molecular world and explore five oxygen-containing molecules: water, carbon dioxide, ozone, hydrogen peroxide, and glucose.

We’ll learn how to represent them using Python data structures and also create a molecular graph using the networkx package.

Meet the Molecules

Here are the five Oxygen-containing molecules we will look at, their structures, and their overall role in the ecosystem:

  1. Water (H2O): Water is a simple molecule composed of two hydrogen atoms and one oxygen atom. It is essential for life on Earth and is involved in many biological and chemical processes.
  2. Carbon dioxide (CO2): Carbon dioxide consists of one carbon atom and two oxygen atoms. It is produced as a byproduct of cellular respiration in living organisms and is also a greenhouse gas that plays a significant role in Earth’s climate.
  3. Ozone (O3): Ozone is a molecule made up of three oxygen atoms. It is found in Earth’s atmosphere, primarily in the stratosphere, where it forms the ozone layer that protects life on Earth from harmful ultraviolet radiation.
  4. Hydrogen peroxide (H2O2): Hydrogen peroxide is a chemical compound with two hydrogen atoms and two oxygen atoms. It is a powerful oxidizer and is used as a disinfectant, bleaching agent, and antiseptic.
  5. Glucose (C6H12O6): Glucose is a simple sugar and an essential energy source for living organisms. It is a carbohydrate that consists of six carbon atoms, twelve hydrogen atoms, and six oxygen atoms. During cellular respiration, glucose is broken down to produce energy in the form of ATP and releases carbon dioxide and water as byproducts.

Now, let’s compare both a traditional Python and graph approach to representing these molecules.

Representing Molecules in Python

In this section, we will explore how to represent the five molecules and their bonds using Python data structures like dictionaries and lists. Our goal is to create a simple and intuitive representation of these molecules, making it easy to understand their composition and structure.

Using Dictionaries and Lists

Python dictionaries and lists provide a convenient way to represent molecules and their bonds. We can create a dictionary where the keys represent the molecule names, and the values are lists of dictionaries describing the bonds between atoms within the molecule.

Here’s a code snippet illustrating how to represent the five molecules using dictionaries and lists:

Representing oxygen-related molecules in Python using dictionaries and lists

The dictionary called molecules contains the names of the molecules as keys and lists of dictionaries describing their bonds as values. The dictionaries representing bonds contain keys for the bonded atoms and the number of bonds.

Displaying Molecules and Their Bonds

Now that we have represented the molecules and their bonds in a Python dictionary, we can display this information in a user-friendly format. The following code snippet prints the molecules and their bonds:

The output of this code will be:

Sample output of Python molecules example

This representation allows us to easily visualize the molecules and their bonds, but it lacks the graphical representation that the networkx package provides, which we will explore in the next section.

Visualizing Molecules as a Graph with NetworkX

In this section, we’ll learn how to represent the five molecules as a graph using the networkx package. This powerful Python library allows us to create, manipulate, and study the structure, dynamics, and functions of complex networks. We'll start by installing the package and then create a molecular graph to visualize the relationships between atoms in our molecules.

Installing the NetworkX Package

To install the networkx package, simply run the following command:

pip install networkx

Creating a Molecular Graph

With the networkx package installed, we can now create a graph to represent our molecules. A graph consists of nodes (atoms, in our case) and edges (bonds between atoms). We'll also include edge attributes to store the molecule name for each bond.

Here’s a code snippet that demonstrates how to create a molecular graph using networkx:

Python networkx graph example of representing molecules

This script creates a networkx graph with nodes for the different atoms (H, O, C, and N) and adds edges representing the bonds between these atoms in each molecule. The graph is then visualized using matplotlib, with the molecule name labeled on each edge representing a bond in the respective molecule.

Sample visualization of graph output using networkx. This is over-simplified and only meant to be illustrative.

Please note that this representation is highly simplified and doesn’t take into account the detailed molecular structure or the specific locations of atoms and bonds within the molecules. We also intentionally duplicate molecule representations (not necessary in the graph) for the purpose of visualizing it here.

For an example of how to represent the graph with molecule logic consolidated in a single place but not optimized for visualization, see below:

Sample snippet for alternative networkx graph approach with consolidated logic

Interpreting the Molecular Graph

The molecular graph allows us to visualize the relationships between the atoms in our molecules. In the graph, each node represents an atom, and each edge represents a bond between two atoms. The edge labels indicate which molecule the bond belongs to.

By visualizing the molecules as a graph, we gain a better understanding of the relationships between atoms and the overall structure of each molecule. However, it’s important to remember that this representation is a simplification and may not accurately depict the 3D structure of the molecules.

Conclusion

In this Medium post, we explored different approaches to represent and visualize oxygen-containing molecules using Python.

We started by discussing five molecules that use oxygen: water, carbon dioxide, ozone, hydrogen peroxide, and glucose. Then, we demonstrated how to represent these molecules using both traditional methods in Python and the networkx library. Although this approach provided a simple visualization of the atomic relationships, it may not be the most accurate way to depict molecular structures.

When representing and visualizing molecules in Python, it’s essential to choose the appropriate tools based on the specific requirements of your project. For simple graph-based representations, networkx can provide a quick and straightforward solution. However, for more accurate and detailed molecular visualizations, specialized libraries like RDKit are recommended.

By understanding the various tools and techniques available for molecular visualization in Python, you’ll be better equipped to tackle projects involving chemistry, biology, and other fields where molecular structures play a critical role.

Happy coding!

References

NetworkX:

RDKit:

Cheminformatics and Computational Chemistry:

--

--

Michelle Yi (Yulle)

Technology leader that specializes in AI and machine learning. She is passionate about diversity in STEAM & innovating for a better future.