mathworks/Chemistry-Molecular-Similarity-Analysis

Molecular Similarity Analysis

Molecular similarity analysis is a powerful tool in cheminformatics and drug discovery. It can reveal that molecules with similar properties, such as high LogP and high LogS, are still structurally diverse. This example guides you through importing molecular data, computing fingerprints, and visualizing similarity landscapes using MATLAB® and RDKit.

By analyzing a dataset of molecules selected for both high LogP and LogS, you will discover that property-based filtering alone does not ensure structural similarity. Most molecules in the dataset are structurally dissimilar, highlighting the importance of combining both property- and structure-based approaches when exploring chemical space, designing libraries, or selecting leads.
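
To make the idea of structural similarity concrete, here is a minimal sketch of a pairwise similarity calculation on fingerprints. It assumes the Tanimoto coefficient, a common choice for fingerprint comparison; the Live Script may use a different fingerprint type or metric. The fingerprints below are hypothetical toy sets of on-bit indices, not output from RDKit or molecules from the dataset.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pairwise_similarity(fps: dict) -> dict:
    """Return the similarity for every unordered pair of molecules."""
    return {
        (m, n): tanimoto(fps[m], fps[n])
        for m, n in combinations(sorted(fps), 2)
    }


# Hypothetical toy fingerprints: each set holds the indices of bits that are on.
fingerprints = {
    "mol_A": frozenset({1, 4, 7, 9}),
    "mol_B": frozenset({1, 4, 7, 12}),
    "mol_C": frozenset({2, 5, 20, 31}),
}

scores = pairwise_similarity(fingerprints)
for pair, score in scores.items():
    print(pair, round(score, 2))
```

In this toy set, mol_A and mol_B share three of five distinct bits, while mol_C shares none with either. A dataset filtered only on properties can look like the second case for most pairs.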

Embrace the diversity within chemical datasets and unlock deeper insights by integrating multiple analysis strategies!

This example uses data generated in the Chemistry-Import-Visualize-and-Partition-Molecular-Datasets repository. Be sure to check out that example as well for a comprehensive introduction to molecular dataset preparation.

Setup

To run this example, use this MATLAB Live Script:

Molecular_Similarity_Analysis.mlx

MathWorks Products (https://www.mathworks.com)

MATLAB

3rd Party Products:

Python®

RDKit Open-Source Cheminformatics Software

Installation

Set up your Python environment by following the instructions in the guide on the Python webpage. Make sure to give MATLAB the path to your Python executable and to check which versions of Python are compatible with MATLAB products by release. This lets MATLAB build a working interface to Python.

This example uses some functions from RDKit, which can be installed on Linux, Windows, and macOS by following its installation instructions. You can install RDKit using pip install rdkit.
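
Before opening the Live Script, it can help to confirm which interpreter you are about to point MATLAB at and whether RDKit is visible to it. The sketch below uses only the Python standard library; the helper name describe_environment is hypothetical and not part of this repository. Run it with the same Python executable you plan to use from MATLAB.

```python
import importlib.util
import sys


def describe_environment(package: str = "rdkit") -> dict:
    """Report the interpreter path, its version, and whether a package is importable."""
    return {
        "executable": sys.executable,
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        "package_found": importlib.util.find_spec(package) is not None,
    }


info = describe_environment()
print("Python executable:", info["executable"])
print("Python version:   ", info["version"])
print("rdkit importable: ", info["package_found"])
```

The printed executable path is the value to supply when configuring the MATLAB interface to Python (for example through MATLAB's pyenv function), and the version can be compared against the compatibility table for your MATLAB release.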

Getting Started

Follow the steps provided in this MATLAB Live Script:

Molecular_Similarity_Analysis.mlx

Exercises

Try the exercises provided at the end of this example:

Similarity vs. Properties: Pick two bins: one with low similarity scores and one with higher similarity scores. For each bin, note the range of LogP and LogS values for the molecules visualized. Are the LogP and LogS values more similar within the high-similarity bin, or is there still diversity in these properties?

Diverse Structures Insight: Why do you think molecules with similar LogP and LogS values can be structurally very different? Reflect on the types of molecular features that contribute to LogP and LogS, and discuss how different chemical structures might achieve similar property values.
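
One way to approach the first exercise is to group records by similarity bin and compare the LogP and LogS ranges per bin. The sketch below is an illustration under assumptions: the bin count, the equal-width binning, and every number in records are made up, and the Live Script may bin its scores differently.

```python
from collections import defaultdict


def bin_index(score: float, n_bins: int = 5) -> int:
    """Map a score in [0, 1] to one of n_bins equal-width bins."""
    return min(int(score * n_bins), n_bins - 1)


def property_ranges(records, n_bins: int = 5) -> dict:
    """Summarize the LogP and LogS ranges of the records in each similarity bin."""
    binned = defaultdict(list)
    for score, logp, logs in records:
        binned[bin_index(score, n_bins)].append((logp, logs))

    summary = {}
    for idx, props in sorted(binned.items()):
        logps = [p for p, _ in props]
        logss = [s for _, s in props]
        summary[idx] = {
            "count": len(props),
            "logp_range": (min(logps), max(logps)),
            "logs_range": (min(logss), max(logss)),
        }
    return summary


# Hypothetical records: (similarity score, LogP, LogS). Values are invented.
records = [
    (0.05, 4.1, -1.2),
    (0.12, 5.3, -0.8),
    (0.18, 4.7, -1.9),
    (0.85, 4.4, -1.1),
    (0.92, 4.6, -1.3),
]

summary = property_ranges(records)
for idx, stats in summary.items():
    print(f"bin {idx}: {stats}")
```

Comparing the width of the ranges in a low-similarity bin with those in a high-similarity bin gives a quick numerical answer to the question the exercise poses.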

License

The license is available in the License.txt file in this GitHub repository.

The original dataset is generously provided by Professor Thierry Langer from the University of Vienna. The dataset has been made publicly available under the CC BY 3.0 AT (Attribution 3.0 Austria) license in this publication: Molecules 2021, 26, 6185. See the "Dataset_*.rights" file for details.

Community Support

MATLAB Central

Copyright 2024 The MathWorks, Inc.
