Chemical Space by Descriptors


So, a simple workflow, but it does its job.

File Reader: Read the CSV file

Column Filter: Filter out the non-SMILES columns

RDKit Descriptor Calculation: Used all the available descriptors!

Normalizer: Z-score (Gaussian) normalization (min-max scaling to 0-1 could be good too)

PCA: Reduce to 3 dimensions!

2D/3D Scatterplot: So that a human can actually SEE it
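
For anyone without KNIME, the steps above can be roughly sketched in Python. This is only a sketch under assumptions, not what the KNIME nodes do internally: `chemical_space_coords` is a made-up helper name, dropping NaN/inf and constant descriptor columns is my own addition, and the PCA is done with a plain SVD in NumPy. It needs at least four distinct molecules to give three meaningful components.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors


def chemical_space_coords(smiles_list, n_components=3):
    """Hypothetical helper: SMILES -> descriptors -> z-score -> PCA scores."""
    kept, rows = [], []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # skip SMILES that RDKit cannot parse
            continue
        kept.append(smi)
        # Descriptors.descList is a list of (name, function) pairs
        rows.append([float(fn(mol)) for _, fn in Descriptors.descList])
    X = np.array(rows, dtype=float)

    # Assumption (not a KNIME step): drop descriptors that are NaN/inf
    # for any molecule, then drop descriptors that never vary.
    X = X[:, np.isfinite(X).all(axis=0)]
    X = X[:, X.std(axis=0) > 0]

    # Z-score normalization: each descriptor gets mean 0 and std 1
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    # PCA via SVD of the centered matrix; scores = U * S
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return kept, U[:, :n_components] * S[:n_components]


if __name__ == "__main__":
    smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN(CC)CC", "C1CCCCC1"]
    kept, coords = chemical_space_coords(smiles)
    for smi, xyz in zip(kept, coords):
        print(smi, np.round(xyz, 2))
```

The three columns of `coords` are what would go on the axes of the 3D scatterplot.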



I forgot to normalize at first, and that gave a weird-looking result. Normalization is definitely necessary... Once I added it, I got this kind of result.
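
Why skipping normalization gives a weird picture can be shown with a tiny toy example. Everything here is synthetic and assumed for illustration: one column on a molecular-weight-like scale and one on a 0-1 fraction-like scale, with made-up helper names `first_pc_loadings` and `zscore`. Without scaling, the first principal component is basically just the large-scale column.

```python
import numpy as np


def first_pc_loadings(X):
    """Absolute loadings of the first principal component (PCA via SVD)."""
    Xc = X - X.mean(axis=0)  # PCA always centers the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return np.abs(Vt[0])


def zscore(X):
    """Z-score normalization: mean 0, std 1 per column."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


# Synthetic data, purely illustrative (not from the actual dataset)
rng = np.random.default_rng(0)
weight_like = rng.normal(350.0, 80.0, size=200)   # values in the hundreds
fraction_like = rng.uniform(0.0, 1.0, size=200)   # values between 0 and 1
X = np.column_stack([weight_like, fraction_like])

raw = first_pc_loadings(X)             # dominated by the first column
scaled = first_pc_loadings(zscore(X))  # both columns contribute equally

print("raw loadings:   ", raw)
print("scaled loadings:", scaled)
```

With real descriptor tables the effect is the same in spirit: a few descriptors with huge numeric ranges swamp everything else unless each column is rescaled first.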

Chemical space shown. So clustered! WTH!!

I wonder how you could separate this into groups.

Maybe I should try clustering next.

But really, a SOM (self-organizing map) is the kind of thing I want to see. Hmm... Maybe there is a way.