Chemical Space by Descriptors

f:id:hateknime:20190410202023p:plain

So, it's a simple workflow, but it does its job.

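File Reader: Reads the CSV file.

Column Filter: Filters out the non-SMILES columns (only the SMILES are needed).

RDKit Descriptor Calculation: Calculates descriptors from the SMILES. Used ALL the descriptors!!!!

Normalizer: Gaussian (Z-score) normalization, though 0-to-1 (min-max) scaling could be good too.

PCA: Down to 3 dimensions!!!!

2D/3D Scatterplot: So that humans can SEEEEEEEEEE it.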
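If you want to see what the Normalizer and PCA steps actually compute, here is a minimal plain-Python sketch of them. It assumes the RDKit descriptors have already been calculated into a list of numeric rows (one row per molecule, SMILES column removed). The function names are hypothetical, and this is not KNIME's internal implementation. It just does the same math: z-score or min-max scaling, covariance, eigen-decomposition by Jacobi rotations, and projection onto the top 3 components. In practice NumPy or scikit-learn (`StandardScaler`, `PCA`) would do this in a few lines.

```python
import math

def zscore(rows):
    """Z-score ("Gaussian") normalization: each column gets mean 0 and sample std 1."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(m)]
    # Constant columns (std 0) are mapped to 0 instead of dividing by zero.
    return [[(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(m)]
            for r in rows]

def minmax(rows):
    """Min-max normalization: each column rescaled to the range [0, 1]."""
    m = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(m)]
    hi = [max(r[j] for r in rows) for j in range(m)]
    return [[(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j in range(m)]
            for r in rows]

def covariance(rows):
    """Sample covariance matrix of the columns."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    c = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(x[a] * x[b] for x in c) / (n - 1) for b in range(m)] for a in range(m)]

def jacobi_eigh(a, sweeps=100):
    """Eigenvalues/eigenvectors of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-20:
            break  # off-diagonal part is numerically zero
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes 0.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    eigvals = [a[i][i] for i in range(n)]
    eigvecs = [[v[k][i] for k in range(n)] for i in range(n)]  # column i of V
    return eigvals, eigvecs

def pca(rows, k=3):
    """Scores of each row on the k leading principal components, plus their variances."""
    vals, vecs = jacobi_eigh(covariance(rows))
    order = sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)[:k]
    comps = [vecs[i] for i in order]
    m = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(m)]
    scores = [[sum((r[j] - means[j]) * comp[j] for j in range(m)) for comp in comps]
              for r in rows]
    return scores, [vals[i] for i in order]
```

Usage, with a hypothetical `descriptor_rows` list: `scores, variances = pca(zscore(descriptor_rows), k=3)` gives three coordinates per molecule for the 3D scatter plot. Swap `zscore` for `minmax` to try the 0-to-1 option.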

 

 

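At first I forgot to normalize, and the result looked weird. Normalization is definitely necessary: PCA follows the directions of largest variance, so descriptors with large numeric ranges (molecular weight, for example) dominate the components unless everything is put on the same scale. After normalizing, it gave this kind of result.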
f:id:hateknime:20190410202032p:plain

There's the chemical space, and it's so clustered! WTH!!
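Why skipping normalization looks so weird can be checked without running PCA at all. The total variance (the trace of the covariance matrix) is the sum of the PCA eigenvalues, and the first eigenvalue is at least as large as the biggest single column variance. So if one descriptor carries almost all the variance, PC1 is essentially that descriptor. The sketch below uses made-up values for three hypothetical descriptor columns (molecular weight, logP, fraction of sp3 carbons) and a hypothetical `variance_share` helper.

```python
import math

def variance_share(rows):
    """Fraction of the total variance (trace of the covariance) carried by each column."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    var = [sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1) for j in range(m)]
    total = sum(var)
    return [v / total for v in var]

def standardize(rows):
    """Z-score each column (mean 0, sample std 1)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(m)]
    return [[(r[j] - means[j]) / sds[j] for j in range(m)] for r in rows]

# Made-up values: [molecular weight, logP, fraction of sp3 carbons]
rows = [[180.2, 1.2, 0.1], [250.3, 2.5, 0.5], [320.4, 3.1, 0.3],
        [410.5, 0.8, 0.8], [290.1, 4.0, 0.2]]

print(variance_share(rows))               # molecular weight carries over 99.9% of the variance
print(variance_share(standardize(rows)))  # each column carries one third
```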

I wonder how this could be separated into groups.

Maybe I should try clustering next.
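As one possible next step, k-means could be run directly on the three PCA coordinates per molecule (KNIME also has a k-Means node). Below is a minimal plain-Python sketch of Lloyd's algorithm. The farthest-first initialization, the `kmeans` name, and the choice of k are illustrative assumptions, not part of the original workflow.

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Lloyd's k-means; returns a cluster label per point and the centroids."""
    # Farthest-first initialization: deterministic and spreads the seeds out.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

The labels can then be used to color the 3D scatter plot. Choosing k is the hard part; trying a few values and checking whether the groups make chemical sense is one pragmatic option.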

But really, a SOM (self-organizing map) is the kind of thing I want to see. Hmm... Maybe there is a way.
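For reference, the classic online SOM (Kohonen) update is short enough to sketch in plain Python. This assumes the normalized descriptors or the 3D PCA scores as input. The grid size, step count, learning-rate and neighborhood schedules are arbitrary illustrative choices, and `train_som` / `best_matching_unit` are hypothetical names rather than a KNIME or library API.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid cell whose weight vector is closest to sample x."""
    return min(weights, key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], x)))

def train_som(data, rows=3, cols=3, steps=1000, lr0=0.5, seed=0):
    """Minimal online SOM on a rows x cols grid; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dim)]
    hi = [max(p[d] for p in data) for d in range(dim)]
    # Random initial weights inside the data's bounding box.
    weights = {(r, c): [rng.uniform(lo[d], hi[d]) for d in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for t in range(steps):
        frac = t / steps
        lr = lr0 * (1.0 - frac)              # learning rate decays linearly toward 0
        sigma = sigma0 * (1.0 - frac) + 0.1  # neighborhood shrinks to almost winner-only
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            g2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-g2 / (2 * sigma * sigma))  # Gaussian neighborhood on the grid
            for d in range(dim):
                w[d] += lr * h * (x[d] - w[d])       # pull the node toward the sample
    return weights
```

Each molecule can then be placed on the grid cell of its best-matching unit, which gives a 2D map of the chemical space in which similar molecules land in the same or neighboring cells.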