Chemical Space by Descriptors
So, a simple workflow, but it does its job.
File Reader: Read csv file
Column Filter: Filter out non-SMILES parameters
RDKit Descriptor Calculation: Used All Descriptors!!!!
Normalizer: Gaussian (z-score), but min-max scaling to 0 to 1 could be good too
PCA: Into 3 dimensions!!!!
2D/3D Scatterplot: So that humans can SEEEEEEEEEE
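For anyone who wants to try the same steps outside a workflow tool, here is a rough Python sketch of the chain above. It is not what the nodes do internally, just an approximation under some assumptions: the function names (`descriptor_matrix`, `zscore`, `pca`) are made up for illustration, the descriptors come from RDKit's `Descriptors.descList`, and the PCA is done with a plain NumPy SVD.

```python
import numpy as np


def descriptor_matrix(smiles_list):
    """Compute every RDKit descriptor for each SMILES string (needs RDKit installed)."""
    # Imported lazily so the normalization and PCA helpers work without RDKit.
    from rdkit import Chem
    from rdkit.Chem import Descriptors

    rows = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # skip SMILES strings that RDKit cannot parse
            continue
        rows.append([fn(mol) for _name, fn in Descriptors.descList])
    return np.array(rows, dtype=float)


def zscore(X):
    """Gaussian (z-score) normalization per column.

    Columns containing NaN/inf and constant columns are dropped,
    because they cannot be scaled to unit variance.
    """
    X = X[:, np.all(np.isfinite(X), axis=0)]
    std = X.std(axis=0)
    keep = std > 0
    return (X[:, keep] - X[:, keep].mean(axis=0)) / std[keep]


def pca(X, n_components=3):
    """Project mean-centered data onto its first principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _u, _s, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T
```

With a list of SMILES strings in hand, something like `coords = pca(zscore(descriptor_matrix(smiles)), 3)` would give the three coordinates per molecule to feed into a scatterplot.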
I forgot to normalize at first and got a weird-looking result. Normalization is definitely necessary... Well, once I added it, I got this kind of result.
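A likely reason for the weird result without normalization is that PCA follows variance, so a descriptor with large numbers swamps everything else. The toy example below tries to show that effect. The data is completely made up (two correlated columns on very different scales) and `first_pc` is just a helper name for this sketch.

```python
import numpy as np

# Made-up toy data: two correlated "descriptors" on very different scales.
small = np.array([0.1, 0.4, 0.2, 0.9, 0.5, 0.7])  # e.g. a fraction between 0 and 1
noise = np.array([5.0, -8.0, 3.0, -2.0, 6.0, -4.0])
large = 400.0 + 300.0 * small + noise              # e.g. a weight-like value
X = np.column_stack([small, large])


def first_pc(X):
    """Return the loadings of the first principal component."""
    Xc = X - X.mean(axis=0)
    _u, _s, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


# PCA on the raw columns versus PCA on z-scored columns.
raw_pc = first_pc(X)
scaled_pc = first_pc((X - X.mean(axis=0)) / X.std(axis=0))

print("raw loadings:   ", np.abs(raw_pc).round(3))
print("scaled loadings:", np.abs(scaled_pc).round(3))
```

On the raw data the first component is almost entirely the large-scale column, while after z-scoring both columns contribute equally.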
Space shown. So clustered! WTH!!
Wonder how you can differentiate this into groups.
Maybe I should try clustering next.
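As a note to self, one simple option would be k-means on the three PCA coordinates. Below is a minimal sketch of plain Lloyd's k-means in NumPy. It is only one of many clustering methods and not something I have run on this data. The function name `kmeans` and its parameters are my own choices for the sketch.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points picked at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every point to every center, shape (n_points, k).
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its points (keep it if it has none).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

The number of groups `k` has to be chosen up front, which is its own problem when the space looks like one big clump.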
But really, a SOM (self-organizing map) is the kind of thing I want to see. Hmm... Maybe there is a way.