Cluster Analysis by Fingerprint (ECFP)

What's the BEST way to represent molecules so that a computer can understand them? I know this question itself is somewhat... BUT this is what I would like to do. Isn't this what we all want to do!?

Well, as of April 2019, I would say ECFP is the one (yes, I welcome any comments; I'd even want to comment on this myself).

 

BUT, let's cluster by ECFP coz it's fun!

f:id:hateknime:20190416221855p:plain

So, what did I do...

File Reader: reads the CSV file

Column Filter: removes unnecessary columns

RDKit Fingerprint: I love RDKit, it's the best! Morgan fingerprint by default.

Expand Bit Vector: I dunno if I need this. I think k-Means wants plain numeric columns, so this splits the fingerprint into one 0/1 column per bit, but there's gotta be a smarter way.

k-Means: I just like starting off with this one (clustering into 10 groups)
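Just to see what the Expand Bit Vector and k-Means steps actually do with a fingerprint, here's a minimal pure-Python sketch. It's not KNIME's or RDKit's code: the fingerprints are made-up 16-bit sets of "on" bits (a real Morgan fingerprint from RDKit, e.g. `AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)`, which with radius 2 roughly corresponds to ECFP4, is much longer), and `expand_bits` and `kmeans` are hypothetical helpers doing plain Lloyd's k-means with Euclidean distance.

```python
import random

# Toy fingerprints (hypothetical): each molecule is the set of "on" bit indices.
# Real Morgan/ECFP bit vectors from RDKit would have e.g. 1024 or 2048 bits.
N_BITS = 16


def expand_bits(on_bits, n_bits=N_BITS):
    """One 0/1 column per bit, like the Expand Bit Vector node's output."""
    return [1.0 if i in on_bits else 0.0 for i in range(n_bits)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Column-wise mean of equal-length numeric rows."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(rows, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with Euclidean distance; returns one label per row."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]  # random rows as starting centroids
    labels = None
    for _ in range(n_iter):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:  # nothing moved, so we've converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = mean(members)
    return labels


fingerprints = [
    {0, 1, 2, 3}, {0, 1, 2, 4}, {0, 1, 3, 4},              # three "similar" molecules
    {10, 11, 12, 13}, {10, 11, 12, 14}, {10, 12, 13, 14},  # three others
]
rows = [expand_bits(fp) for fp in fingerprints]
labels = kmeans(rows, k=2)
print(labels)
```

The centroids end up as fractions between 0 and 1 (the share of cluster members with each bit on), which is one reason k-means wants the expanded numeric columns rather than a raw bit vector.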

 

It's all cool and so easy with KNIME. Then I copied and pasted it into my previous workflow, where I clustered by descriptors.

f:id:hateknime:20190416221902p:plain

 

SO! What's the difference between descriptors-to-PCA and fingerprints?

f:id:hateknime:20190416221847p:plain

OK, quite different. So I guess how you represent molecules really changes the result. Well, not that far off, though. This is interesting.
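To put a number on "quite different, but not that far off", one option is to compare the two cluster assignments for the same molecules with the adjusted Rand index (ARI): 1.0 means the same grouping (the cluster numbers themselves can differ), and around 0 means no better than chance. A minimal sketch, assuming both label columns have been exported in the same molecule order; `adjusted_rand_index` is a hypothetical helper and the labels below are made up, not the results in the plot.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two clusterings of the same molecules (1.0 = same grouping)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same molecules")
    n = len(labels_a)
    if n < 2:  # no pairs to compare
        return 1.0
    # Pairs of molecules that land together in both clusterings.
    together_in_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs that land together in each clustering on its own.
    together_in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_in_a * together_in_b / comb(n, 2)  # chance level
    best = (together_in_a + together_in_b) / 2
    if best == expected:  # degenerate, e.g. both put everything in one cluster
        return 1.0
    return (together_in_both - expected) / (best - expected)


# Hypothetical labels for the same 9 molecules (not the results from the plot).
descriptor_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
fingerprint_labels = [0, 0, 1, 1, 1, 1, 2, 2, 0]
print(round(adjusted_rand_index(descriptor_labels, fingerprint_labels), 3))  # → 0.357
```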

For machine learning, I usually start off with ECFP and SVM or RF (random forest). Maybe I'll try that another time for fun. I can do all of this without any programming/scripting. KNIME is so cool.