An idiot's guide to lead optimisation for proteins (magnusross.github.io)
patrickkidger 2 hours ago [-]
Oh heck, this is awesome to see on the front page! I wrote the underlying Cradle-1 paper that is being discussed!

I used to work for Cradle and writing this paper was the last thing I did before leaving – on good terms – to found my own startup. :D And we'll 100% be using Cradle for our lead optimization.

(On the off-chance: I'm at PEGS Boston this week chatting all things AI+antibodies, in particular for rare diseases. If this topic is of interest to any other protein+tech geeks here then send me an email, let's grab coffee.)

the__alchemist 25 minutes ago [-]
It sounds like this is mostly (or exclusively?) operating directly on AA seqs. I wonder what the upper limit of capability is for the intended use case, i.e. without incorporating the 3D chemistry or spatial reasoning (classical MD, DFT as ORCA performs, etc.). Of particular interest: does this upper bound (assuming it exists; I suspect it does) preclude its utility in practical protein design/gen?

I speculate Cradle is taking this approach rather than a structural/spatial one because structural/spatial models don't work very well on big molecules like proteins! (And/or they are too slow, errors accumulate over space, etc.)

theophrastus 2 hours ago [-]
After spending an entire career working 'by hand' (and with a helluva lot of molecular orbital calculations) on the problem this post is about, I've got to tersely weigh in with: there's (still) not enough available data, given the size of protein 'phase space', to hope for a proper covering with one's trained-up linear algebra model. Or, typed another way: you've got to include some physical modeling parameters at some stage, like molecular orbitals [1]; otherwise the 'response curve' will only optimize if one gets quite lucky (which is actually unlucky, as you'll then delude yourself into thinking it's generally applicable, which it isn't). For instance, swap in a carboxylic acid moiety where there was previously an aldehyde, a protein side-chain flips over, and you're in a completely different corner of the energetic 'galaxy'.

[1] e.g. https://proteindf.github.io/

phreeza 2 hours ago [-]
That seems possible for generating completely new proteins.

Do you think it's also the case for lead optimization, where you typically have some measurements around your starting point and you expect the generated candidates to stay in that local neighborhood, too?

(Disclaimer: former Cradle employee here)

patrickkidger 2 hours ago [-]
Oh hello Thomas, fancy seeing you here :D ex-Cradlers unite!
patrickkidger 2 hours ago [-]
I'll offer a +1 to the sibling comment here.

Yeah it's totally true you can't build a one-size-fits-all foundation model, the data just isn't there. But also... no-one needs that. It's totally fine to tweak a foundation model for any individual problem, and that's the bulk of what is being described in the linked blog post / in the underlying paper.

FWIW whilst at Cradle we had a lot of doubts going into this. Like, thermostability is clearly evolutionarily correlated so it was always pretty likely that by hook or by crook the models could do that correctly. But, binding? Aggregation? Not at all clear that the same principles should hold. And the exciting finding was that yes, yes they do.

dnautics 35 minutes ago [-]
How many therapeutic proteins are there that aren't mAbs or ~naturally occurring proteins (insulin, modified insulins, hirudin, Cerezyme, etc.)?

I can think of:

etanercept
