Product Digest: Latch Plots, Protein Engineering and GPU-enabled Bioinformatics

We’re excited to showcase some highlights from the last month! Read on to learn more about our recent conference in Boston, our new suite of protein engineering tools, and our new visualization software for biology, Latch Plots.


⚙️ Product Updates

Protein Engineering Toolkit: A Curated Toolbox of Graphically Accessible Molecular Design Tools

Over the past few years, a new class of machine learning tools for protein engineering has promised to change how we develop drugs and enzymes. However, in these early days, the industry is still looking for concrete applications for these tools.

Based on our work with cutting-edge academic labs and computational biotech teams on real problems in research and industry, we’ve assembled a curated toolbox of graphically accessible protein engineering tools.

Using Latch Workflows, choose from over 16 rigorously tested protein engineering pipelines.
Users can leverage diffusion models to create novel protein structures using RFdiffusion, one of the new protein engineering tools hosted on Latch Workflows.

Within Latch Workflows, users can now self-serve 16+ rigorously tested pipelines enabling structure and sequence generation, prediction, and property evaluation. 

Within moments, generate a FASTA file containing 100 sequences that could plausibly fold into your binder.

To demonstrate the flexibility and breadth of these tools, we published an in-depth case study that uses them to build two different kinds of proteins: plastic-degrading enzymes and blood cholesterol drugs.

Relationships between various protein engineering tools hosted on Latch.

Check out the case study HERE

Start using our Protein Engineering toolkit today by reading our Workflows Wiki Page.

Plots: A New Visualization Software for Biological Data

We released Latch Plots, a reactive, Python-based plotting framework with a library of graphical widgets and an LLM integration for self-serve analysis by bench scientists. Traditional tools like GraphPad and Excel fall short for the following reasons:

  • Integration with existing data sources: Scientists frequently struggle with importing data from platforms like GCP, S3, Drive, Excel, and Benchling, leading to inefficiencies and compromised traceability.

  • Transparency into underlying code: Access to the specifics of statistical analyses and plotting algorithms is essential for verification and customization, yet many tools keep these details inaccessible.

  • Collaboration capabilities: The lack of live editing and shared workspaces in desktop software hinders teamwork and slows down the research process.

We built Latch Plots to balance usability for scientists with flexibility for developers, while maintaining traceability across entire R&D campaigns.

Use case highlights: 

It is important to contextualize how Plots works with real biological applications:

GPU-enabled single-cell visualization using RAPIDS

Over the past decade, the number of cells per kit has grown rapidly, with 10x Genomics and Scale Biosciences pushing for kits with more than 1 million cells. Analyzing this much data is a nontrivial feat. On a machine with 16 CPUs, we found that simple steps like pre-processing, dimensionality reduction, and Leiden/Louvain clustering can take more than 30 minutes.

Because Plots is built on top of our Pods infrastructure, we were able to scale up to a single NVIDIA A10G GPU while preserving data and dependencies. This cut analysis time to under 1 minute.

Use scalable compute resources to cut processing times from over 30 minutes to less than 45 seconds.

The no-code chart cell lets scientists construct UMAPs and change which categorical variable the points are colored by. This is useful for tasks such as checking the cell types output by an automatic cell type annotation model against a scientist's intuition.

Use the 'Color By' option to color UMAPs by various metadata, such as patient IDs and pre- or post-treatment status.

Deep mutational scanning to explore the phenotypic landscape of AAVs

Despite the promise of gene editing, delivering gene therapy to specific tissue types is a complex problem. To mimic a realistic viral engineering problem, we replicated the analysis from the Ogden et al. paper, which comprehensively analyzed all single-codon mutants of the AAV2 cap gene across all 735 positions.

The first figure we recreated shows the fitness of each variant compared to WT in virus production efficiency. Positive selection indicates beneficial mutations enriched in the virus, likely enhancing capsid assembly or genome packaging. Negative selection suggests mutations that may impair virus production.

Fitness of each variant compared to WT in virus production efficiency.
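
For readers who want to see what a score like this looks like in practice, here is a minimal sketch. It assumes fitness is the log2 change in a variant's read frequency between the input plasmid library and the packaged virus, normalized to the same quantity for WT. That formula is a common convention in deep mutational scanning and is our assumption for illustration; it is not necessarily the exact normalization used by Ogden et al. The read counts, variant names, and pseudocount are all hypothetical.

```python
import math


def log2_enrichment(pre_count, post_count, pre_total, post_total, pseudocount=0.5):
    """log2 ratio of a variant's read frequency after vs. before selection."""
    # The pseudocount (an assumption) avoids log(0) for variants with no reads.
    pre_freq = (pre_count + pseudocount) / pre_total
    post_freq = (post_count + pseudocount) / post_total
    return math.log2(post_freq / pre_freq)


def fitness_vs_wt(variant, wild_type, pre_total, post_total):
    """Variant enrichment minus WT enrichment, so 0 means WT-like."""
    return (log2_enrichment(*variant, pre_total, post_total)
            - log2_enrichment(*wild_type, pre_total, post_total))


# Hypothetical read counts as (plasmid library, packaged virus) pairs.
counts = {
    "WT": (1000, 1000),
    "variant_A": (1000, 4000),  # enriched after packaging
    "variant_B": (1000, 250),   # depleted after packaging
}
pre_total = sum(pre for pre, _ in counts.values())
post_total = sum(post for _, post in counts.values())

scores = {
    name: fitness_vs_wt(pair, counts["WT"], pre_total, post_total)
    for name, pair in counts.items()
}
for name, score in scores.items():
    print(f"{name}: {score:+.2f}")
```

Under this convention, a positive score corresponds to positive selection (the variant is enriched relative to WT) and a negative score to negative selection.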

Next, we created a heatmap of fitness for all single-amino-acid insertions, deletions (D), stop codons (*), and substitutions.

Heatmap showing the fitness scores for each amino acid type.

The authors found that mutations at different structural regions of the AAV2 capsid have distinct fitness effects. Buried residues and regions near the 5-fold axis showed strong negative selection, likely due to their role in maintaining capsid stability, while exposed residues, particularly near the 3-fold axis, were more mutation-tolerant. 

The authors studied how AAV capsid mutations impact in vivo delivery by creating a library of variants and injecting them into mice. 

To analyze the fitness scores for each tissue type, we performed PCA on the fitness scores of the mutants to reduce dimensionality, applied K-means clustering to group the mutants, and overlaid enrichment scores on the clusters. Higher enrichment indicated that certain AAV capsids were more effective for specific tissue types.

PCA colored by tissue enrichment.
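
The analysis above can be sketched in a few dozen lines. This is a minimal pure-Python illustration under stated assumptions, not the code behind the figure: PCA is done by power iteration on the covariance matrix, K-means is plain Lloyd's algorithm, and the "overlay" is simply the tissue with the highest mean score in each cluster. The tissue names, fitness values, and function names are hypothetical. In practice a library such as scikit-learn would be the usual choice.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so PCA sees variation, not offsets."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenpair(matrix, iters=200, seed=0):
    """Power iteration: dominant eigenvector and its eigenvalue."""
    d = len(matrix)
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return v, sum(a * b for a, b in zip(v, mv))


def pca(rows, n_components=2):
    """Project rows onto their top principal components."""
    centered = center(rows)
    cov = covariance(centered)
    components = []
    for _ in range(n_components):
        v, eigval = leading_eigenpair(cov)
        components.append(v)
        # Deflate so the next pass finds the next-largest component.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical per-tissue fitness scores: one row per capsid variant.
tissues = ["liver", "heart", "brain"]
fitness = [
    [2.0, -1.0, -1.1], [2.2, -0.9, -1.0], [1.8, -1.1, -0.9],
    [-1.0, -0.9, 2.1], [-1.1, -1.0, 1.9], [-0.9, -1.1, 2.0],
]

coords = pca(fitness, n_components=2)  # 2D coordinates for plotting
labels = kmeans(coords, k=2)           # cluster assignment per variant

# Overlay: the tissue with the highest mean score in each cluster.
top_tissue = {}
for cluster in set(labels):
    members = [row for row, lab in zip(fitness, labels) if lab == cluster]
    means = [sum(col) / len(members) for col in zip(*members)]
    top_tissue[cluster] = tissues[means.index(max(means))]
print(top_tissue)
```

With this toy input, the two clusters separate variants that score highest in liver from those that score highest in brain, which is the kind of pattern the colored PCA plot is meant to surface.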

GPU-Enabled Bioinformatics: Why Bioinformatics Will Move to Accelerated Hardware

In a recent essay, we outlined why new molecular assays, managed data infrastructure, and tailwinds from the “AI” boom create the perfect storm for the regular use of accelerated hardware. To illustrate this point, we then released benchmarks for accelerated nf-core/methylseq for epigenetic analysis.

Why Bioinformatics code will run on GPUs

The rapid increase in data generation from sequencing-based experiments is pushing bioinformatics workflows to become a rate-limiting step in drug discovery. As a new generation of larger assay techniques comes online, the impact of GPU implementations moves from irrelevant (~30MB bulk RNA-seq data) to desirable (~300GB methyl-seq data) to necessary (10M+ single-cell atlases). Installing hardware, managing drivers, building and configuring CUDA programs, and optimizing parameters all require expertise and upfront time investment, but the rise of managed cloud platforms is removing these traditional barriers to the widespread use of GPU-based bioinformatics.

The recent AI craze is also commoditizing GPUs, improving the developer ecosystem and creating new infrastructure to use them at scale. Data-intensive bioinformatics workflows will benefit from the new libraries, improvements in systems software, and increasing GPU memory driven by the demand for training large models.

Accelerated nf-core/methylseq for epigenetics

As a case study, we are releasing an accelerated version of nf-core/methylseq that uses the GPU aligner Arioc, developed by Richard Wilton. Notice in our findings below that the improvements are large even with smaller workflows, making GPUs appealing for teams of all sizes.

We will continue to improve bisulfite sequencing analysis, but plan to extend the benefits of GPU implementations to all large assays. WGS and single-cell methylation are in the works. We encourage biotech teams that are interested in these cost and performance benefits to reach out to our team at kenny@latch.bio to collaborate on our assay roadmap. 

Read our full essay on GPU benchmarks HERE.

Nextflow Integration Updates: Improved Developer Tools

We are constantly thinking about creative new tools that improve the experience of developing and deploying workflows built with Nextflow.

We introduced the latch develop command, which allows you to SSH into a computer with an execution’s shared filesystem mounted to it. This allows you to explore intermediate files and poke around the environment.

We also found that many Nextflow workflows cost far more than they should. On Latch, Nextflow executions now come with a per-process usage report, giving developers visibility into resource consumption across disk, network, CPU and RAM for each process.

Built-in per-process usage report for Nextflow executions.

Start using our Nextflow Integration today by reading our Nextflow Wiki Page.


🤝 Partnership Highlights

PlotsAI Webinar: How to use LLMs to Understand Biological Data

We unveiled our newest product component, Latch Plots, at a live webinar on October 30th with our Head of Product, Hannah Le! During this live event, she demonstrated how scientists can use Latch Plots to generate publication-ready visualizations to extract insights from their own data. 

The best part – our built-in LLM allows scientists to generate plots and interactive widgets simply by describing what they need. The LLM then interprets the input, retrieves the appropriate data, and creates interactive plots that update dynamically based on user selections. 

All of these visualizations are customizable and traceable to the source data.

Hannah Le, Head of Product at LatchBio, demonstrating use cases for our new feature, PlotsAI, at our live webinar.

Watch the on-demand webinar HERE.

Data Infrastructure for Biotech Conference

On October 21st, we hosted a one-day gathering of hundreds of engineers and scientists in Boston. We were able to learn from knowledge shared by leaders at companies like Recursion, Elsie Biotechnologies, Dyno Therapeutics, and many more. 

In an industry characterized by siloes, NDAs and secrecy in the molecular details of drug programs, computer engineering gives biotechs the opportunity to come together, pool resources and share tacit knowledge. 

At our second conference on data infrastructure we had hundreds of engineers and scientists in one room to meet and learn from each other. The focus was noticeable. We intentionally curated an intellectually open forum of customers and non-customers, encouraging everyone to share ideas freely. The best methods and tools shine in the limelight of the truth and we all have a lot to learn from each other. We will continue to do this for years to come. Reach out if you're interested in being involved next year.

Stay tuned on our socials for posts highlighting topics from our speakers.

We’ll see you again in Boston next year!


📖 Additional Resources

If you want to learn more:

Read our breakdown of how biotechs use software to make drugs.

Read our recent blog about Virtual Screening for New Antibiotics: A Machine Learning Approach to LpxC Inhibition.

Read about our AI-Enabled Gene Editing Toolkit.

🪵 LatchBio Change Log
