Dave's Blog

From Theory to Code: A Deep Dive into Molecular Extended-Connectivity Fingerprints (ECFPs) with Python

Source: https://docs.chemaxon.com/display/docs/extended-connectivity-fingerprint-ecfp.md
What Are Molecular Fingerprints and ECFPs?
Molecular fingerprints are representations that encode key functionalities and properties of chemical compounds. They were originally designed for substructure search in databases, but later gained popularity for similarity searching and molecule clustering. Extended-connectivity fingerprints (ECFPs) are a type of molecular fingerprint specifically designed for predicting and analyzing molecular activity and properties. First introduced in 2000, they have since been widely adopted across various fields. (Interestingly, they are conceptually similar to convolutional operations: like the convolution operator in CNNs, which applies the same function in all directions before global pooling, ECFPs use iterative circular

Read More »
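
To make the "iterative circular" idea in the ECFP excerpt above a little more concrete, here is a minimal sketch in plain Python. It is not the real ECFP algorithm: actual implementations use richer atom invariants and bond information, and the function names, the toy molecules, and the choice of SHA-256 for hashing are all assumptions made for illustration. It only shows the general shape: give each atom an identifier, repeatedly fold in the neighbors' identifiers, then map everything onto a fixed number of bits.

```python
import hashlib


def stable_hash(items):
    """Deterministic 32-bit hash of a tuple (the built-in hash() is salted per run)."""
    digest = hashlib.sha256(repr(items).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def circular_fingerprint(atoms, bonds, radius=2, n_bits=1024):
    """Toy circular fingerprint over a molecular graph (illustrative only).

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) atom-index pairs; bond order is ignored here
    Returns the set of bit positions that are switched on.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Iteration 0: an atom's identifier depends only on the atom itself
    # (here: element symbol and number of neighbors).
    identifiers = [
        stable_hash((symbol, len(neighbors[i]))) for i, symbol in enumerate(atoms)
    ]
    features = set(identifiers)

    # Each iteration combines an atom's identifier with its neighbors'
    # identifiers, so the described neighborhood grows by one bond.
    for _ in range(radius):
        identifiers = [
            stable_hash(
                (identifiers[i], tuple(sorted(identifiers[j] for j in neighbors[i])))
            )
            for i in range(len(atoms))
        ]
        features.update(identifiers)

    # Fold the collected identifiers into a fixed-length bit space.
    return {feature % n_bits for feature in features}


# Hypothetical toy molecules (heavy atoms only).
ethanol = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
propane = circular_fingerprint(["C", "C", "C"], [(0, 1), (1, 2)])

# Tanimoto similarity: shared bits divided by all bits set in either molecule.
tanimoto = len(ethanol & propane) / len(ethanol | propane)
print(f"Tanimoto(ethanol, propane) = {tanimoto:.2f}")
```

Because neighbor identifiers are sorted before hashing, the result does not depend on the order in which the atoms are listed. For real work, a cheminformatics toolkit is the better choice.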

Emerging Trends and Systems Implications of Multi-Modal AI Models

Source: https://arxiv.org/abs/2312.14385
Introduction
As generative AI continues to advance, models are evolving beyond text generation to include image and video synthesis capabilities. However, these multi-modal models come with unique systems-level challenges compared to traditional language models. A new paper from researchers at Meta and Harvard University provides the first in-depth analysis characterizing the system performance and implications of text-to-image (TTI) and text-to-video (TTV) generative AI models. Their analysis compares two main model architectures – Diffusion-based and Transformer-based – across eight representative models on dimensions like latency, computational intensity, and component breakdown. The researchers make several key observations about the distinct

Read More »

Prefix Tuning: Lightweight Adaptation of Large Language Models for Customized Natural Language Generation

Source: https://arxiv.org/abs/2101.00190
Introduction
Natural language generation (NLG) models like the GPT series have enabled remarkable progress on conditional text generation tasks such as summarization and table-to-text. However, fine-tuning these large pretrained models on downstream NLG tasks requires updating all model parameters, which is computationally expensive. Fortunately, researchers have identified lightweight fine-tuning methods that reduce the number of parameters that must be updated when fine-tuning a model for a specific task. One such method is prefix tuning, proposed by Li and Liang (2021). Prefix tuning keeps the parameters of the pretrained model fixed, and only trains a small continuous “prefix” that

Read More »

Multimodal Few-Shot Learning with Frozen Language Models: A Review

Source: Multimodal Few-Shot Learning with Frozen Language Models
Introduction
Recent advances in natural language processing have led to large transformer-based language models that exhibit impressive few-shot learning abilities. Specifically, when trained at sufficient scale, these models can learn new language tasks after being prompted with just a few examples, without any further gradient updates. Despite these capabilities, however, large language models can only understand text. They cannot process inputs in other modalities like vision, which prevents us from directly communicating visual information, tasks, or concepts to the models. We cannot simply show the model an image

Read More »

RLHF Training at Scale with DeepSpeed-Chat

Source: DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
Introduction
ChatGPT-like models showcase the power of large language models for conversational AI. However, training such powerful models requires massive computational resources, making them inaccessible to many researchers and developers. Microsoft’s newly open-sourced DeepSpeed-Chat aims to change that by providing an end-to-end framework for training ChatGPT-style models efficiently at any scale. DeepSpeed-Chat allows training models with hundreds of billions of parameters in record time using only commodity GPUs. This is made possible by DeepSpeed-Chat’s optimized DeepSpeed-RLHF training system. It combines innovations in memory optimization, parallelism, and other

Read More »

New Insights into the Inner Workings of In-Context Learning

Source: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Introduction
In-context learning has emerged as one of the most remarkable capabilities of large language models like GPT-3 and GPT-4. With just a few demonstration examples, these models can rapidly adapt to new tasks and make accurate predictions without any parameter updates. But how does this impressive on-the-fly learning actually work behind the scenes? In a fascinating new paper from Microsoft Research and Peking University, researchers provide new theoretical insights that help unravel the optimization processes underlying in-context learning in Transformer models. By drawing parallels to gradient descent and analyzing the mechanics

Read More »

Bite-Sized Bayesian: A Mini-Tour of Bayesian Inference, from Coin Flips to Neural Networks

“Unlike deterministic Neural Networks (left) that have a fixed value of their parameters, Bayesian Neural Networks (right) has a distribution defined over them.”
Source: https://sanjaykthakur.com/2018/12/05/the-very-basics-of-bayesian-neural-networks/
In our vast, interconnected, and (dare I say infinitely) complex universe, the only certainty is uncertainty. And yet, intriguingly, while we may never grasp the full picture, our minds come equipped with tools that help us navigate this world, weaving intricate (and hopefully informative) narratives from just snippets of the “grand tableau”. Bayesian inference is one mathematical formulation/representation of this innate skillset. Let’s briefly delve into the world of Bayesian inference, from the humble beginnings

Read More »
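
Since the Bayesian post above starts from coin flips, here is a minimal sketch of what a Bayesian update looks like in that setting. It assumes a Beta prior over the coin's probability of heads, which is a standard choice because the posterior stays in the Beta family. The function names and the flip counts are made up for illustration.

```python
def update_beta(alpha, beta, heads, tails):
    """Posterior Beta parameters after observing coin flips.

    With a Beta(alpha, beta) prior on P(heads), observing `heads` heads and
    `tails` tails gives a Beta(alpha + heads, beta + tails) posterior.
    """
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Beta(1, 1) is a uniform prior: every bias is considered equally plausible.
prior = (1.0, 1.0)

# Hypothetical data: 7 heads and 3 tails.
posterior = update_beta(*prior, heads=7, tails=3)

print(posterior)              # (8.0, 4.0)
print(beta_mean(*posterior))  # 8 / 12, roughly 0.667
```

The posterior mean sits between the prior mean (0.5) and the observed frequency (0.7), and it moves closer to the data as more flips are observed.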

Fine-Tuning Models? Think Surgical Precision, Not Sledgehammer

When building machine learning systems, it’s common to take a model pre-trained on a large dataset and fine-tune it on a smaller target dataset. This allows the model to adapt its learned features to the new data. However, naively fine-tuning all the model’s parameters can cause overfitting, since the target data is limited. In a new paper, researchers from Stanford explore an intriguing technique they call “surgical fine-tuning” to address this challenge. The key insight is that fine-tuning just a small, contiguous subset of a model’s layers is often sufficient for adapting to a new dataset. In fact, they show across 7

Read More »
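
The core idea in the surgical fine-tuning excerpt above, training only a small contiguous block of layers and freezing everything else, can be sketched in a few lines. This is a framework-free illustration rather than the paper's code: the layer names and the `surgical_trainable_mask` helper are hypothetical.

```python
def surgical_trainable_mask(layer_names, start, end):
    """Mark a contiguous block of layers as trainable and freeze the rest.

    layer_names: layers in order from input to output
    start, end:  the half-open index range [start, end) to fine-tune
    Returns a dict mapping each layer name to True (train) or False (freeze).
    """
    if not 0 <= start < end <= len(layer_names):
        raise ValueError("start/end must select a non-empty range of layers")
    return {name: start <= i < end for i, name in enumerate(layer_names)}


# Hypothetical five-layer model, listed from input to output.
layers = ["block1", "block2", "block3", "block4", "head"]

# Fine-tune only the first block; all later layers stay frozen.
mask = surgical_trainable_mask(layers, start=0, end=1)
for name, trainable in mask.items():
    print(f"{name}: {'train' if trainable else 'frozen'}")
```

In a framework such as PyTorch, a mask like this would typically be applied by setting `requires_grad` to `False` on the parameters of the frozen layers before building the optimizer.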

Demystifying AI Foundation Models: A Comprehensive Guide to Large Language Models, ChatGPT, and Beyond

Introduction
With the advent of modern deep learning algorithms and architectures, AI has made remarkable progress in the realms of computer vision and language. AI can now process, translate, and generate human-grade images and text. The latest advances are occurring in the realm of language and text, where AI models are inching closer to planning and reasoning. Of course, by now you’ve probably heard of ChatGPT. ChatGPT is a type of Large Language Model (LLM), and ever since its release, it has taken the world by storm. LLMs are incredibly powerful and can perform a variety of tasks ranging from

Read More »

NYC COVID-19 Tracker

Built using Python, Plotly, and Dash. The Beautiful Soup and Requests packages were used to collect the data. Cleaning and wrangling were performed with Pandas. The data contains COVID-19 cases per NYC zip code tabulation area. The app fetches polygon geo-coordinates for each zip code area, and then plots the final data on a Choropleth map using Plotly and Dash. The app was deployed on a Heroku server. The COVID-19 data comes from NYC Health’s GitHub repository. The coordinate data comes from Iowa.gov.
Visit the App

Read More »
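
The step in the tracker description above where case counts meet the zip code polygons is essentially a join on zip code. Here is a rough sketch of that join in plain Python, standing in for the Pandas and Plotly code the app actually uses. The `attach_case_counts` helper, the `zip` property name, the coordinates, and the case numbers are all made up for illustration.

```python
def attach_case_counts(features, cases_by_zip, zip_key="zip"):
    """Build a GeoJSON FeatureCollection with a case count on every feature.

    features:     GeoJSON-style polygon features, one per zip code area
    cases_by_zip: dict mapping zip code -> case count
    Areas with no reported count get cases=None instead of being dropped.
    """
    joined = []
    for feature in features:
        zip_code = feature["properties"][zip_key]
        properties = dict(feature["properties"], cases=cases_by_zip.get(zip_code))
        joined.append({**feature, "properties": properties})
    return {"type": "FeatureCollection", "features": joined}


def toy_polygon(zip_code, x):
    """Hypothetical unit-square polygon; real coordinates would come from the geo data."""
    ring = [[x, 0], [x + 1, 0], [x + 1, 1], [x, 1], [x, 0]]
    return {
        "type": "Feature",
        "properties": {"zip": zip_code},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


# Made-up inputs: two areas, only one of which has a case count.
features = [toy_polygon("10001", 0), toy_polygon("10002", 1)]
cases = {"10001": 120}

collection = attach_case_counts(features, cases)
for feature in collection["features"]:
    print(feature["properties"])
```

A collection shaped like this, with the value to color by stored next to each polygon, is the kind of input a choropleth map is drawn from.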

