Khan Siddiqui, MD’s Post

Healthcare visionary leading HOPPR's multimodal AI revolution

9mo

🚀 Key Takeaways from "An Image is Worth 1/2 Tokens After Layer 2" 🚀

The study reveals that Large Vision-Language Models (LVLMs) compute attention over visual tokens inefficiently. FastV, a versatile plug-and-play method, prunes low-attention visual tokens in later layers, significantly reducing computational cost without sacrificing performance on image and video tasks.

FastV also lets you tune the trade-off between computational efficiency and performance by choosing how aggressively visual tokens are dropped. Because it lowers inference cost rather than requiring a new model, it is well suited to deployment on edge devices and in commercial settings.

Read more about this approach to AI, machine learning, and computer vision in the full article here: https://lnkd.in/gJ-DQ46M

#AI #MachineLearning #ComputerVision #Efficiency #Innovation

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

arxiv.org
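Here is a minimal sketch of attention-based visual-token pruning in the spirit of FastV. It rests on several assumptions that do not come from the post. The attention matrix at the pruning layer is assumed to be already averaged over heads. Visual tokens are ranked by the average attention they receive from all query positions. The helper names `fastv_keep_indices` and `pruned_cost_ratio` are hypothetical. The cost helper uses a rough per-layer multiply-accumulate count (4nd² for the projections, 2n²d for attention, 2ndm for the MLP), not figures from the paper.

```python
import math


def fastv_keep_indices(attn, image_idx, drop_ratio):
    """Return sequence positions kept after the pruning layer (hypothetical helper).

    attn: n x n attention matrix (queries x keys) at the pruning layer,
          assumed already averaged over heads.
    image_idx: positions of the visual tokens in the sequence.
    drop_ratio: fraction of visual tokens to discard, 0 <= drop_ratio < 1.
    """
    n = len(attn)
    # Average attention each position receives across all query positions.
    received = [sum(attn[q][k] for q in range(n)) / n for k in range(n)]
    n_keep = len(image_idx) - math.floor(drop_ratio * len(image_idx))
    # Rank visual tokens by received attention, highest first.
    ranked = sorted(image_idx, key=lambda k: received[k], reverse=True)
    kept_images = set(ranked[:n_keep])
    image_set = set(image_idx)
    # Non-visual tokens are always kept; original order is preserved.
    return [i for i in range(n) if i not in image_set or i in kept_images]


def layer_macs(n, d, m):
    # Rough multiply-accumulate count for one decoder layer with n tokens,
    # hidden size d and MLP size m: Q/K/V/O projections, attention, MLP.
    return 4 * n * d * d + 2 * n * n * d + 2 * n * d * m


def pruned_cost_ratio(n_text, n_image, d, m, num_layers, k, drop_ratio):
    """Cost with pruning after layer k, relative to keeping all tokens everywhere."""
    n_full = n_text + n_image
    n_pruned = n_full - math.floor(drop_ratio * n_image)
    full = num_layers * layer_macs(n_full, d, m)
    pruned = k * layer_macs(n_full, d, m) + (num_layers - k) * layer_macs(n_pruned, d, m)
    return pruned / full


if __name__ == "__main__":
    # Assumed, roughly LLaMA-7B-sized dimensions; 576 visual tokens as an example.
    print(pruned_cost_ratio(n_text=64, n_image=576, d=4096, m=11008,
                            num_layers=32, k=2, drop_ratio=0.5))
```

Setting `k=2` and `drop_ratio=0.5` mirrors the paper's title. Raising `drop_ratio` or lowering `k` trades more accuracy for more speed, which is the tunable trade-off the post describes.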


More Relevant Posts

Sanjay Kariyappa

Sr. Research Scientist at NVIDIA working on AI Security and Privacy
7mo
Excited to announce that our paper, "Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions," has been accepted to #ICML2024! In this work, we introduce a novel explainable AI (XAI) framework called "Progressive Inference". Our approach leverages intermediate predictions from decoder-only Transformer models to generate high-quality SHAP-like input attributions for sequence classification tasks. Our method significantly outperforms prior XAI techniques and is a step towards improving the trustworthiness of large language models. For more details, check out our arXiv preprint: https://lnkd.in/gC3z3t2J Joint work with Freddy Lecue, Saumitra Mishra, PhD, Chris Pond, Daniele Magazzeni and Manuela Veloso. #AI #MachineLearning #XAI #ICML2024

Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions

arxiv.org
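As a rough illustration of how intermediate predictions can yield attributions, the sketch below credits each token with the change in the target-class probability between consecutive prefixes of the input. This is a simplified reading offered for illustration only. The function name `prefix_attributions` and the `baseline` prior are hypothetical, and the paper's actual estimator may differ. By construction, the attributions sum to the final probability minus the baseline, echoing SHAP's additivity property.

```python
def prefix_attributions(prefix_probs, target, baseline):
    """Credit each token with the change in target-class probability it causes.

    prefix_probs[i][c]: probability of class c predicted at position i, i.e.
        after reading tokens 0..i (the model's intermediate prediction).
    target: index of the class being explained.
    baseline: assumed probability of `target` before any token is read.
    """
    attributions = []
    prev = baseline
    for probs in prefix_probs:
        cur = probs[target]
        attributions.append(cur - prev)
        prev = cur
    return attributions


if __name__ == "__main__":
    # Invented intermediate predictions for a 3-token input and 2 classes.
    probs = [[0.5, 0.5], [0.2, 0.8], [0.1, 0.9]]
    print(prefix_attributions(probs, target=1, baseline=0.5))
```

In a decoder-only model, a single forward pass already produces a prediction at every position. That is what makes this kind of prefix-based attribution cheap.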

I. Can DIKMEN, PhD.

R&D
1w
🚀 Smarter Transportation is Here! 🌍

We’re happy to share our work on using Large Language Models (LLMs) to improve transportation. Our study shows how LLMs can work with time-series, audio, and video data to make quick, smart decisions.

📊 What We Achieved:
✅ Built a single system that handles several kinds of data.
✅ Reached 91.33% accuracy, with the best results on time-series data.
✅ Simplified processes by using one model instead of many.

🚗 This system supports tasks like planning and vehicle maintenance, making transportation faster and safer.

This was a team effort with my great co-authors: Dexter Le, Aybars Yunusoğlu, Karn Tiwari, and Murat Can Işık. Thanks to them for their hard work!

Check out our paper here: https://lnkd.in/dAqvd_MH

What are your thoughts on using AI for smarter transportation? Let us know!

#AI #Transportation #Innovation #Research

Multimodal LLM for Intelligent Transportation Systems

arxiv.org

Freddy Lecue

AI Research Director @J.P. Morgan | Inventor (50+ patents) | Scientist (100+ papers - 34 H-index) | Engineer (15+ AI systems) | Speaker
7mo
Great step towards #trustworthy #LLMs, which are crucial for LLM adoption at scale, especially when critical decisions must be made (#Finance domains…). There are plenty of direct applications, e.g., LLM input attribution for #hallucinations detection and #FactChecker!! Even broader: think of LLM outputs now being directly attributed to LLM inputs. Great work, Sanjay Kariyappa and team. LLM inputs and outputs nicely connected with #XAI. #TrustworthyAI

Dr. Kevin Yam

AI Strategist and Advisor | Ph.D. in Theoretical Physics
6mo
The Advent of a New Epoch in GenAI? OutEffHop Models Outperform Conventional Attention Mechanisms in Transformers

A revolutionary shift in AI is underway with the introduction of Outlier-Efficient Modern Hopfield (OutEffHop) models, which challenge the traditional attention mechanism in transformer architectures.

Traditional attention has been fundamental in AI for tasks like language processing, but it struggles with outliers. OutEffHop models draw on theoretical physics to improve data processing: they treat outlier handling as an energy minimization problem, increasing efficiency and reducing computational demands. This approach aligns AI more closely with the fundamental laws of physics, potentially giving AI systems a better understanding of the real world.

OutEffHop's handling of outliers improves the signal-to-noise ratio, providing more accurate and reliable data insights. As these models gain traction, they could diminish the importance of traditional attention in large transformer models, marking a significant leap forward in AI capabilities. Businesses and researchers should investigate and embrace OutEffHop to maintain a competitive edge in technological innovation.

The OutEffHop architecture heralds a smarter and more sophisticated generation of AI systems, characterized by intuition, efficiency, and robustness. For those interested in the technical details and potential impact of OutEffHop models, the full paper is linked in this post and deserves attention for its significance in advancing AI research and applications.

#ai #llm #genAI #machinelearning #generativeAI #artificialintelligence #tech https://lnkd.in/evfd7pGa

Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

arxiv.org
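As we understand the paper's abstract, OutEffHop is linked to an outlier-efficient attention variant often written Softmax_1. This variant adds 1 to the softmax denominator, so an attention head can assign near-zero total weight when no key is relevant. Standard softmax, by contrast, must always distribute a full unit of weight. The sketch below compares the two normalizations. It is an illustration under that assumption, not the paper's Hopfield-layer implementation.

```python
import math


def softmax(scores):
    # Standard softmax: weights always sum to exactly 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_1(scores):
    # exp(s_i) / (1 + sum_j exp(s_j)): weights may sum to less than 1.
    # Shift by m = max(0, max(scores)) for numerical stability; the implicit
    # extra "1" term becomes exp(-m) after the shift.
    m = max(0.0, max(scores))
    exps = [math.exp(s - m) for s in scores]
    total = math.exp(-m) + sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    weak = [-10.0, -10.0, -10.0]   # no key is relevant to the query
    strong = [10.0, 10.0, -10.0]   # two keys are clearly relevant
    print(sum(softmax(weak)), sum(softmax_1(weak)))
    print(softmax_1(strong))
```

When scores are strongly positive, the two functions nearly agree. They differ mainly when every score is low, which is exactly where standard softmax is forced to place weight somewhere.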
Bo Long

Monetization AI at Meta
8mo
Our team will present our latest graph learning work, "VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections," at ICLR 2024 in Vienna, Austria. The paper introduces a novel approach to mini-batch graph transformers that lets them efficiently process billion-scale industrial graph datasets. This advancement opens new research directions for large graph foundation models and has potential applications in fields like social network analysis and graph-based recommendation systems.

We look forward to engaging discussions at ICLR 2024. We also invite you to visit the AI at Meta booth to explore more AI advancements from Meta.

With Dongqi Fu, Zhigang Hua, Yan Xie, Jin Fang, Si Zhang, Kaan Sancak, Hao Wu, Andrey Malevich, and Jingrui He

VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections

arxiv.org
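One way to make graph transformers mini-batchable is to turn each node into its own short token sequence ahead of time, so batches no longer need the whole graph. As we understand it, VCR-Graphormer builds such sequences from personalized PageRank (PPR) scores. The sketch below shows only that tokenization step, computed by power iteration. The virtual-connection part is not shown, the helper names are hypothetical, and `alpha=0.15` is an assumed restart probability.

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Power iteration for personalized PageRank (PPR).

    adj: dict mapping each node to a list of neighbours (undirected graph).
    alpha: probability of restarting the walk at `source` at each step.
    """
    rank = {v: 0.0 for v in adj}
    rank[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for v, mass in rank.items():
            # Restart: a fraction alpha of every node's mass jumps back to source.
            nxt[source] += alpha * mass
            nbrs = adj[v]
            if nbrs:
                share = (1.0 - alpha) * mass / len(nbrs)
                for u in nbrs:
                    nxt[u] += share
            else:
                # Dangling node: send its remaining mass back to the source.
                nxt[source] += (1.0 - alpha) * mass
        rank = nxt
    return rank


def ppr_token_list(adj, node, k, alpha=0.15):
    """Hypothetical tokenizer: the node itself followed by its top-k PPR neighbours."""
    rank = personalized_pagerank(adj, node, alpha)
    others = sorted((v for v in adj if v != node), key=lambda v: (-rank[v], v))
    return [node] + others[:k]


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(ppr_token_list(path, 0, k=2))
```

Each token list depends only on its own precomputed scores. Nodes can therefore be sampled into mini-batches independently, which is the property that matters for billion-scale graphs.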
Marktechpost Media Inc.

5,840 followers
1mo
Can You Turn Your Vision-Language Model from a Zero-Shot Model to Any-Shot Generalist? Meet LIxP, the Context-Aware Multimodal Framework

Contrastive language-image pretraining has emerged as a promising approach in artificial intelligence. It trains separate vision and text encoders so that embeddings of matching image-text pairs align while embeddings of unrelated pairs are pushed apart. This technique has produced models with remarkable zero-shot transfer capabilities, demonstrating significant potential in complex computational tasks.

Read the full article: https://lnkd.in/duBaZr33
Paper: https://lnkd.in/d4KXs8fn

Can You Turn Your Vision-Language Model from a Zero-Shot Model to Any-Shot Generalist? Meet LIxP, the Context-Aware Multimodal Framework

https://www.marktechpost.com
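The alignment objective described above is usually a symmetric contrastive (InfoNCE) loss over a batch of matched pairs. The sketch below shows that standard CLIP-style loss in plain Python. It is not LIxP's context-aware extension. The temperature of 0.07 and the function name `contrastive_loss` are assumptions for illustration.

```python
import math


def _log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(img, txt, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched (image, text) pairs.

    img, txt: lists of L2-normalised embedding vectors; img[i] matches txt[i].
    """
    n = len(img)
    # Cosine-similarity logits, scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image -> text: each image should score its own caption highest.
    i2t = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> image: the same, over the columns of the logit matrix.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(_log_softmax(cols[j])[j] for j in range(n)) / n
    return (i2t + t2i) / 2


if __name__ == "__main__":
    img = [[1.0, 0.0], [0.0, 1.0]]
    print(contrastive_loss(img, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs
    print(contrastive_loss(img, [[0.0, 1.0], [1.0, 0.0]]))  # captions swapped
```

Matched pairs yield a loss near zero, while swapped captions are heavily penalized. This reflects "align matches, push apart non-matches" in a single number.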
Deci AI (Acquired by NVIDIA)

14,008 followers
8mo
It’s day 4 of our 11 Days of Inference Acceleration Techniques. Today, we’re moving on to runtime-level optimization best practices.

𝐅𝐨𝐮𝐫𝐭𝐡 𝐭𝐢𝐩: 𝐓𝐚𝐤𝐞 𝐚𝐝𝐯𝐚𝐧𝐭𝐚𝐠𝐞 𝐨𝐟 𝐠𝐫𝐚𝐩𝐡 𝐜𝐨𝐦𝐩𝐢𝐥𝐚𝐭𝐢𝐨𝐧 📈

Graph compilers such as TVM, TensorRT, and OpenVINO take the computation graph of a specific model and generate optimized code for the target hardware. Graph compilation can optimize the graph structure by merging redundant operations, auto-tuning kernels, improving memory reuse, reducing cache misses, and more.

But there are a few things to be aware of:
📝 Not all models compile equally well.
📝 The impact of compilation on model performance can vary.

Hence, check early in the process whether your architecture compiles well. That way you avoid wasting time and resources training a model that can’t be optimized for fast inference.

– What’s the #11DaysofInferenceAccelerationTechniques? Over 11 days, the Deci team is posting a series of inference acceleration techniques for deep learning applications. If you’re looking for practical tips and best practices for improving inference, follow Deci AI so you won’t miss an update.

#deeplearning #machinelearning #neuralnetworks #computervision
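Here is a toy illustration of one graph-compilation optimization mentioned above: merging (fusing) a chain of elementwise operations. Without fusion, each node runs as its own pass and materializes an intermediate buffer. With fusion, the chain becomes a single pass. Real compilers such as TVM, TensorRT, and OpenVINO do this on tensor-level IR and generate hardware-specific kernels. This sketch, with its hypothetical op names, only shows why fusion removes passes while producing identical results.

```python
# A toy "graph": a chain of elementwise ops applied to a vector.
OPS = {
    "scale": lambda x: 2.0 * x,
    "shift": lambda x: x + 1.0,
    "relu": lambda x: x if x > 0.0 else 0.0,
}


def run_unfused(graph, xs):
    # One pass (and one intermediate list) per op, like one kernel launch per node.
    passes = 0
    for name in graph:
        xs = [OPS[name](x) for x in xs]
        passes += 1
    return xs, passes


def fuse(graph):
    # Compose the chain into a single function: one pass, no intermediates.
    fns = [OPS[name] for name in graph]

    def fused(x):
        for f in fns:
            x = f(x)
        return x

    return fused


def run_fused(graph, xs):
    f = fuse(graph)
    return [f(x) for x in xs], 1


if __name__ == "__main__":
    graph = ["scale", "shift", "relu"]
    print(run_unfused(graph, [-1.0, 0.0, 1.5]))
    print(run_fused(graph, [-1.0, 0.0, 1.5]))
```

Fusion only applies when the compiler can prove the ops compose cleanly. That is one reason "not all models compile equally well."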
Kshitiz Agarwal

Data Scientist at Nagarro
7mo
🔍 Zero-Shot Object Detection (ZSOD): Revolutionizing Computer Vision!

ZSOD is transforming the field by enabling models to detect object classes they never saw during training, but it also has its challenges. In my latest blog, I discuss innovative approaches to overcome these limitations and how we used GPT-4 Vision to enhance object detection. Check it out for a deep dive into this cutting-edge technology!

#genai #computervision #ZeroShotObjectDetection

Different Approaches to Object Detection Using GPT-4 Vision

medium.com
Sapien

3,107 followers
7mo
Unified model architecture is a flexible framework that handles tasks across multiple modalities, such as vision, language, and audio, with a single neural network. Unlike specialized models, it uses a shared architecture and abstract representations for all tasks. Best practices include training on diverse datasets and casting every task into a common sequence-to-sequence format. High-quality data labeling is important for optimal performance. Learn more about unified models and Sapien's data labeling for AI models: https://lnkd.in/gpJmMi-W #MachineLearning #AIResearch #DataScience
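The two best practices mentioned above, a shared sequence-to-sequence format and diverse training data, can be sketched as data preparation. Every task is serialized as an (input text, output text) pair with a task prefix, and tasks are interleaved so the model sees a mix. The task names and the `<image:...>` / `<audio:...>` placeholders are hypothetical. In a real unified model, non-text inputs would be encoded features rather than strings.

```python
from itertools import zip_longest


def to_seq2seq(task, source, target):
    """Cast any task as an (input text, output text) pair with a task prefix."""
    return (f"{task}: {source}", target)


def interleave(datasets):
    """Round-robin over per-task example lists so tasks stay mixed during training."""
    merged = []
    for group in zip_longest(*datasets.values()):
        merged.extend(ex for ex in group if ex is not None)
    return merged


if __name__ == "__main__":
    # Hypothetical tasks; "<image:...>" and "<audio:...>" stand in for encoded features.
    data = {
        "caption": [to_seq2seq("caption", "<image:001>", "a red bus"),
                    to_seq2seq("caption", "<image:002>", "two cats")],
        "translate": [to_seq2seq("translate en-de", "good morning", "guten Morgen")],
        "transcribe": [to_seq2seq("transcribe", "<audio:17>", "hello world")],
    }
    for src, tgt in interleave(data):
        print(src, "->", tgt)
```

Round-robin mixing is only one simple choice. Weighted sampling per task is a common alternative when datasets differ greatly in size.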
Sandro V.

Product Manager | M.Sc. CompSci @ Georgia Tech | JLPT N3 | ServiceNow
6mo
Considering the price ratio (1 : 14.3) and the TDP ratio (estimated at 1 : 20) of the M1 Max versus workstations with an H100, these results seem promising and make me believe I will see ubiquitous robotics in my lifetime.
Yunfei Cheng

Machine Learning Engineer @ Apple
6mo Edited

How does MLX on Metal perform on machine learning workloads? Yi Wang and I benchmarked two fundamental operations, scaled dot-product attention (SDPA) and Linear Projection, on the M1 Max and M2 Ultra with MLX and on the A100 and H100 with PyTorch. A surprising result is how close the M2 Ultra comes to the A100, underscoring the impressive potential of on-device machine learning.

The benchmark also reveals distinct scaling trends. Linear Projection latency grows linearly with input size, while SDPA latency grows much faster (roughly quadratically with sequence length) because every token attends to every other token. Interestingly, the performance gap between devices is much smaller for SDPA than for Linear Projection. For instance, Linear Projection shows a nearly 100x performance difference between the M1 Max and the H100, whereas SDPA shows only a 25x difference on the same hardware.

These findings highlight the significant potential of on-device machine learning, and we look forward to further performance improvements, particularly as Metal advances.
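The different scaling follows from a simple count of multiply-accumulates (MACs). A linear projection does a fixed amount of work per token, while SDPA compares every token with every other token. The sketch below assumes a single attention head of size 64 and a 1024 x 1024 projection, ignores softmax and memory traffic, and uses hypothetical helper names. It also turns the rough ratios quoted in this thread into cost- and power-normalized speedups: about 25x for SDPA and nearly 100x for Linear Projection between the M1 Max and the H100, plus the 1 : 14.3 price and 1 : 20 TDP estimates from the comment above. TDP is a rated limit, not measured power draw.

```python
def linear_proj_macs(n, d_in, d_out):
    # Each of the n tokens is multiplied by a d_in x d_out weight matrix.
    return n * d_in * d_out


def sdpa_macs(n, d):
    # Q @ K^T costs n*n*d MACs; the attention weights @ V cost another n*n*d.
    return 2 * n * n * d


def normalised_speedup(speedup, price_ratio, power_ratio):
    # Divide the larger machine's speedup by how much more it costs and draws.
    return speedup / price_ratio, speedup / power_ratio


if __name__ == "__main__":
    for n in (512, 1024, 2048, 4096):
        print(n, linear_proj_macs(n, 1024, 1024), sdpa_macs(n, 64))
    # Rough figures quoted in the thread.
    print("SDPA:", normalised_speedup(25, 14.3, 20))
    print("Linear Projection:", normalised_speedup(100, 14.3, 20))
```

Doubling the sequence length doubles the projection work but quadruples the SDPA work. With the quoted figures, the H100's advantage shrinks to about 1.7x per dollar and 1.25x per watt for SDPA, and to about 7x and 5x for Linear Projection. Values above 1 still favor the H100.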
