🔍 Zero-Shot Object Detection (ZSOD): Revolutionizing Computer Vision! ZSOD is transforming the field by enabling models to identify unseen objects, but it also has its challenges. In my latest blog, I discuss innovative approaches to overcome these challenges and how we used GPT-4 Vision to enhance object detection. Check it out for a deep dive into this cutting-edge technology! #genai #computervision #ZeroShotObjectDetection
Kshitiz Agarwal’s Post
-
Different Approaches to Object Detection Using GPT-4 Vision
medium.com
-
ARC-AGI and the problem with LLMs (even multi-modal models built on the same architecture)

François Chollet (of Keras fame) and others have put forward a $1 million prize pool on Kaggle (https://lnkd.in/ePQ6VC6n) to support progress towards AGI, motivated by the weakness of current SOTA models at generalizing outside their training set in terms of reasoning and abstraction abilities. The dataset (https://lnkd.in/ec3_fU5U) features relatively simple visual IQ problems that test reasoning/abstraction (someone at 1 sigma should be able to solve >95% of the problems, and someone above 2 sigma should achieve 100%).

This highlights one of the issues with current LLMs: they are more akin to giant interactive associative recall structures than anything else; the reasoning and abstraction they exhibit is in very large part hallucinated from the training set. Their human-like responses deceive many users into thinking they are interacting with a human-like reasoning system, but that belief is mistaken, as the above and the following illustrate.

Indeed, recent research makes it possible to explore LLMs as giant interactive associative stores by tracing particular knowledge to particular critical MLP (not attention) layers using causal mediation analysis (Locating and Editing Factual Associations in GPT, Meng et al.). Each of the critical MLP layers stores a portion of the memory. Causal mediation analysis works by noising the input and then restoring the state of selected activations to their clean value, thereby observing which layers have the strongest effect on producing the original output. This even allows knowledge/memory editing of LLMs (e.g. MEMIT, which computes a delta toward the desired output by gradient descent and then spreads the change over the mediating layers).
It is perhaps much more instructive to think of LLMs in this way as interactive giant and highly complex associative stores of knowledge from their training sets.
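The noising-and-restoring procedure behind causal mediation analysis can be illustrated with a deliberately tiny toy: corrupt the input, then restore each component's clean activation in turn, and see which restoration best recovers the clean output. Everything here (three weighted components standing in for MLP layers, their weights, the noise scale) is made up for illustration, not taken from the paper:

```python
import random

random.seed(0)

# Toy "model": three parallel components whose outputs are summed, mimicking
# how several MLP layers each contribute a portion of a stored association.
WEIGHTS = [1.0, 10.0, 0.1]   # component 1 carries most of the "memory"

def run(x, patch=None, clean_x=None):
    """Sum the component outputs; optionally restore one component's
    contribution to its clean value (the 'restoration' step)."""
    out = 0.0
    for i, w in enumerate(WEIGHTS):
        contrib = w * x
        if i == patch:
            contrib = w * clean_x   # restore this component's clean activation
        out += contrib
    return out

clean_in = 1.0
clean_out = run(clean_in)

noisy_in = clean_in + random.gauss(0, 5)   # corrupt ("noise") the input

# Restore each component's clean activation in turn: the component whose
# restoration best recovers the clean output is the strongest mediator.
errors = []
for i in range(len(WEIGHTS)):
    patched = run(noisy_in, patch=i, clean_x=clean_in)
    errors.append(abs(patched - clean_out))
    print(f"component {i}: output error after restoration = {errors[i]:.3f}")

strongest = min(range(len(WEIGHTS)), key=lambda i: errors[i])
print("strongest mediator: component", strongest)
```

Restoring component 1 recovers most of the clean output because it carries the largest share of the "memory", which is exactly the signal causal tracing looks for in real models.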
ARC Prize 2024 | Kaggle
kaggle.com
-
Ready to get trained on the latest AI technology? We’re looking forward to this #EW24 in-depth AI training session given by our own Abdel Younes, Director of Machine Learning Frameworks & Apps. Seats are full for the session on April 9, but email press@synaptics.com for information on in-person or virtual training.

The training covers secure AI inference for embedded vision platforms and focuses on real-world Edge AI applications for video analytics such as object detection, image classification, and AI upscaling. In these applications it is critical that the data be secured: to protect personally identifiable information (PII), to satisfy GDPR, and to meet DRM/CAS requirements. Edge AI vision applications often use an open model exchange format like TFLite or ONNX, and the Edge AI engines are often required to execute on clear data. In the real embedded world, however, some video data will be encrypted. Developers therefore need to be concerned about secure pre- and post-processing of the data (image format conversion, cropping, or rescaling); about avoiding buffer copies as much as possible; and, most importantly, about protecting the data with a secure inferencing framework.

This session examines the problem of secure AI inference and shows how it can be resolved effectively. If you’re attending Embedded World, we hope you can join us to cover this important topic and more: https://lnkd.in/eDeUM7-m #edgeai #embeddedvision
Embedded World 2024 | Synaptics
synaptics.com
-
It’s day 4 of our 11 Days of Inference Acceleration Techniques. Today, we’re moving on to runtime-level optimization best practices.

𝐅𝐨𝐮𝐫𝐭𝐡 𝐭𝐢𝐩: 𝐓𝐚𝐤𝐞 𝐚𝐝𝐯𝐚𝐧𝐭𝐚𝐠𝐞 𝐨𝐟 𝐠𝐫𝐚𝐩𝐡 𝐜𝐨𝐦𝐩𝐢𝐥𝐚𝐭𝐢𝐨𝐧 📈

Graph compilers such as TVM, TensorRT, and OpenVINO take the computation graph of a specific model and generate optimized code tailored to the target hardware. Graph compilation can optimize the graph structure by merging redundant operations, performing kernel auto-tuning, enhancing memory reuse, preventing cache misses, and more. But there are a few things to be aware of:

📝 Not all models compile equally.
📝 The impact of compilation on model performance can vary.

Hence, make sure to check the architecture's ability to compile early on in the process, to avoid wasting time and resources on training a model that can’t be optimized for fast inference.

–

What’s the #11DaysofInferenceAccelerationTechniques? The Deci team is posting, for 11 days, a series of inference acceleration techniques for deep learning applications. If you’re looking for practical tips and best practices for improving inference, follow Deci AI so you won’t miss an update.

#deeplearning #machinelearning #neuralnetworks #computervision
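As a toy illustration of the kind of rewrite "merging redundant operations" refers to, here is a plain-Python sketch (the ops and graph are made up, and real compilers like TVM or TensorRT work on far richer graphs): a chain of elementwise ops is fused into a single affine kernel, so one pass replaces several memory round-trips.

```python
# Toy computation "graph": a chain of elementwise ops, each (name, constant).
graph = [("mul", 2.0), ("add", 3.0), ("mul", 0.5), ("add", 1.0)]

OPS = {"add": lambda x, c: x + c, "mul": lambda x, c: x * c}

def run_unfused(x, graph):
    """Naive interpreter: one pass (and, on real hardware, one memory
    round-trip) per op."""
    for name, c in graph:
        x = OPS[name](x, c)
    return x

def fuse(graph):
    """'Compile' the chain of elementwise ops into a single affine op
    a*x + b, the way a graph compiler merges adjacent kernels."""
    a, b = 1.0, 0.0
    for name, c in graph:
        if name == "mul":
            a, b = a * c, b * c
        else:  # add
            b = b + c
    return lambda x: a * x + b

fused = fuse(graph)
print(run_unfused(5.0, graph), fused(5.0))  # same result, one kernel
```

The fused version computes the identical result while touching the data once, which is the essence of the memory-reuse and kernel-merging wins the tip describes.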
-
A unified model architecture is a flexible framework that handles multiple tasks, such as vision, language, and audio, using a single neural network. Unlike specialized models, it uses a shared architecture and abstract representations for all tasks. Best practices involve training on diverse datasets and using sequence-to-sequence formats. Quality data labeling is important for optimal performance. Learn more about unified models and Sapien's data labeling for AI models: https://lnkd.in/gpJmMi-W #MachineLearning #AIResearch #DataScience
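The "sequence-to-sequence format" mentioned above can be sketched very simply: every task, whatever its modality, is cast as an (input sequence, target sequence) pair with a task prefix, so one shared model can train on all of them. The task names and examples below are hypothetical, just to show the shape of the data:

```python
# Hypothetical training examples from three different tasks.
examples = [
    {"task": "translate", "src": "Bonjour", "tgt": "Hello"},
    {"task": "classify",  "src": "Great movie!", "tgt": "positive"},
    {"task": "caption",   "src": "<image_tokens>", "tgt": "a dog on a beach"},
]

def to_seq2seq(example):
    """Cast any task as (input sequence, target sequence) with a task
    prefix, so a single shared model can be trained on all of them."""
    return (f"{example['task']}: {example['src']}", example["tgt"])

pairs = [to_seq2seq(e) for e in examples]
for inp, tgt in pairs:
    print(inp, "->", tgt)
```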
-
Somewhere else on the InterWebs, I posted a technical query about messaging protocols for sensors and devices using MQTT and the low-power Zigbee wireless standard. (I am doing some solution architecting on integrating and securing a smart office system into the main business network infrastructure.) One helpful person posted an AI-generated answer, and it was so verbose, and WRONG at the same time, that it was embarrassing to read.

Setting aside all the current arguments about intellectual property, copyright, etc. (which DO deserve serious attention), generative AI needs to progress beyond 'string together plausible, authoritative answers that just happen to be bullshit'. Some systems are getting there, and others are just pants.

Oh, and will people stop slapping the term 'AI' on any code that's doing fast computation/statistical analysis - it's getting beyond tedious. Well, that's my daily rant over. https://xkcd.com/2173/
Trained a Neural Net
xkcd.com
-
🚀Key Takeaways from "An Image is Worth 1/2 Tokens After Layer 2"🚀 The study reveals inefficiencies in attention computation over visual tokens in Large Vision-Language Models (LVLMs). FastV, a versatile plug-and-play method, optimizes computational efficiency and significantly reduces costs without sacrificing performance across image and video tasks. Fine-tune the trade-off between computational efficiency and performance with FastV. This customizable approach allows for superior performance while compressing models for deployment on edge devices and commercial use. Read more about this innovative approach to AI, machine learning, and computer vision in the full article here: https://lnkd.in/gJ-DQ46M #AI #MachineLearning #ComputerVision #Efficiency #Innovation
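FastV's core idea, as the paper describes it, is that visual tokens receive little attention after the early layers, so the weakly-attended ones can be dropped for all later layers. A toy sketch of that pruning step (the attention scores, token names, and keep ratio here are made up for illustration):

```python
# Hypothetical attention mass that each of 8 visual tokens receives
# (e.g. averaged over heads and query positions after layer 2 of an LVLM).
attn_scores = [0.30, 0.02, 0.01, 0.25, 0.03, 0.20, 0.04, 0.15]

def prune_visual_tokens(tokens, scores, keep_ratio=0.5):
    """FastV-style pruning sketch: keep only the most-attended visual
    tokens for later layers, cutting attention/FFN cost on the rest."""
    k = max(1, int(len(tokens) * keep_ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])           # preserve original token order
    return [tokens[i] for i in kept]

tokens = [f"v{i}" for i in range(8)]
kept = prune_visual_tokens(tokens, attn_scores, keep_ratio=0.5)
print(kept)   # half the visual tokens survive into the later layers
```

The `keep_ratio` knob is the trade-off the post mentions: lower ratios compress harder for edge deployment, higher ratios preserve more performance.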
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
arxiv.org
-
🌐 Exploring Breadth-First Search (BFS) – A Pathfinding Essential!

In the world of algorithms, Breadth-First Search (BFS) stands out as a fundamental tool for traversing or searching graph data structures. Whether you’re navigating a maze, finding the shortest path in a network, or exploring social connections, BFS is your go-to solution for a methodical, layer-by-layer exploration. 💡

🔍 How does BFS work? Starting from a source node, BFS systematically visits all adjacent nodes before moving deeper, creating a "wavefront" effect as it progresses outward. This ensures that it discovers the shortest path in an unweighted graph, and it is ideal for cases where reaching the destination quickly is essential.

✨ Applications of BFS:
- Shortest Path in Unweighted Graphs: from network routing to puzzle solving.
- Connected Components in Networks: analyzing clusters in social or web networks.
- AI Pathfinding: common in game development and robotics for obstacle navigation.

Mastering BFS helps not only in algorithmic challenges but also in tackling real-world problems where connectivity and structure matter! 📊

Follow us for more insights on algorithms and AI! www.drmukhan.com

#BFS #Algorithms #Pathfinding #AI #DataStructures #TechExplained
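The layer-by-layer traversal described above fits in a few lines of Python (the sample graph is made up for demonstration):

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Breadth-first search: explore the graph layer by layer and return
    the shortest path (fewest edges) from start to goal, or None if the
    goal is unreachable."""
    queue = deque([start])
    parent = {start: None}           # also serves as the visited set
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk the parent links back to start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None

# Small unweighted graph as adjacency lists.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
}
print(bfs_shortest_path(graph, "A", "F"))
```

Because the queue processes nodes in discovery order, the first time the goal is dequeued its path is guaranteed to be shortest in an unweighted graph, which is exactly the "wavefront" property described above.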
-
This week’s releases in AI:

Mistral released their new Mistral 8x22B model: 65K context length, ~130B parameters, 2 active experts per token, 56 layers; it would require ~260GB of VRAM in fp16 (~73GB with bitsandbytes quantization), uses RoPE, and has a 32,000-token vocabulary.

Google released CodeGemma and RecurrentGemma. RecurrentGemma stands out as a technically distinct model that harnesses recurrent neural networks and local attention mechanisms to enhance memory efficiency. Despite delivering benchmark scores comparable to the Gemma 2B model, RecurrentGemma's distinctive architecture offers several benefits, including reduced memory usage, increased throughput, and advancements in research innovation.
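The fp16 VRAM figure in the post follows from simple arithmetic (2 bytes per parameter); a quick back-of-envelope check, noting that this counts weights only and ignores activations, KV cache, and framework overhead (which is presumably why the post's bnb figure is above the raw 4-bit weight size):

```python
def vram_gb(n_params_billion, bytes_per_param):
    """Back-of-envelope weight memory: parameter count times bytes per
    parameter. Ignores activations, KV cache, and runtime overhead."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

# ~130B-parameter model (figure from the post)
print(f"fp16:       ~{vram_gb(130, 2):.0f} GB")    # 2 bytes per parameter
print(f"4-bit (bnb): ~{vram_gb(130, 0.5):.0f} GB") # 0.5 bytes per parameter
```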
RecurrentGemma model overview | Google AI for Developers
ai.google.dev
-
SORA (OpenAI) — 2024

While Video LDM compresses individual frames of a video to train an LDM, SORA compresses video both spatially and temporally. Recent papers like CogVideoX have demonstrated that 3D causal VAEs are great at compressing videos, making diffusion training computationally efficient and enabling flicker-free, consistent video generation. It is speculated that OpenAI has collected a rather large annotated dataset of video-text data, which they are using to train conditional video generation models. Combining all the strengths listed below, plus more tricks that the ironically named OpenAI may never disclose, SORA promises to be a giant leap for video generation AI models.

💢 Massive video-text annotated dataset + pretraining techniques with image-text data and unlabelled data
💢 General Transformer architectures
💢 Huge compute investment (thanks, Microsoft)
💢 The representational power of latent diffusion modeling
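To see why compressing both axes matters, a rough token-count sketch is useful. The compression factors and patch size below are illustrative assumptions, not OpenAI's (undisclosed) settings:

```python
def latent_tokens(frames, height, width, ct=4, cs=8, patch=2):
    """Rough token count after a 3D (causal) VAE compresses a video by ct
    in time and cs in space, and the transformer then patchifies the
    latent. ct, cs, and patch are assumed values for illustration."""
    lat_t = frames // ct            # temporal compression
    lat_h = height // cs            # spatial compression
    lat_w = width // cs
    return lat_t * (lat_h // patch) * (lat_w // patch)

# A 4-second, 16 fps, 512x512 clip:
frames = 4 * 16
print(latent_tokens(frames, 512, 512))  # tokens the diffusion model sees
```

Compressing only spatially (as with per-frame encoding) would leave the token count 4x higher here, which is the efficiency argument for 3D VAEs in the post.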