📢 STAC-AI™ LANG6 Benchmark: Today we’re releasing four in-depth audits of virtual machines in the Paperspace cloud featuring NVIDIA A100 and H100 GPUs. These audits, utilizing the Llama-3.1-8B and 70B Instruct models, represent the first published results using the STAC-AI™ LANG6 (Inference-Only) benchmark. For interactive use cases, the STAC-AI benchmark is unique in that it relates LLM performance to user satisfaction.

Interesting results from the tests featuring the H100 GPU include:
- For Llama-3.1-8B-Instruct, the system maintained an Output Profile of 10.7 words per second (WPS) at the maximum arrival rate tested for the EDGAR4a Data Set (44.0 requests per second). This is well above the mean sustainable human reading rate of 4.0 WPS (std. dev. +/- 0.85 WPS).
- For Llama-3.1-8B-Instruct and the EDGAR5a Data Set, the Output Profile (4.98 WPS) at the fastest arrival rate tested (0.394 requests per second) is still above the mean reading rate cited.
- For Llama-3.1-70B-Instruct, the system maintained an Output Profile of 6.22 WPS at the maximum arrival rate tested for the EDGAR4b Data Set (3.33 requests per second). This is also well above the mean rate cited above.

(A toy sketch of the Output Profile vs. reading-rate comparison appears after this post.)

Additionally, the reports analyze variations in latency and throughput based on prompt frequency and inference request rates, providing insights for optimizing infrastructure sizing and cost trade-offs.

STAC subscribers can leverage the STAC-AI LANG6 Benchmark to evaluate:
🔺Latency, efficiency, and throughput for LLM deployment sizing.
🔺Cost comparisons among public cloud, API cloud, and on-prem LLM solutions.
🔺Time-of-day and regional effects on LLM performance.
🔺SLA compliance for critical LLM workloads.

Learn more about these findings and their implications for LLM infrastructure by accessing the detailed reports.
https://lnkd.in/eDJPMJuA
https://lnkd.in/eVHstxUS
https://lnkd.in/eaHyAsjp
https://lnkd.in/esbvxKjt
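For readers curious what the Output Profile comparison amounts to in code, here is a minimal, purely illustrative sketch. It is not the STAC-AI LANG6 harness; the function names and the streaming interface are assumptions. It measures the sustained word rate of a streamed response and checks it against the 4.0 WPS mean reading rate cited above.

```python
# Illustrative sketch only -- not the STAC-AI LANG6 harness. It shows the
# general idea of an "output profile": the sustained words-per-second rate
# of a streamed LLM response, compared against a human reading rate.
import time

MEAN_READING_RATE_WPS = 4.0  # mean sustainable human reading rate cited above


def output_profile_wps(stream):
    """Return sustained words/second for an iterable of streamed text chunks."""
    start = time.monotonic()
    words = 0
    for chunk in stream:
        words += len(chunk.split())
    elapsed = time.monotonic() - start
    return words / elapsed if elapsed > 0 else 0.0


def satisfies_reader(wps, margin=1.0):
    """True if the measured output profile keeps pace with the reading rate."""
    return wps >= MEAN_READING_RATE_WPS * margin


# Example usage (hypothetical client that yields text chunks as they arrive):
# wps = output_profile_wps(model.stream(prompt))
# print(satisfies_reader(wps))
```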
STAC - Strategic Technology Analysis Center
Financial Services
Hands-on research. Testing systems. Hype-dispelling discussions.
About us
STAC is a company that coordinates a community called the STAC Benchmark Council. Understanding who we are means understanding not just the company leadership, but more importantly the members of the Council and some of the many individuals from that community who contribute requirements, proposals, and code. The STAC Benchmark Council consists of over 350 financial institutions and more than 50 vendor organizations whose purpose is: 1) to conduct substantive discussions on important technical challenges and solutions in financial services, and 2) to develop technology benchmark standards that are useful to financial organizations. User firms include the largest global banks, brokerage houses, exchanges, asset managers, hedge funds, proprietary trading shops, and other market participants. Vendor firms include innovative providers of hardware, software, and cloud services.
- Website: https://www.stacresearch.com/
- Industry: Financial Services
- Company size: 2-10 employees
- Type: Privately Held
Employees at STAC - Strategic Technology Analysis Center
- Jack Gidding: CEO @ STAC ► Ex-Head of FX Technology / FXall / Trading / REDI / Wealth Management Tech / Elektron / Market Data ► Research ► Ultra Low Latency ►…
- Eric Powers: Director, Citi Tech Fellow, Global Head of High Performance Architectures Infrastructure; STAC Fellow
- Bishop Brock: Head of Research at STAC - Strategic Technology Analysis Center
- Jennifer Huneck: Manager, Events & Administration at STAC
Updates
-
🚨 One week left to register for AI STAC London

AI STAC agendas cover the key strategic challenges facing financial technologists building AI systems. This one starts with a discussion on AI traceability before moving to using agentic AI in finance. We’ll then look at building lean AI toolchains, storage & network optimization for AI workloads, managing AI cloud costs, LLM use with trading-platform telemetry, and low-latency machine learning. We'll wrap up the event with STAC-AI, including new benchmark results for LLM inferencing.

Agenda highlights🌟
"𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁 𝗲𝘃𝗶𝗱𝗲𝗻𝗰𝗲 𝗲𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻: 𝗧𝗵𝗲 𝗮𝗿𝘁 𝗼𝗳 𝗮𝘂𝘁𝗼𝗺𝗮𝘁𝗶𝗻𝗴 𝗔𝗜 𝗰𝗼𝗺𝗽𝗹𝗶𝗮𝗻𝗰𝗲" – Margaret Hartnett, progressio.ai
"𝗪𝗵𝗼 𝘀𝗮𝘆𝘀?": 𝗧𝗿𝗮𝗰𝗶𝗻𝗴 𝗔𝗜 𝗼𝘂𝘁𝗽𝘂𝘁𝘀 𝘁𝗼 𝘁𝗵𝗲𝗶𝗿 𝘀𝗼𝘂𝗿𝗰𝗲 – Panel with Aaron Armstrong, Margaret Hartnett, and Daniele Quercia
"𝗔𝗜 𝗮𝗴𝗲𝗻𝘁𝘀 𝗶𝗻 𝗳𝗶𝗻𝗮𝗻𝗰𝗲: 𝗔 𝗽𝗲𝗿𝘀𝗽𝗲𝗰𝘁𝗶𝘃𝗲 𝗳𝗿𝗼𝗺 𝘁𝗵𝗲 𝗳𝗿𝗼𝗻𝘁 𝗹𝗶𝗻𝗲𝘀" – Hanane D., Bernstein Société Générale
"𝗟𝗲𝗮𝗻𝗲𝗿 𝗔𝗜: 𝗜𝗻𝘁𝗲𝗴𝗿𝗮𝘁𝗶𝗻𝗴 𝗯𝗿𝗲𝗮𝗸𝘁𝗵𝗿𝗼𝘂𝗴𝗵 𝗰𝗮𝗽𝗮𝗯𝗶𝗹𝗶𝘁𝗶𝗲𝘀 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝘁𝗵𝗲 𝗯𝗹𝗼𝗮𝘁" – Panel with Stuart Buckland, Jim Dowling, Charles Cai, and Andreas Horn
"𝗦𝘁𝗼𝗽 𝗥𝗘𝗦𝗧𝗶𝗻𝗴: 𝗛𝗼𝘄 𝘁𝗼 𝗺𝗮𝗸𝗲 𝗼𝗯𝗷𝗲𝗰𝘁 𝘀𝘁𝗼𝗿𝗮𝗴𝗲 𝗳𝗮𝘀𝘁 𝗲𝗻𝗼𝘂𝗴𝗵 𝗳𝗼𝗿 𝗳𝗶𝗻𝗮𝗻𝗰𝗶𝗮𝗹 𝗔𝗜" – James Coomer, DDN
"𝗣𝗿𝗼𝗽𝗲𝗿 𝗽𝗶𝗽𝗲𝘀: 𝗛𝗼𝘄 𝗻𝗲𝘁𝘄𝗼𝗿𝗸𝘀 𝗰𝗮𝗻 𝗿𝗲𝗱𝘂𝗰𝗲 𝘁𝗵𝗲 𝗰𝗼𝘀𝘁 𝗼𝗳 𝗳𝗶𝗻𝗲 𝘁𝘂𝗻𝗶𝗻𝗴 𝗮𝗻𝗱 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴" – Chris Russell, Arista Networks
"𝗖𝗼𝗺𝗯𝗶𝗻𝗶𝗻𝗴 𝗟𝗟𝗠𝘀 𝗮𝗻𝗱 𝗻𝘂𝗺𝗲𝗿𝗶𝗰𝗮𝗹 𝗔𝗜: 𝗧𝗵𝗲 𝗰𝗮𝘀𝗲 𝗼𝗳 𝘁𝗿𝗮𝗱𝗶𝗻𝗴-𝗽𝗹𝗮𝘁𝗳𝗼𝗿𝗺 𝘁𝗲𝗹𝗲𝗺𝗲𝘁𝗿𝘆" – Fergal Toomey, Pico
"𝗗𝗼𝗻'𝘁 𝗮𝘀𝘀𝘂𝗺𝗲 𝗚𝗣𝗨: 𝗪𝗵𝘆 𝗙𝗣𝗚𝗔 𝗶𝘀 𝗯𝗲𝘁𝘁𝗲𝗿 𝗳𝗼𝗿 𝘀𝗼𝗺𝗲 𝗠𝗟" – Andrea Suardi, Xelera Technologies

💡Expect technical deep dives on
☑️ Storage & network optimization for AI workloads
☑️ Managing AI cloud costs
☑️ Low-latency ML solutions
☑️ FPGA vs. GPU for ML workloads

📊 𝗦𝗧𝗔𝗖 𝗕𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸𝘀
Our Head of Research, Bishop Brock, will present the latest results from the STAC-AI benchmark on LLM inferencing. Plus we’ll have Innovation Roundups from AMD, Hewlett Packard Enterprise, Hammerspace, Lenovo, and more sharing insights on optimizing AI infrastructure.

👉 Register: https://lnkd.in/eCwd4Q5a

#AI #Finance #STACLondon #MachineLearning
-
62% of developers are planning to deploy an LLM application to production within the next year, according to a recent survey. Here are 9 questions the STAC-AI inferencing benchmark and test harness can help you answer if you’re building an LLM-based system.

1. How quickly can your system load and get ready to use a large language model?
2. What’s the trade-off between how fast users send requests and how quickly the model responds?
3. Can the system keep up with how fast people read during real-time interactions?
4. How does the system's performance change with different context sizes?
5. What’s the highest number of requests the system can handle while maintaining an acceptable level of performance?
6. How does the system manage multiple users at once, and when should we add more GPUs to keep up with demand?
7. How closely do the model's responses match a reference set of answers?
8. How much text does the system produce for every dollar spent?
9. How does the system perform in terms of energy use, space, and cost efficiency?

To demonstrate this, we ran the benchmark on a stack with:
- Llama-3.1-8B-Instruct and Llama-3.1-70B-Instruct
- 8 x NVIDIA A100-SXM4-80GB GPUs
- 720 GiB of virtualized memory

The benchmark uses LLMs to analyze financial data from quarterly and annual reports filed by publicly traded companies, illustrating how latency and throughput vary with request rates. This analysis raises important questions about user satisfaction at different times of day and the trade-offs between resource allocation and the cost of inference.

The benchmark captures the following metrics, plus a whole lot more:
- Inferences per second
- Words generated per second
- Response smoothness
- Energy efficiency (words per kWh)
- Space efficiency (words per cubic foot of rack space)
- Price performance (words per USD)

(A rough sketch of these ratios appears after this post.)

Here are just some of our findings from the Llama-3.1-8B-Instruct test:
- The server loaded the model from storage into the GPUs and was ready for inference in 90 seconds.
- A 26% reduction in the rate of prompts hitting the system reduced the median response time by 38%.
- The system was 20x more efficient when processing a smaller-context dataset than when processing a larger-context dataset (EDGAR4a vs. EDGAR5a in STAC-AI).
- At a peak request rate of 21.5 requests per second, the system achieved an Output Profile of 12.2 words per second, just below the typical maximum reading speed of 13 words per second for fast readers.

We will share even more data from these tests at the AI STAC conference in London on December 4th. If you’re interested in the finer details, we’ll be presenting these along with more information about STAC-AI at an AI workshop the following day. Register for either or both events if you want to learn more. They’re free to attend.

📅 Conference, December 4th 🔗 https://lnkd.in/eCwd4Q5a
📅 Workshop, December 5th 🔗 https://lnkd.in/eTKqfn6X
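As promised above, here is a rough, hypothetical sketch of the simple-ratio versions of those efficiency metrics. These are the obvious definitions implied by the units (words per kWh, words per USD, and so on), not STAC-AI's official formulas, and all field names are invented for illustration.

```python
# Hypothetical back-of-the-envelope calculator for the efficiency metrics
# listed above; the formulas are plain ratios, not STAC-AI's definitions.
from dataclasses import dataclass


@dataclass
class RunStats:
    total_words: int        # words generated across the run
    total_inferences: int   # completed inference requests
    wall_seconds: float     # elapsed wall-clock time
    energy_kwh: float       # measured energy draw
    cost_usd: float         # amortized infrastructure cost
    rack_cubic_feet: float  # rack space occupied


def efficiency_metrics(s: RunStats) -> dict:
    return {
        "inferences_per_second": s.total_inferences / s.wall_seconds,
        "words_per_second": s.total_words / s.wall_seconds,
        "words_per_kwh": s.total_words / s.energy_kwh,
        "words_per_usd": s.total_words / s.cost_usd,
        "words_per_cubic_foot": s.total_words / s.rack_cubic_feet,
    }


# Example with made-up numbers:
# stats = RunStats(total_words=1_200_000, total_inferences=40_000,
#                  wall_seconds=3_600, energy_kwh=25.0,
#                  cost_usd=98.0, rack_cubic_feet=11.4)
# print(efficiency_metrics(stats)["words_per_kwh"])
```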
-
Can traders get quantitative insights from LLMs that they can stake their strategies on? Yes, but not out of the box.

At the STAC Summit London, Jean-Philippe Rezler, Global Head of AI and Analytics at Opensee, will share his insights on the limitations of off-the-shelf models when it comes to handling vast trading datasets with complex relational structures. He will propose a way to overcome these challenges: a multi-agent architecture designed to ensure that quantitative analytics from LLMs can reliably inform trading strategies.

Register here to hear his talk:
🎙️ “Precision in chaos: Leveraging GenAI multi-agent systems for trade analytics”
📅 3 Dec 2024
🔗 Register: https://lnkd.in/eUNWFquj
-
As AI continues to revolutionize the finance industry, understanding its practical applications from those on the front lines is crucial. How can AI agents transform financial decision-making?

At the upcoming AI STAC conference in London on December 4th, Hanane D., Director and Algorithmic Trader at Bernstein Société Générale, will present a talk entitled "AI agents in finance: A perspective from the front lines". Drawing from her experience in developing AI-driven approaches, Hanane will
✅ Explain what AI agents are and when they should be used
✅ Discuss agentic design patterns
✅ Examine open-source and proprietary frameworks for building agentic systems

Register today to discover how AI agents could become your secret weapon in the high-stakes world of trading and investment management.
🔗 Registration: https://lnkd.in/eCwd4Q5a
-
💡Looking for a way to speed up ML development?

ML development today is often fragmented: a researcher working in a hosted notebook with sampled data hands off to an ML engineer, who then repackages the work for an orchestrator to handle execution.

At AI STAC in New York on December 10th, Paul Yang from Runhouse will present his talk, “A Lean Orchestration Manifesto”. Paul will argue for a different, "lean orchestration" approach to managing ML workflows, which he believes restores the development flywheel. In this pattern, code that defines an ML program is serverless and runs identically from a researcher’s local IDE and from an orchestrator task in production; the orchestrator's role is limited to scheduling and monitoring execution. Paul will explain this pattern in detail and show how it provides much of the reproducibility, developer experience, and fault tolerance that so many ML pipelines lack. (A rough sketch of the pattern follows this post.)

🔗 Register now for December 10th: https://lnkd.in/eUSdEW7P
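To make the lean-orchestration idea concrete, here is a minimal sketch under our own assumptions. All names are hypothetical and this is not the Runhouse API; it only shows the shape of the pattern described above: the ML program is one ordinary function, called directly from a researcher's IDE and wrapped unchanged by a thin orchestrator task.

```python
# Hypothetical sketch of the lean-orchestration pattern (invented names,
# not the Runhouse API): one plain function is the whole ML program.


def train_model(dataset_uri: str, learning_rate: float = 1e-3) -> str:
    """The ML program itself: no orchestrator imports, no repackaging."""
    # ...load data, train, and write artifacts (elided in this sketch)...
    return "s3://example-bucket/model.pt"  # hypothetical artifact location


# Research: called directly from a local IDE against sampled data.
checkpoint = train_model("s3://example-bucket/sample.parquet")

# Production: the orchestrator task is a thin wrapper that only schedules
# and monitors; the code it executes is identical to the research call.
def nightly_training_task() -> str:
    return train_model("s3://example-bucket/full.parquet")
```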
-
Recent advancements in GenAI are transforming document classification in clinical research.

At AI STAC in New York on December 10th, Nataraj Dasgupta, VP Engineering at Syneos Health and former technologist at a major bank, will present a novel methodology for auto-classifying clinical trial documents using GenAI. He will demonstrate how this approach, which combines contextual RAG, fine-tuned models, and prompt optimization with columnar databases and highly parallel processing, achieved a remarkable 1000x reduction in both time and cost during initial testing. And he’ll discuss its applicability in finance.

🔗 Register: https://lnkd.in/eUSdEW7P
-
STAC - Strategic Technology Analysis Center reposted this
The AI arms race is on, but it’s not just about the latest models; it's about creating the right architectural backbone that can scale intelligently, minimize latency, and optimize computational efficiency.

In my role at STAC - Strategic Technology Analysis Center, I’m fortunate to have eye-opening conversations with industry leaders and practitioners that are shifting my perspective on AI infrastructure. What I’ve learned from these conversations:
👀 You’ll hit compute bottlenecks if you rely solely on existing hardware.
👀 Traditional network architectures are struggling to maximise GPU utilisation.
👀 There’s still a lot of room to improve how basic mathematical operations are executed at the hardware level.

Building future-proof AI infrastructure is about understanding and leveraging three emerging paradigms:
1️⃣ Heterogeneous computing: The days of one-size-fits-all hardware are over. We're witnessing a shift towards specialized architectures optimized for specific parts of AI workloads.
2️⃣ AI-specific network fabrics: Firms fine-tuning or training models are investing heavily in GPUs. But a crucial enabler is often overlooked: the network. When network architecture respects AI workloads' unique requirements, GPU utilisation rates can soar.
3️⃣ Algorithmic hardware transformation: The boundaries between mathematical algorithms and hardware implementation are blurring, leading to more efficient and specialized computational approaches.

These infrastructure patterns are rapidly reshaping how we build, deploy, and scale AI systems, and we’ve got a full workshop track dedicated to them at our developer event on December 5th in London. Product and technology leaders from AMD, Xelera Technologies, Intel Corporation and Arista Networks are leading sessions on the fundamental building blocks of modern AI infrastructure, and practitioners from leading investment banks, hedge funds, exchanges and asset managers will be there. The workshops take place the day after the AI STAC conference main agenda and are a great way to dive deeper into the topics covered on day 1.

Workshop tracks:
▶️ AI infrastructure
▶️ AI data management
▶️ AI applications

Curious? You can register here: https://lnkd.in/eqDNTPUM

Stay tuned - tomorrow, I'll be sharing more details about workshop track 2.
-
🤔Can IT trading ops benefit from AI?

Fergal Toomey believes that integrating LLMs with recent advancements in deep learning for time-series analysis offers a transformative approach to automating operations management, enabling more effective detection and resolution of performance issues.

Register here to hear his insights:
💬 “Combining LLMs and numerical AI: The case of trading-platform telemetry”
📅 4 Dec 2024, AI STAC London
👉 https://lnkd.in/eMunJWjC
Pico

#AISTAC #ai #machinelearning #hft #trading #financialmarkets #marketdata #Analytics #FinTech #LLM #deeplearning
-
“Where did the AI get the data for that response?”

This question of traceability is increasingly critical as AI solutions gain access to broader training and augmentation datasets and become better at using tools. This need extends from an organization's data scientists to managers, regulators, and even the public.

Join us at AI STAC in London (December 4) or New York (December 10) for a panel titled "Who says?": Tracing AI outputs to their source.

London panelists:
▪️ Margaret Hartnett, Co-Founder, Progressio.ai
▪️ Aaron Armstrong, Partner, Intellimation.ai
▪️ More speakers to be announced

New York panelists:
▪️ Deepak Dube, PhD, CEO, EazyML
▪️ Niamh O'Connell, Enterprise Lead, Prove AI
▪️ Pete Harris, Partner, Lighthouse Partners

These experts will focus on:
✅ Framing questions of traceability in a productive way
✅ Sharing lessons learned from deploying traceable AI systems at scale
✅ Offering advice on cutting-edge approaches to ensure AI systems provide the right answers for the right reasons, compensate for biases in underlying datasets, and respect IP

Anyone responsible for AI in the finance industry is welcome to attend. Register now to secure your spot.
🔗 London: https://lnkd.in/eCwd4Q5a
🔗 New York: https://lnkd.in/eUSdEW7P