Azure AI Search

Boost Your Holiday Spirit with Azure AI
🎄✨ Boost Your Holiday Spirit with Azure AI! 🎄✨

As we gear up for the holiday season, what better way to bring innovation to your business than by using cutting-edge Azure AI technologies? From personalized customer experiences to festive-themed data insights, here's how Azure AI can help elevate your holiday initiatives:

🎅 1. Azure OpenAI Service for Creative Content
Kickstart the holiday cheer by using Azure OpenAI to create engaging holiday content. From personalized greeting messages to festive social media posts, the GPT models can assist you in generating creative text in a snap.
🎨 Step-by-step: Use GPT to draft festive email newsletters, promotions, or customer-facing messages. Train models on your specific brand voice for customized holiday greetings. (A minimal code sketch follows this list.)

🎁 2. Azure AI Services for Image Recognition and Generation
Enhance your holiday product offerings by leveraging image recognition to identify and categorize holiday-themed products. Additionally, create stunning holiday-themed visuals with DALL-E: generate unique images from text descriptions to make your holiday marketing materials stand out.
📸 Step-by-step: Use Azure Computer Vision to analyze product images and automatically categorize seasonal items. Implement the AI model in e-commerce platforms to help customers find holiday-specific products faster. Use DALL-E to generate holiday-themed images based on your descriptions. Customize and refine the images to fit your brand's style. Incorporate these visuals into your marketing campaigns.

✨ 3. Azure AI Speech Services for Holiday Customer Interaction and Audio Generation
Transform your customer service experience with Azure's Speech-to-Text and Text-to-Speech services. You can create festive voice assistants or add holiday-themed voices to your customer support lines for a warm, personalized experience. Additionally, add a festive touch to your audio content with Azure OpenAI: use models like Whisper for high-quality speech-to-text transcription, perfect for creating holiday-themed audio messages and voice assistants.
🎙️ Step-by-step: Use Speech-to-Text to transcribe customer feedback or support requests in real time. Build a holiday-themed voice model using Text-to-Speech for interactive voice assistants. Use Whisper to transcribe holiday messages, and Text-to-Speech to convert text to festive audio. Customize the audio to match your brand's tone and style. Implement these audio clips in customer interactions or marketing materials.

🎄 4. Azure Machine Learning for Predictive Holiday Trends
Stay ahead of holiday trends with Azure ML models. Use AI to analyze customer behavior, forecast demand for holiday products, and manage stock levels efficiently. Predict what your customers need before they even ask!
📊 Step-by-step: Use Azure ML to train models on historical sales data to predict trends in holiday shopping. Build dashboards using Power BI integrated with Azure for real-time tracking of holiday performance metrics.

🔔 5. Azure AI for Sentiment Analysis
Understand the holiday mood of your customers by implementing sentiment analysis on social media, reviews, and feedback. Gauge the public sentiment around your brand during the festive season and respond accordingly.
📈 Step-by-step: Use Text Analytics for sentiment analysis on customer feedback, reviews, or social media posts. Generate insights and adapt your holiday marketing based on customer sentiment trends.
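As an illustration of the content-generation step in point 1, here is a minimal sketch using the Azure OpenAI chat completions API. The endpoint, API version, and deployment name ("gpt-4o") are placeholders you would replace with your own resource values.

```python
import os
from openai import AzureOpenAI

# Placeholder endpoint and deployment values; substitute your own Azure OpenAI resource details.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

def holiday_greeting(customer_name: str, product: str) -> str:
    """Generate a short, on-brand holiday greeting for a customer."""
    response = client.chat.completions.create(
        model="gpt-4o",  # your deployment name
        messages=[
            {"role": "system", "content": "You write warm, concise holiday greetings in our brand voice."},
            {"role": "user", "content": f"Write a two-sentence holiday greeting for {customer_name}, "
                                        f"mentioning our {product} promotion."},
        ],
        temperature=0.8,
    )
    return response.choices[0].message.content

print(holiday_greeting("Avery", "winter bundle"))
```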
🌟 6. Latest Azure AI Open Models
Explore the newest Azure AI models to bring even more innovation to your holiday projects:
GPT-4o and GPT-4 Turbo: These models offer enhanced capabilities for understanding and generating natural language and code, perfect for creating sophisticated holiday content.
Embeddings: Use these models to convert holiday-related text into numerical vectors for improved text similarity and search capabilities.

🔧 7. Azure AI Foundry
Leverage Azure AI Foundry to build, deploy, and scale AI-driven applications. This platform provides everything you need to customize, host, run, and manage AI applications, ensuring your holiday projects are innovative and efficient.

🎉 Conclusion
With Azure AI, the possibilities to brighten your business this holiday season are endless! Whether it's automating your operations or delivering personalized customer experiences, Azure's AI models can help you stay ahead of the game and spread holiday joy. Wishing everyone a season filled with innovation and success! 🎄✨
Introducing Azure AI Agent Service

Announced at Microsoft Ignite 2024, discover how Azure AI Agent Service is revolutionizing the development and deployment of AI agents. This service empowers developers to build, deploy, and scale high-quality AI agents tailored to business needs within hours. With features like rapid development, extensive data connections, flexible model selection, and enterprise-grade security, Azure AI Agent Service sets a new standard in AI automation.

Case Study: Efficient Faceted Navigation Solution Using Azure AI Search
The Microsoft Careers Portal processes 10 million job applications per year. For a tailored user experience, it uses Azure AI Search for features like hierarchical facets and dynamic filtering.

Faceted navigation and filtering are vital components in modern search applications, enhancing the ability to deliver precise, contextually relevant results. E-commerce websites often utilize filters to help users refine product searches, while more advanced applications, such as those powered by Azure AI Search, extend these capabilities to support features like geo-spatial filtering, hierarchical facets, and dynamic filtering for a tailored user experience. This case study examines the use of Azure AI Search within the Microsoft Careers Portal, which processes roughly 10 million job applications annually. The study highlights the complexities of implementing multi-layered, interconnected filters and facets in an enterprise setting. By default, Azure AI Search provides counts for specified facets when a filter value is selected; however, additional customization ensures dynamic updates for filter and facet counts across unselected categories. This paper proposes an innovative approach for leveraging Azure AI Search's existing capabilities to handle these complex requirements, offering a scalable solution applicable across diverse enterprise use cases.

1. Introduction
The Microsoft Careers Portal integrates several first-party Microsoft products to deliver a seamless and user-centric experience for both internal employees and external job seekers. The portal provides various filters, including Experience, Work-Site Preference, Profession, and Employment Type, which are tailored based on user profiles to streamline the search for relevant job opportunities. Built on Azure AI Search, the portal offers advanced search capabilities such as Boolean search, exact match, fuzzy search, and semantic ranking. These features enhance the user experience by ensuring that job listings are accurate and relevant. However, when users select multiple filters across categories, maintaining accurate facet counts in real time becomes more complex. Despite these challenges, Azure AI Search supports a robust faceted navigation experience, dynamically adjusting filter counts to reflect ongoing user selections with the custom solution shared in this study.

2. Azure AI Search: Key Features and Capabilities
2.1 Basic Concepts and Features
Azure AI Search provides a scalable, secure search platform capable of handling both traditional keyword and AI-augmented retrieval applications such as vector and hybrid search. The following are its key components:
Comprehensive Retrieval System: Supports full-text, hybrid, and vector search within an index, with field-level faceting enabled by setting fields as "facetable."
Advanced Query Syntax: Facilitates complex queries, including hybrid queries, fuzzy search, auto-complete, geo-search, and vector queries, enabling refined control over search functionality.
Semantic Relevance Tuning: Offers semantic ranking, document boosting via scoring profiles, vector weighting, and other runtime parameters for optimizing query behavior.
Language Analyzers: An analyzer is a component of a full-text search engine responsible for processing strings during indexing and query execution.
OData Filter Expressions: Provides granular control over filtering, with support for combining Boolean and full-text search expressions.
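To make the filterable and facetable concepts above concrete, here is a minimal sketch of a filtered, faceted query using the azure-search-documents Python SDK. The service endpoint, index name, and field names (profession, workSite, employmentType, jobTitle) are illustrative assumptions, not the portal's actual schema.

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Illustrative endpoint, index, and field names; not the actual Careers Portal schema.
search_client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="jobs-index",
    credential=AzureKeyCredential("<query-key>"),
)

# Filter to one profession and request facet counts on two facetable fields.
results = search_client.search(
    search_text="engineer",
    filter="profession eq 'Software Engineering'",   # OData filter on a filterable field
    facets=["workSite,count:10", "employmentType"],  # facetable fields to aggregate
    top=10,
)

print(results.get_facets())   # e.g. {'workSite': [{'value': 'Hybrid', 'count': 42}, ...]}
for doc in results:
    print(doc["jobTitle"])
```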
2.2 Filters and Faceted Navigation
In Azure AI Search, filters and facets provide users with a refined search experience:
Faceted Navigation: Enables users to interactively filter results, such as job type or location, through an intuitive UI.
Filterable Fields: These fields allow filtering operations; fields marked as "filterable" increase the index size. It's recommended to disable "filterable" for fields not used in filters to optimize performance.
Example Request: Filtering for results where the BaseRate is less than 150 in a Rooms collection is illustrated in Listing 1 below.
Listing 1: POST request for filtering 'BaseRate' in Rooms (request body omitted)
Faceted navigation is used for self-directed drilldown filtering on query results in a search app, where your application offers form controls for scoping search to groups of documents (for example, categories or brands) to support the experience.
Listing 2: Facets specified on the query request (omitted)
Listing 3: Faceted navigation structure returned in the response (omitted)
Text filters match string fields against literal strings that you provide in the filter. Unlike full-text search, there's no lexical analysis or word-breaking for text filters, so comparisons are exact matches only. For example, $filter=Category eq 'Resort and Spa' will only match documents whose Category is exactly 'Resort and Spa'.
Approaches for filtering on text:
search.in: A function that matches a field against a delimited list of strings. It is used where many raw text values need to be matched against a string field.
search.ismatch: A function that allows you to mix full-text search operations with strictly Boolean filter operations in the same filter expression. It is used where we want multiple search-filter combinations in one request.
$filter=field operator string: A user-defined expression composed of fields, operators, and values. It is used to find exact matches between a string field and a string value.

3. Customized Implementation for the Microsoft Careers Portal
3.1 Careers Site Requirements for Filters and Faceted Navigation
The Microsoft Careers Portal required an approach to dynamically update filter and facet counts as users interacted with the search filters. Specifically, when a user selects a filter value within one category, Azure AI Search should update facet counts across all other unselected categories. This requirement ensures that users receive accurate results reflecting the available job listings, even as filter selections evolve. For example, when a user selects "Software Engineering" under the Profession filter, counts in related facets (such as Discipline and Work Site) are adjusted based on the available jobs in that profession. This behavior is illustrated in Figure 1.
Figure 1: Faceted navigation with dynamic filter counts on Microsoft Careers (image omitted)
3.2 Solution Approach
The solution involves four categories of filters (A, B, C, and D). When a user selects values from Categories A and B, the system updates the facet counts across other categories as follows:
Primary Query Execution: The selected values within the same category are combined with OR, and across categories with AND, to generate an accurate search result set.
Updating Filter Values in Unselected Categories: Additional queries are executed for categories without selected values to retrieve updated counts. This iterative query approach ensures that unselected facets reflect the correct result counts. (A simplified sketch of this multi-query pattern is shown below.)
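A minimal sketch of the pattern described above, using the azure-search-documents Python SDK: the primary query applies all selected filters, and one additional query per unselected category recomputes that category's facet counts. The field names, category list, and use of a thread pool are illustrative assumptions, not the portal's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="jobs-index",
    credential=AzureKeyCredential("<query-key>"),
)

# Illustrative facet categories and user selections.
categories = ["profession", "discipline", "workSite", "employmentType"]
selections = {"profession": ["Software Engineering"], "workSite": ["Hybrid", "Remote"]}

def build_filter(selected):
    """OR values within a category, AND across categories."""
    clauses = []
    for field, values in selected.items():
        ors = " or ".join(f"{field} eq '{v}'" for v in values)
        clauses.append(f"({ors})")
    return " and ".join(clauses) if clauses else None

def facet_counts_for(category):
    """Re-run the query with all selected filters, requesting facets for one unselected category."""
    results = search_client.search(
        search_text="*",
        filter=build_filter(selections),
        facets=[f"{category},count:50"],
    )
    return category, results.get_facets().get(category, [])

# Primary query: the actual result set shown to the user.
primary = search_client.search(search_text="*", filter=build_filter(selections), top=20)

# Refresh facet counts for the unselected categories in parallel.
unselected = [c for c in categories if c not in selections]
with ThreadPoolExecutor() as pool:
    refreshed = dict(pool.map(facet_counts_for, unselected))
```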
This approach allows the Microsoft Careers Portal to deliver a dynamic, real-time faceted navigation experience, keeping filter counts accurate and improving user satisfaction.
(Figure: filter and search queries triggered in parallel; image omitted)
3.3 Best Practices Learned During Implementation
Use custom analyzers on specific fields to enhance search for documents containing matching keywords. For example, job descriptions contain #hashtag-based keywords that are used with jobs posted during campaigns, or by some teams to boost search. A custom analyzer is invoked on a per-field basis and is recommended for catering to dynamic search needs.
Define scoring profiles cautiously:
Prioritize Important Fields: Assign higher weights to fields that are more relevant to the search context. For example, the "Title" field has a higher weight compared to other fields, indicating its importance in search relevance.
Use Freshness Boosts: Incorporate freshness boosts to prioritize recent content. This is particularly useful for time-sensitive data. Adjust the boost value and interpolation method based on the desired impact. For instance, a higher boost with linear interpolation is used for recency-sensitive profiles.
Combine Multiple Scoring Functions: Use a combination of text weights and scoring functions to achieve a balanced relevance score. The functionAggregation method "sum" is used to aggregate the scores from different functions.
Test and Iterate: Regularly test and refine scoring profiles based on search performance and user feedback. Adjust weights, boost values, and interpolation methods as needed to improve search relevance.
3.4 Performance Evaluation
A service-side performance test was conducted in a production-cloned environment using Azure Test Runner to validate the implementation under high-load conditions, with the portal supporting approximately 50,000-60,000 searches daily. Our search-service app triggered requests directly against the deployed Azure AI Search instance. Performance results are shown below:

| Requests Per Second | Filters Count | Average Latency (ms) |
| --- | --- | --- |
| 20 RPS | 1 | 429 |
| 30 RPS | 1 | 635 |
| 30 RPS | 21 | 482 |
| 30 RPS | 70 | 712 |

Performance was optimized with a replica count of 1-7 and a consistent partition count of 1. The web app used an S1 App Service Plan SKU with a scale-out configuration of 1-3 instances, triggered on the following rules:
(Average) CPU consumption > 70%
(Average) Memory percentage > 80%
(Average) HTTP response time > 10 s
3.5 Conclusion
This case study demonstrates how Azure AI Search can effectively address complex requirements for faceted navigation in high-traffic, enterprise-level applications like Microsoft Careers. By enabling real-time, multi-layered filter updates, Azure AI Search not only meets but exceeds industry standards for search performance and relevance, reinforcing its position as a state-of-the-art solution for sophisticated search and retrieval needs. For developers and architects looking to implement similar capabilities, Azure AI Search provides a comprehensive platform capable of scaling to meet diverse business requirements.
Contributors: Prachi Nautiyal, Pradip Takate, Farzad Sunavala, Abhishek Mishra, Bipul Raman, Satya Vamsi Gadikoyila, Ashudeep Reshi

Building an AI Assistant Using gpt-4o Audio-Preview API
Before I get into more details of using this API, I want to call out that this API is different from the gpt-4o Realtime API.

| Feature | GPT-4o Realtime API | GPT-4o Audio-Preview API |
| --- | --- | --- |
| Purpose | Designed for low-latency, real-time conversational interactions with speech input and output. | Supports audio inputs and outputs in the Chat Completions API, suitable for asynchronous interactions. |
| Use Cases | Ideal for live interactions such as customer support agents, voice assistants, and real-time translators. | Suitable for applications that handle text and audio inputs/outputs without the need for real-time processing. |
| Integration Method | Utilizes a persistent WebSocket connection for streaming communication. | Operates via standard API calls within the Chat Completions framework. |
| Latency | Offers low-latency responses, enabling natural conversational experiences. | Not optimized for low latency; better suited for non-real-time interactions. |

The steps to use this API are:

1. Capture user audio input
Accept audio input from the user and add it to the request payload. While the system message is of type 'text', the user input is of type 'input_audio'.

    # Assumes: import base64; import streamlit as st; `client` is an OpenAI client and `config.model` names the gpt-4o audio-preview model.
    audio_value = st.audio_input("Ask your question!")
    encoded_audio_string = None
    if audio_value:
        audio_data = audio_value.read()
        encoded_audio_string = base64.b64encode(audio_data).decode("utf-8")
        st.session_state.messages.append(
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {"data": encoded_audio_string, "format": "wav"},
                    }
                ],
            }
        )

2. Invoke the chat completions endpoint
Specify the modalities that need to be supported. For example, with the configuration shown below, both text and audio output are supported, using the specified neural voice. Include the function definitions to use during tool calling.

    completion = None
    try:
        completion = client.chat.completions.create(
            model=config.model,
            modalities=["text", "audio"],
            audio={"voice": "alloy", "format": "wav"},
            functions=st.session_state["connection"].functions,
            function_call="auto",
            messages=st.session_state.messages,
        )
    except Exception as e:
        print("Error in completion", e)
        st.write("Error in completion", e)
        st.stop()

3. Pass the output from tool calling to gpt-4o to generate an audio response
Pass the tool calling response along with the audio input from the user to gpt-4o to generate the audio response.

    l_completion = client.chat.completions.create(
        model=config.model,
        modalities=["text", "audio"],
        audio={"voice": "alloy", "format": "wav"},
        messages=[
            {
                "role": "system",
                "content": [{"type": "text", "text": system_prompt_response}],
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "---- context -----\n" + str(function_response) + "\n --- User Query----:\n",
                    },
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": encoded_audio_string,
                            "format": "wav",
                        },
                    },
                ],
            },
        ],
    )
    wav_bytes = base64.b64decode(l_completion.choices[0].message.audio.data)

4. Extract the text transcript of the response
In addition to playing the audio response over the speaker, we can also populate the chat conversation with the text transcript of the audio response.

    transcript_out = l_completion.choices[0].message.audio.transcript
    st.session_state.messages.append(
        {
            "role": "assistant",
            "content": transcript_out,
        }
    )

Note: The gpt-4o audio-preview API is not available in Azure as of this writing. This sample uses the API from OpenAI directly.
The sample application below, powered by the gpt-4o audio-preview API, showcases a customer querying their gaming progress and grievance status.
Tool integration is used to:
Pull information from an Azure SQL database for points and achievements.
Perform Azure AI Search-driven Q&A over documents and user manuals.
Register and retrieve grievances via API integration with Jira Cloud.
See a demo of this application in action below. The code for this application is available here.

Demo: Enriching Data in Azure AI Search Indexing Pipeline Using Azure AI LLMs/SLMs for RAG Apps
Enhance your RAG applications with enriched data context using Azure AI's LLMs/SLMs. This demo shows how to integrate custom skills in the Azure AI Search indexing pipeline, adapting prompts to improve response accuracy. Leverage Azure AI Search's indexing capabilities for efficient data transformations, with practical applications like image captioning and document summarization, tailored for more precise and relevant responses.
RAG Best Practice With AI Search

Please refer to my repo for more AI resources; you are welcome to star it: https://github.com/xinyuwei-david/david-share.git
This article is from one of my repos: https://github.com/xinyuwei-david/david-share/tree/master/LLMs/RAG-Best-Practice

Although models like GPT-4 and GPT-3.5 are powerful, their knowledge cannot be the most up-to-date. Previously, we often introduced engineering techniques in the use of LLMs by treating prompt engineering, RAG, and fine-tuning as parallel methods. In fact, these three technologies can be combined.

Four Stages of RAG
The thinking in the paper I read is excellent: it divides RAG into four stages.

Level 1: Explicit Fact Queries
Characteristics
Simplicity: Directly retrieving explicit factual information from provided data without the need for complex reasoning or multi-step processing.
Requirement: Efficiently and accurately retrieve relevant content and generate precise answers.
Techniques and Engineering Suggestions
a. Basic RAG Methods
Data Preprocessing and Chunking: Divide long texts or documents into appropriate chunks for indexing and retrieval. Common chunking strategies include:
Fixed-Length Chunking: Splitting the text by fixed lengths, which may interrupt sentences or paragraphs.
Paragraph-Based or Semantic Chunking: Chunking based on natural paragraphs or semantic boundaries to maintain content integrity.
Index Construction:
Sparse Indexing: Use traditional information retrieval methods like TF-IDF or BM25 based on keyword matching.
Dense Indexing: Use pre-trained language models (e.g., BERT) to generate text vector embeddings for vector retrieval.
Retrieval Techniques: Utilize vector similarity calculations or keyword matching to retrieve the most relevant text fragments from the index.
Answer Generation: Input the retrieved text fragments as context into the LLM to generate the final answer.
b. Improving the Retrieval and Generation Phases
Multimodal Data Processing: If the data includes tables, images, or other non-text information, convert them into text form or use multimodal models for processing.
Retrieval Optimization:
Recursive Retrieval: Perform multiple rounds of retrieval when a single retrieval isn't sufficient to find the answer, gradually narrowing down the scope.
Retrieval Result Re-ranking: Use models to score or re-rank retrieval results, prioritizing the most relevant content.
Generation Optimization:
Filtering Irrelevant Information: Before the generation phase, filter out retrieved content unrelated to the question to avoid interfering with the model's output.
Controlling Answer Format: Through carefully designed prompts, ensure the model generates answers with correct formatting and accurate content.
Engineering Practice Example
Example: Constructing a Q&A system to answer common questions about company products.
Data Preparation: Collect all relevant product documents, FAQs, user manuals, etc. Clean, chunk, and index the documents. (A minimal chunking sketch follows below.)
System Implementation: After a user asks a question, use dense vector retrieval to find the most relevant text fragments from the index. Input the retrieved fragments as context into the LLM to generate an answer.
Optimization Strategies: Regularly update documents and indexes to ensure information is current. Monitor user feedback to improve retrieval strategies and prompt designs, enhancing answer quality.
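As a concrete illustration of the chunking step mentioned in the example above, here is a minimal fixed-length chunker with overlap. The chunk size, overlap, and file name are illustrative defaults, not recommendations from the original article.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-length character chunks with a small overlap,
    so sentences cut at a boundary still appear in the next chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks

document = open("product_faq.txt", encoding="utf-8").read()  # placeholder source file
for i, chunk in enumerate(chunk_text(document)):
    print(f"chunk {i}: {len(chunk)} chars")
```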
Level 2: Implicit Fact Queries
Characteristics
Increased Complexity: Requires a certain degree of reasoning or multi-step derivation based on the retrieved data.
Requirement: The model needs to decompose the question into multiple steps, retrieve and process them separately, and then synthesize the final answer.
Techniques and Engineering Suggestions
a. Multi-Hop Retrieval and Reasoning
Iterative RAG:
IRCoT (Iterative Retrieval Chain-of-Thought): Use chain-of-thought reasoning to guide the model in retrieving relevant information at each step, gradually approaching the answer.
RAT (Retrieve and Answer with Thought): Introduce retrieval steps during the answering process, allowing the model to retrieve new information when needed.
Question Decomposition: Break down complex questions into simpler sub-questions, retrieve and answer them individually, then synthesize the results.
b. Graph or Tree Structured Retrieval and Reasoning
Building Knowledge Graphs: Extract entities and relationships from data to construct knowledge graphs, helping the model understand complex dependencies.
Graph Search Algorithms: Use algorithms like Depth-First Search (DFS) or Breadth-First Search (BFS) to find paths or subgraphs related to the question within the knowledge graph.
c. Using SQL or Other Structured Queries
Text-to-SQL Conversion: Convert natural language questions into SQL queries to retrieve answers from structured databases.
Tool Support: Use existing text-to-SQL conversion tools (e.g., Chat2DB) to facilitate natural-language-to-database query conversion.
Engineering Practice Example
Scenario: A user asks, "In which quarters over the past five years did company X's stock price exceed company Y's?"
Question Decomposition:
Obtain quarterly stock price data for company X and company Y over the past five years.
Compare the stock prices for each quarter.
Identify the quarters where company X's stock price exceeded company Y's.
Implementation Steps:
Step 1: Use text-to-SQL tools to convert the natural language query into SQL queries and retrieve relevant data from the database.
Step 2: Use a programming language (e.g., Python) to process and compare the data.
Step 3: Organize the results into a user-readable format.
Answer Generation: Input the organized results as context into the LLM to generate a natural language response.

Level 3: Interpretable Rationale Queries
Characteristics
Application of Domain-Specific Rules and Guidelines: The model needs to understand and follow rules typically not covered in pre-training data.
Requirement: Integrate external rules, guidelines, or processes into the model so it can follow specified logic and steps when answering.
Techniques and Engineering Suggestions
a. Prompt Engineering and Prompt Optimization
Designing Effective Prompts: Explicitly provide rules or guidelines within the prompt to guide the model in following specified steps when answering.
Automated Prompt Optimization: Use optimization algorithms (e.g., reinforcement learning) to automatically search for and optimize prompts, improving the model's performance on specific tasks.
OPRO (Optimization by PROmpting): The model generates and evaluates prompts on its own, iteratively optimizing to find the best prompt combination.
b. Chain-of-Thought (CoT) Prompts
Guiding Multi-Step Reasoning: Require the model to display its reasoning process within the prompt, ensuring it follows specified logic. (A short prompt-construction sketch follows below.)
Manual or Automated CoT Prompt Design: Design appropriate CoT prompts based on task requirements or use algorithms to generate them automatically.
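A minimal sketch of a rule-following, chain-of-thought-style prompt for the kind of policy-driven task discussed in this level. The policy text is a placeholder, and the resulting messages would be passed to whatever chat-completions client and deployment you use.

```python
RULES = """Return policy (excerpt, placeholder text):
1. Items may be returned within 30 days of delivery.
2. Opened software is not refundable.
3. Refunds are issued to the original payment method."""

def build_cot_prompt(question: str) -> list[dict]:
    """Embed the domain rules in the prompt and ask the model to reason step by step
    before giving its final answer."""
    system = (
        "You are a customer-service assistant. Follow the rules below exactly.\n"
        f"{RULES}\n"
        "For every question: (1) restate the relevant rule, (2) reason step by step, "
        "(3) finish with a line starting 'Answer:'."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

messages = build_cot_prompt("I bought a laptop 10 days ago and want to return it. Can I?")
# messages can now be passed to client.chat.completions.create(model=<deployment>, messages=messages)
```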
c. Following External Processes or Decision Trees
Encoding Rules and Processes: Convert decision processes into state machines, decision trees, or pseudocode for the model to execute.
Model Adjustment: Enable the model to parse and execute these encoded rules.
Engineering Practice Example
Example: A customer service chatbot handling return requests.
Scenario: A customer requests a return. The chatbot needs to guide the customer through the appropriate process according to the company's return policy.
Technical Implementation:
Rule Integration: Organize the company's return policies and procedures into clear steps or decision trees.
Prompt Design: Include key points of the return policy within the prompt, requiring the model to guide the customer step by step.
Model Execution: The LLM interacts with the customer based on the prompt, following the return process to provide clear guidance.
Optimization Strategies:
Prompt Optimization: Adjust prompts based on customer feedback to help the model more accurately understand and execute the return process.
Multi-Turn Dialogue: Support multiple rounds of conversation with the customer to handle various potential issues and exceptions.

Level 4: Hidden Rationale Queries
Characteristics
Highest Complexity: Involves domain-specific, implicit reasoning methods; the model needs to discover and apply these hidden logics from data.
Requirement: The model must be capable of mining patterns and reasoning methods from large datasets, akin to the thought processes of domain experts.
Techniques and Engineering Suggestions
a. Offline Learning and Experience Accumulation
Learning Patterns and Experience from Data: Train the model to generalize potential rules and logic from historical data and cases.
Self-Supervised Learning: Use the model-generated reasoning processes (e.g., Chain-of-Thought) as auxiliary information to optimize the model's reasoning capabilities.
b. In-Context Learning (ICL)
Providing Examples and Cases: Include relevant examples within the prompt for the model to reference similar cases during reasoning.
Retrieving Relevant Cases: Use retrieval modules to find cases similar to the current question from a database and provide them to the model. (A small retrieval-and-prompting sketch follows this section.)
c. Model Fine-Tuning
Domain-Specific Fine-Tuning: Fine-tune the model using extensive domain data to internalize domain knowledge.
Reinforcement Learning: Employ reward mechanisms to encourage the model to produce desired reasoning processes and answers.
Engineering Practice Example
Example: A legal assistant AI handling complex cases.
Scenario: A user consults on a complex legal issue. The AI needs to provide advice, citing relevant legal provisions and precedents.
Technical Implementation:
Data Preparation: Collect a large corpus of legal documents, case analyses, expert opinions, etc.
Model Fine-Tuning: Fine-tune the LLM using legal domain data to equip it with legal reasoning capabilities.
Case Retrieval: Use RAG to retrieve relevant precedents and legal provisions from a database.
Answer Generation: Input the retrieved cases and provisions as context into the fine-tuned LLM to generate professional legal advice.
Optimization Strategies:
Continuous Learning: Regularly update the model by adding new legal cases and regulatory changes.
Expert Review: Incorporate legal experts to review the model's outputs, ensuring accuracy and legality.
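A minimal sketch of the in-context-learning idea above: retrieve a few similar past cases from a search index and splice them into the prompt as examples. The index name, field names, and the choice of Azure AI Search as the case store are assumptions for illustration.

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Illustrative index of past cases; field names are assumptions.
cases_client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="legal-cases",
    credential=AzureKeyCredential("<query-key>"),
)

def build_icl_messages(question: str, k: int = 3) -> list[dict]:
    """Retrieve k similar cases and present them as in-context examples."""
    hits = cases_client.search(search_text=question, top=k)
    examples = "\n\n".join(
        f"Case: {hit['summary']}\nOutcome: {hit['outcome']}" for hit in hits
    )
    system = (
        "You are a legal assistant. Use the precedent cases below as reference when reasoning.\n\n"
        f"{examples}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```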
Comprehensive Consideration: Combining Fine-Tuned LLMs and RAG
While fine-tuning LLMs can enhance the model's reasoning ability and domain adaptability, it cannot entirely replace the role of RAG. RAG has unique advantages in handling dynamic, massive, and real-time updated knowledge. Combining fine-tuning and RAG leverages their respective strengths, enabling the model to possess strong reasoning capabilities while accessing the latest and most comprehensive external knowledge.
Advantages of the Combination
Enhanced Reasoning Ability: Through fine-tuning, the model learns domain-specific reasoning methods and logic.
Real-Time Knowledge Access: RAG allows the model to retrieve the latest external data in real time when generating answers.
Flexibility and Scalability: RAG systems can easily update data sources without the need to retrain the model.
Practical Application Suggestions
Combining Fine-Tuning and RAG for Complex Tasks: Use fine-tuning to enhance the model's reasoning and logic capabilities, while employing RAG to obtain specific knowledge and information.
Evaluating the Cost-Benefit Ratio: Consider the costs and benefits of fine-tuning; focus on fine-tuning core reasoning abilities and let RAG handle knowledge acquisition.
Continuous Update and Maintenance: Establish data update mechanisms for the RAG system to ensure the external data accessed by the model is up-to-date and accurate.

RAG: Detailed Technical Explanation
Retrieval Augmented Generation (RAG) is a technique that combines large language models (LLMs) with information retrieval. It enhances the model's capabilities by retrieving and utilizing relevant information from external knowledge bases during the generation process. This provides the model with up-to-date, domain-specific knowledge, enabling it to generate more accurate and contextually relevant responses.
Purpose of RAG
Why do we need RAG?
Reducing Hallucinations: LLMs may produce inaccurate or false information, known as "hallucinations," when they lack sufficient context. RAG reduces the occurrence of hallucinations by providing real-time external information.
Updating Knowledge: The pre-training data of LLMs may lag behind current information. RAG allows models to access the latest data sources, maintaining the timeliness of information.
Enhancing Accuracy: By retrieving relevant background information, the model's answers become more accurate and professional.
How RAG Works
The core idea of RAG is to retrieve relevant information from a document repository and input it into the LLM along with the user's query, guiding the model to generate a more accurate answer. The general process is as follows (a minimal end-to-end sketch appears after the objectives list below):
User Query: The user poses a question or request to the system.
Retrieval Phase: The system uses the query to retrieve relevant document fragments (chunks) from a document repository or knowledge base.
Generation Phase: The retrieved document fragments are input into the LLM along with the original query to generate the final answer.
Key Steps to Building a RAG System
Clarify Objectives
Before starting to build a RAG system, you need to first clarify your goals:
Upgrade Search Interface: Do you want to add semantic search capabilities to your existing search interface?
Enhance Domain Knowledge: Do you wish to utilize domain-specific knowledge to enhance search or chat functions?
Add a Chatbot: Do you want to add a chatbot to interact with customers?
Expose Internal APIs: Do you plan to expose internal APIs through user dialogues?
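Here is a minimal sketch of the query → retrieve → generate flow described under "How RAG Works", using Azure AI Search for retrieval and Azure OpenAI for generation. The endpoint variables, index name, field name, and deployment name are placeholders.

```python
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search_client = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="docs-index",                      # placeholder index name
    credential=AzureKeyCredential(os.environ["SEARCH_KEY"]),
)
llm = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

def answer(question: str) -> str:
    # 1. Retrieval phase: fetch the most relevant chunks for the query.
    hits = search_client.search(search_text=question, top=3)
    context = "\n\n".join(hit["content"] for hit in hits)   # 'content' is a placeholder field name

    # 2. Generation phase: ground the model's answer in the retrieved context.
    response = llm.chat.completions.create(
        model="gpt-4o",  # your deployment name
        messages=[
            {"role": "system", "content": "Answer using only the provided context. "
                                          "Say you don't know if the context is insufficient."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("What is our return policy for opened software?"))
```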
Clear objectives will guide the entire implementation process and help you choose the most suitable technologies and strategies.
Data Preparation
Data is the foundation of a RAG system, and its quality directly affects system performance. Data preparation includes the following steps:
(1) Assess Data Formats
Structured Data: Such as CSV, JSON, etc., which need to be converted into text format to facilitate indexing and retrieval.
Tabular Data: May need to be converted or enriched to support more complex searches or interactions.
Text Data: Such as documents, articles, chat records, etc., which may need to be organized or filtered.
Image Data: Including flowcharts, documents, photographs, and similar images.
(2) Data Enrichment
Add Contextual Information: Supplement data with additional textual content, such as knowledge bases or industry information.
Data Annotation: Label key entities, concepts, and relationships to enhance the model's understanding capabilities.
(3) Choose the Right Platform
Vector Databases: Such as AI Search, Qdrant, etc., used for storing and retrieving embedding vectors.
Relational Databases: The database schema needs to be included in the LLM's prompts to translate user requests into SQL queries.
Text Search Engines: Like AI Search, Elasticsearch, or Couchbase, which can be combined with vector search to leverage both text and semantic search advantages.
Graph Databases: Build knowledge graphs to utilize the connections and semantic relationships between nodes.
Document Chunking
In a RAG system, document chunking is a critical step that directly affects the quality and relevance of the retrieved information. The main reasons for chunking and the best practices are as follows:
Model Limitations: LLMs have a maximum context length limitation.
Improve Retrieval Efficiency: Splitting large documents into smaller chunks helps to improve retrieval accuracy and speed.
Methods for Chunking
Fixed-Size Chunking: Define a fixed size (e.g., 200 words) for chunks and allow a certain degree of overlap (e.g., 10-15%).
Content-Based Variable-Size Chunking: Chunk based on content features (such as sentences, paragraphs, or Markdown structure).
Custom or Iterative Chunking Strategies: Combine fixed-size and variable-size methods and adjust according to specific needs.
Importance of Content Overlap
Preserve Context: Allowing some overlap between chunks during chunking helps to retain contextual information.
Recommendation: Start with about 10% overlap and adjust based on specific data types and use cases.
Choosing the Right Embedding Model
Embedding models are used to convert text into vector form to facilitate similarity computation. When choosing an embedding model, consider:
Model Input Limitations: Ensure the input text length is within the model's allowable range.
Model Performance and Effectiveness: Choose a model with good performance suitable for the specific application scenario.
New Embedding Models: OpenAI has introduced two new embedding models: text-embedding-3-small and text-embedding-3-large.
Model Size and Performance: text-embedding-3-large is a larger and more powerful embedding model capable of creating embeddings with up to 3,072 dimensions.
Performance Improvements:
MIRACL Benchmark: text-embedding-3-large scored 54.9 on the MIRACL benchmark, showing a significant improvement over text-embedding-ada-002, which scored 31.4.
MTEB Benchmark: text-embedding-3-large scored 64.6 on the MTEB benchmark, surpassing text-embedding-ada-002, which scored 61.0.
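To illustrate the dimensionality flexibility of these models, here is a small sketch that requests reduced-dimension embeddings via the dimensions parameter supported by the text-embedding-3 family. The deployment name and the choice of 1,024 dimensions are placeholders.

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

texts = ["holiday gift guide", "return policy for opened software"]

# Full 3,072-dimension vectors (text-embedding-3-large default).
full = client.embeddings.create(model="text-embedding-3-large", input=texts)

# Reduced 1,024-dimension vectors: smaller index footprint, most quality retained.
reduced = client.embeddings.create(model="text-embedding-3-large", input=texts, dimensions=1024)

print(len(full.data[0].embedding), len(reduced.data[0].embedding))   # 3072 1024
```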
Analysis of Improvements:
Higher Dimensions: The ability of text-embedding-3-large to create embeddings with up to 3,072 dimensions allows it to better capture and represent the concepts and relationships within the content.
Improved Training Techniques: The new model employs more advanced training techniques and optimization methods, resulting in better performance on multilingual retrieval and English tasks.
Flexibility: text-embedding-3-large allows developers to balance performance and cost by adjusting the dimensionality of the embeddings. For example, reducing the 3,072-dimensional embeddings to 256 dimensions can still outperform the uncompressed text-embedding-ada-002 on the MTEB benchmark.
Note: To migrate from text-embedding-ada-002 to text-embedding-3-large, you'll need to manually generate new embeddings, as upgrading between embedding models is not automatic. The first step is to deploy the new model (text-embedding-3-large) within your Azure environment. After that, re-generate embeddings for all your data, as embeddings from the previous model will not be compatible with the new one.
AI Search Service Capacity and Performance Optimization
Service Tiers and Capacity
Refer to: https://learn.microsoft.com/en-us/azure/search/search-limits-quotas-capacity
Upgrade Service Tier: Upgrading from Standard S1 to S2 can provide higher performance and storage capacity.
Increase Partitions and Replicas: Adjust based on query load and index size.
Avoid Complex Queries: Reduce the use of high-overhead queries, such as regular expression queries.
Query Optimization: Retrieve only the required fields, limit the amount of data returned, and use search functions rather than complex filters.
Tips for Improving Azure AI Search Performance
Index Size and Architecture: Regularly optimize the index; remove unnecessary fields and documents.
Query Design: Optimize query statements to reduce unnecessary scanning and computation.
Service Capacity: Adjust replicas and partitions appropriately based on query load and index size.
Avoid Complex Queries: Reduce the use of high-overhead queries, such as regular expression queries.
Chunking Large Documents
Use Built-in Text Split Skills: Choose modes like pages or sentences based on needs.
Adjust Parameters: Set appropriate maximumPageLength, pageOverlapLength, etc., based on document characteristics.
Use Tools Like LangChain: For more flexible chunking and embedding operations.
L1 + L2 Search, Query Rewriting, and the New Semantic Reranker
L1 Hybrid Search + L2 Re-ranker: Enhance search results by combining first-stage retrieval with re-ranking. (A minimal query sketch is shown below, after the prompt-engineering notes.)
Query Rewriting: Improve recall rate and accuracy by rewriting user queries.
Semantic Reranker: Use cross-encoders to re-rank candidate results, enhancing result relevance.
Prompt Engineering
Use Rich Examples: Provide multiple examples to guide the model's learning and improve its responses.
Provide Clear Instructions: Ensure that instructions are explicit and unambiguous to avoid misunderstandings.
Restrict Input and Output Formats: Define acceptable input and output formats to prevent malicious content and protect model security.
Note: This style of prompt engineering is not suitable for Azure OpenAI o1.
Reference: https://mp.weixin.qq.com/s/tLcAfPU6hUkFsNMjDFeklw?token=1531586958&lang=zh_CN
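A minimal sketch of the L1 hybrid retrieval plus L2 semantic re-ranking flow described above, using the azure-search-documents Python SDK. The index name, vector field name, semantic configuration name, and embedding deployment are placeholders.

```python
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

search_client = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="docs-index",                                   # placeholder
    credential=AzureKeyCredential(os.environ["SEARCH_KEY"]),
)
aoai = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

query = "how do I reset my ThinkPad to factory settings"
embedding = aoai.embeddings.create(model="text-embedding-3-large", input=query).data[0].embedding

# L1: hybrid retrieval (keyword + vector); L2: semantic re-ranking of the candidates.
results = search_client.search(
    search_text=query,                                         # keyword part of the hybrid query
    vector_queries=[VectorizedQuery(vector=embedding, k_nearest_neighbors=50, fields="contentVector")],
    query_type="semantic",
    semantic_configuration_name="default-semantic-config",    # placeholder
    top=5,
)
for doc in results:
    print(doc["title"])   # documents arrive in semantically re-ranked (L2) order
```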
An example categorization prompt:

Your task is to review customer questions and categorize them into one of the following 4 types of problems.
The review steps are as follows; please perform them step by step:
1. Extract three keywords from the customer question and translate them into English. Connect the three keywords with commas to make a single JSON value.
2. Summarize the customer's question in roughly 15 words, in English.
3. Categorize the customer's question based on the review text and summary.
Category list:
• Technical issue: the customer is experiencing server-side issues, client errors, or product limitations. Example: "I'm having trouble logging into my account. It keeps saying there's a server error."
• Product inquiry: the customer would like to know more details about the product or is asking questions about how to use it. Example: "Can you provide more information about the product and how to use it?"
• Application status: the customer is requesting to check the status of their Azure OpenAI, GPT-4, or DALL-E application. Example: "What is the status of my Azure OpenAI application?"
• Unknown: use this if you only have a low confidence score for categorization. Example: "I'm not sure if this is the right place to ask, but I have a question about billing."
Provide the output in JSON format with the following keys: Case id; Key-words; Summary; Category. Please generate only one JSON structure per review.
Please show the output results in a table with four columns: Case ID, Keywords, Summary, Category.

Demo: Lenovo ThinkPad Product RAG
I have a Lenovo ThinkPad product manual, and I want to build a RAG (Retrieval-Augmented Generation) system based on it. The document includes up to dozens of product models, many of which have very similar names. Moreover, the document is nearly 900 pages long. Therefore, to construct a RAG system based on this document and provide precise answers, I need to address the following issues:
How to split the document;
How to avoid loss of relevance;
How to resolve information discontinuity;
The problem of low search accuracy due to numerous similar products;
The challenge that diverse questioning poses to the system's generalization ability.
In the end, I chunked the document based on the product models and set more effective prompts, so that the RAG system can accurately answer questions.
System prompt:
You are an AI assistant that helps people answer questions about Lenovo products. Please be patient, try to answer questions in as much detail as possible, and give the reason. When you use reference documents, please list the file names directly, not just "Citation 1"; for example, it should be ThinkPad E14 Gen 4 (AMD).pdf.txt.
The index format is as follows (omitted).
Final Result: Please see my demo videos on YouTube.
Refer to: https://arxiv.org/pdf/2409.14924v1

VoiceRAG: An App Pattern for RAG + Voice Using Azure AI Search and the GPT-4o Realtime API for Audio
In this blog post we present a simple architecture for voice-based generative AI applications that implements the RAG pattern by combining the new gpt-4o-realtime-preview model with Azure AI Search.

Azure AI Search October Updates: Nearly 100x Compression with Minimal Quality Loss
In our continued effort to equip developers and organizations with advanced search tools, we are thrilled to announce the launch of several new features in the latest Preview API for Azure AI Search. These enhancements are designed to optimize vector index size and provide more granular control and understanding of your search index to build Retrieval-Augmented Generation (RAG) applications.
MRL Support for Quantization
Matryoshka Representation Learning (MRL) is a new technique that introduces a different form of vector compression, which complements and works independently of existing quantization methods. MRL enables the flexibility to truncate embeddings without significant semantic loss, offering a balance between vector size and information retention. This technique works by training embedding models so that information density increases towards the beginning of the vector. As a result, even when using only a prefix of the original vector, much of the key information is preserved, allowing for shorter vector representations without a substantial drop in performance. OpenAI has integrated MRL into their 'text-embedding-3-small' and 'text-embedding-3-large' models, making them adaptable for use in scenarios where compressed embeddings are needed while maintaining high retrieval accuracy. You can read more about the underlying research in the official paper [1] or learn about the latest OpenAI embedding models in their blog.
Storage Compression Comparison
Table 1.1 below highlights the different configurations for vector compression, comparing standard uncompressed vectors, Scalar Quantization (SQ), and Binary Quantization (BQ) with and without MRL. The compression ratio demonstrates how efficiently the vector index size can be optimized, yielding significant cost savings. You can find more about our vector index size limits here: Service limits for tiers and skus - Azure AI Search | Microsoft Learn.

Table 1.1: Vector Index Size Compression Comparison

| Configuration | *Compression Ratio |
| --- | --- |
| Uncompressed | - |
| SQ | 4x |
| BQ | 28x |
| **MRL + SQ (1/2 and 1/3 truncation dimension respectively) | 8x-12x |
| **MRL + BQ (1/2 and 1/3 truncation dimension respectively) | 64x-96x |

Note: Compression ratios depend on embedding dimensions and truncation. For instance, using "text-embedding-3-large" with 3072 dimensions truncated to 1024 dimensions can result in 96x compression with Binary Quantization.
*All compression methods listed above may experience slightly lower compression ratios due to overhead introduced by the index data structures. See "Memory overhead from selected algorithm" for more details.
**The compression impact when using MRL depends on the value of the truncation dimension. We recommend using either 1/2 or 1/3 of the original dimensions to preserve embedding quality (see below).
Quality Retainment Table
Table 1.2 provides a detailed view of the quality retainment when using MRL with quantization across different models and configurations. The results indicate the impact on Mean NDCG@10 across a subset of MTEB datasets, showing that high levels of compression can still preserve up to 99% of search quality, particularly with BQ and MRL.
Table 1.2: Impact of MRL on Mean NDCG@10 Across MTEB Subset

| Model Name | Original Dimension | MRL Dimension | Quantization Algorithm | No Rerank (% Δ) | Rerank 2x Oversampling (% Δ) |
| --- | --- | --- | --- | --- | --- |
| OpenAI text-embedding-3-small | 1536 | 512 | SQ | -2.00% (Δ = 1.155) | -0.0004% (Δ = 0.0002) |
| OpenAI text-embedding-3-small | 1536 | 512 | BQ | -15.00% (Δ = 7.5092) | -0.11% (Δ = 0.0554) |
| OpenAI text-embedding-3-small | 1536 | 768 | SQ | -2.00% (Δ = 0.8128) | -1.60% (Δ = 0.8128) |
| OpenAI text-embedding-3-small | 1536 | 768 | BQ | -10.00% (Δ = 5.0104) | -0.01% (Δ = 0.0044) |
| OpenAI text-embedding-3-large | 3072 | 1024 | SQ | -1.00% (Δ = 0.616) | -0.02% (Δ = 0.0118) |
| OpenAI text-embedding-3-large | 3072 | 1024 | BQ | -7.00% (Δ = 3.9478) | -0.58% (Δ = 0.3184) |
| OpenAI text-embedding-3-large | 3072 | 1536 | SQ | -1.00% (Δ = 0.3184) | -0.08% (Δ = 0.0426) |
| OpenAI text-embedding-3-large | 3072 | 1536 | BQ | -5.00% (Δ = 2.8062) | -0.06% (Δ = 0.0356) |

Table 1.2 compares the relative point differences in Mean NDCG@10 when using different MRL dimensions (1/3 and 1/2 of the original dimensions) against an uncompressed index across OpenAI text-embedding models.
Key Takeaways:
99% Search Quality with BQ + MRL + Oversampling: Combining Binary Quantization (BQ) with oversampling and Matryoshka Representation Learning (MRL) retains 99% of the original search quality in the dataset and embedding combinations we tested, even with up to 96x compression, making it ideal for reducing storage while maintaining high retrieval performance.
Flexible Embedding Truncation: MRL enables dynamic embedding truncation with minimal accuracy loss, providing a balance between storage efficiency and search quality.
No Latency Impact Observed: Our experiments also indicated that using MRL had no noticeable latency impact, supporting efficient performance even at high compression rates.
For more details on how MRL works and how to implement it, visit the MRL documentation.
Targeted Vector Filtering
Targeted vector filtering allows you to apply filters specifically to the vector component of hybrid search queries. This fine-grained control ensures that your filters enhance the relevance of vector search results without inadvertently affecting keyword-based searches.
Sub-Scores
Sub-scores provide granular scoring information for each recall set contributing to the final search results. In hybrid search scenarios, where multiple factors like vector similarity and text relevance play a role, sub-scores offer transparency into how each component influences the overall ranking.
Text Split Skill by Tokens
The Text Split skill by tokens enhances your ability to process and manage large text data by splitting text based on token counts. This gives you more precise control over passage (chunk) length, leading to more targeted indexing and retrieval, particularly for documents with extensive content.
For any questions or to share your feedback, feel free to reach out through our Azure Search community.
Getting started with Azure AI Search:
Learn more about Azure AI Search and about all the latest features.
Want to chat with your data? Check out VoiceRAG!
Start creating a search service in the Azure Portal, Azure CLI, the Management REST API, an ARM template, or a Bicep file.
Learn about Retrieval Augmented Generation in Azure AI Search.
Explore our preview client libraries in Python, .NET, Java, and JavaScript, offering diverse integration methods to cater to varying user needs.
Explore how to create end-to-end RAG applications with Azure AI Studio.
References:
[1] Kusupati, A., Bhatt, G., Rege, A., Wallingford, M., Sinha, A., Ramanujan, V., Howard-Snyder, W., Chen, K., Kakade, S., Jain, P., & Farhadi, A. (2024). Matryoshka Representation Learning. arXiv preprint arXiv:2205.13147. Retrieved from https://arxiv.org/abs/2205.13147