Grounding LLMs in facts is valuable not only for countering hallucination but also for compensating fact creators for their intellectual effort and property.
Dr. Mark T. Maybury’s Post
More Relevant Posts
-
In the LLM world size matters, but very good results have been achieved by properly using an SLM such as Phi-2 to create "expert adapters" as fine-tuned models. Based on the user query, the most suitable adapter is selected. A smart and very energy-efficient approach. https://lnkd.in/d_Meru3U
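A minimal sketch of what "select the most suitable adapter for the query" could look like in practice. The adapter names, the embed() helper, and the similarity-based routing are hypothetical illustrations, not the method from the Microsoft paper:

# Hypothetical sketch: route a query to the best "expert adapter" by embedding similarity.
import numpy as np

ADAPTERS = {                       # assumed adapter library, one fine-tuned adapter per task domain
    "code": np.array([0.9, 0.1, 0.0]),
    "math": np.array([0.1, 0.9, 0.0]),
    "chat": np.array([0.2, 0.2, 0.6]),
}

def embed(text: str) -> np.ndarray:
    """Placeholder embedding; a real system would use a sentence encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.random(3)
    return v / np.linalg.norm(v)

def select_adapter(query: str) -> str:
    q = embed(query)
    scores = {name: float(q @ vec / np.linalg.norm(vec)) for name, vec in ADAPTERS.items()}
    return max(scores, key=scores.get)   # load this adapter on top of the base SLM (e.g., Phi-2)

print(select_adapter("Write a Python function to sort a list"))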
Getting Modular with Language Models: Building, Reusing a Library of Experts for Task Generalization - Microsoft Research
https://www.microsoft.com/en-us/research
-
Game on for LLM choice. I expect the market for base LLMs to commoditize further in the near future, though we will also continue to see sparks of non-incremental change. Thanks again to Jasper Schelling for the tip.
Introducing Mistral-Large on Azure in partnership with Mistral AI | Microsoft Azure Blog
https://azure.microsoft.com/en-us/blog
-
Our next keynote covers The State of LLM Agents.
2024 in Agents [LS Live! @ NeurIPS 2024]
podcasts.apple.com
-
📘 Configuring and Using the LLM Prompt Component in SmythOS: A Complete Guide

The LLM Prompt component in SmythOS is revolutionizing how we generate content through AI. Here's a comprehensive breakdown of its capabilities:

🔧 Model Configuration
• Default Models: Full OpenAI suite (GPT-3.5, GPT-4)
• Custom Models: Seamless integration with Together AI and Claude AI
• API Integration: Bring your own keys for maximum flexibility

⚙️ Prompt Settings & Controls
• Dynamic prompt configuration with input variables
• Temperature control (default: 1)
• Top P settings for response breadth
• Maximum output tokens customization
• Stop sequence definition
• Frequency and presence penalties for reduced repetition

🚀 Advanced Customization Options
Create custom models with:
• Amazon's Bedrock
• Google's Vertex AI
• Full machine learning feature customization
• Credential management options

💡 Practical Implementation Example: Generating Personalized Emails
1. Configure name and email inputs
2. Set up detailed Sales department prompts
3. Utilize debug mode for JSON output review
4. Implement expressions for content sectioning
(a hypothetical sketch of these sampling settings follows this post)

🔗 Essential Resources:
Documentation: https://lnkd.in/eu2SvfNH
Training: https://lnkd.in/eCehmk4K
Community Support: Join our Discord at discord.gg/smythos

For developers seeking robust language modeling integration, the LLM Prompt component offers unparalleled configurability and extensive customization support.
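For readers who want to see what these knobs do, here is an illustrative sketch only: it expresses the same sampling settings (temperature, top P, max tokens, stop sequences, penalties) as an OpenAI-style chat completion call in Python, not as the SmythOS component's own interface. The model name, prompt text, and parameter values are example choices:

# Illustrative only: the sampling knobs the LLM Prompt component exposes,
# shown here as an OpenAI-style chat completion call with example values.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set ("bring your own keys")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You write short, personalized sales emails."},
        {"role": "user", "content": "Write an email to {{name}} at {{email}} about our Q3 offer."},
    ],
    temperature=1.0,        # default temperature in the component
    top_p=0.9,              # response breadth
    max_tokens=300,         # maximum output tokens
    stop=["###"],           # stop sequence
    frequency_penalty=0.3,  # reduce repetition
    presence_penalty=0.2,
)
print(response.choices[0].message.content)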
SmythOS - LLM Prompt Component
https://www.youtube.com/
-
Best Practices for Deploying LLM Inference, RAG and Fine Tuning Pipelines on K8s

🎯 Key Innovations:
- Advanced management of the large language model (LLM) lifecycle on Kubernetes.
- Use of inference servers for seamless deployment and auto-scaling of models.
- Integration of retrieval-augmented generation (RAG) with embeddings and vector databases.

💡 Notable Features:
- Customized inference pipelines utilizing NVIDIA's NIM operator and KServe.
- Efficient scheduling techniques for GPU resources with dynamic resource allocation.
- Enhanced security through role-based access control (RBAC) and monitoring capabilities.

🛠️ Perfect for:
- AI/ML Engineers deploying models in production.
- Data Scientists involved in fine-tuning and inference tasks.
- DevOps teams managing cloud-native applications on Kubernetes.

⚡️ Impact:
- Reduced inference latency via effective model caching techniques.
- Improved GPU utilization through optimized resource allocation and scheduling.
- Increased security and manageability of AI pipelines in enterprise settings.

🔍 Preview of the Talk:
In this insightful session, Meenakshi Kaushik and Shiva Krishna Merla from NVIDIA share comprehensive best practices for deploying and managing LLM inference pipelines on Kubernetes. They delve into critical challenges such as minimizing inference latency, optimizing GPU usage, and enhancing security measures. Attendees gain actionable insights on building customizable pipelines and leveraging NVIDIA's technology stack to ensure efficient model management, ultimately leading to significant performance improvements. A minimal deployment sketch follows this post.

For more details, check out the full session here: https://lnkd.in/gRK7zPTM
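As a rough sketch of the kind of deployment the talk covers, the snippet below creates a KServe InferenceService with auto-scaling bounds and a GPU request via the official kubernetes Python client. The image name, model name, namespace, and resource values are placeholders, not the exact configuration from the NVIDIA session:

# Hypothetical sketch: deploy an LLM behind KServe on Kubernetes.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in-cluster

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "llama-2-70b", "namespace": "llm-serving"},
    "spec": {
        "predictor": {
            "minReplicas": 1,   # auto-scaling bounds
            "maxReplicas": 4,
            "containers": [{
                "name": "kserve-container",
                "image": "example.registry/llm-inference-server:latest",  # placeholder image
                "resources": {"limits": {"nvidia.com/gpu": "1", "memory": "64Gi"}},
            }],
        }
    },
}

api = client.CustomObjectsApi()
api.create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="llm-serving",
    plural="inferenceservices",
    body=inference_service,
)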
Best Practices for Deploying LLM Inference, RAG and Fine Tuning Pipelines... M. Kaushik, S.K. Merla
https://www.youtube.com/
-
What would you do if LLM generation became arbitrarily cheap and fast?

Groq has already made a huge leap in LLM inference speed, but a lot of other research on improving LLM speed has come out recently, leaving me wondering how fast things could possibly get. It will take a while before we see all of these advancements make their way into a SOTA model; with the exception of Groq, they are architectural changes that need to be made before training starts. But just for fun, let's see how fast we would get if their benefits were multiplicative.

Groq does 300 tokens per second per user on Llama-2 70B; we can use that as a baseline. The 1-bit LLM paper claims 8.9x higher throughput on 2x A100s. Jamba claims 3x throughput on longer contexts compared to similarly sized models. It isn't clear whether that would stack with Groq or the 1-bit approach, but just for fun I'm going to count it. There are other improvements out there as well, but already we are at 8,010 tokens per second. That's an average-length full novel (~100k tokens) in about 12.5 seconds (the arithmetic is sketched below).

So again, what would you do if LLM generation became arbitrarily cheap and fast? Let me know in the comments.

https://www.ai21.com/jamba
https://lnkd.in/eerJiDQU
https://lnkd.in/e53d2XHD
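A quick back-of-the-envelope check of those numbers, under the post's own assumption that the speedups stack multiplicatively:

# Worked arithmetic from the post; the multiplicative stacking is an assumption, not a measurement.
baseline_tps = 300            # Groq: tokens/sec per user on Llama-2 70B
one_bit_speedup = 8.9         # throughput gain claimed by the 1-bit LLM paper
jamba_speedup = 3.0           # long-context throughput gain claimed by Jamba

combined_tps = baseline_tps * one_bit_speedup * jamba_speedup
novel_tokens = 100_000        # rough length of a full novel

print(f"{combined_tps:,.0f} tokens/sec")                       # 8,010 tokens/sec
print(f"{novel_tokens / combined_tps:.1f} seconds per novel")  # ~12.5 seconds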
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
arxiv.org
-
👩⚕️ Do users care about LLM benchmarks?

If you ever find yourself struggling to articulate 🤔 a central point, it's probably because you haven't formed enough conviction around that point (yet). This uncomfortable feeling is a powerful driver to write, to clarify, to mold and shape the central point. Spending time here is high-quality #leveraged work. But once you emerge with a clear winning 🏆 thought, it pulls you, and with it others around you.

The Katanemo team has been laser-focused on human-centric LLM research and development, and I was struggling to capture the essence of our next experiment, but tonight's 2-page memo has offered much clarity. The scaling laws of LLMs (attention-based models, at least) state that the larger the model, the better the quality. But at what point (and for what task) do users see or care about the 2% improvement in benchmark performance that these vendors are chasing?

I am almost done designing the experiment and will be running some preliminary tests with a small group of people. If you are interested and want to participate in this internet-scale study, to bring a focus back on human-centric LLM research and to reduce the waste (now measured in billions of GPUs), shoot me a message or comment below. I'll give you an early-access pass and would love early feedback as we harden the experiment a bit more before making it publicly available!
-
FYI, please do your own research as well.
Insiders Selling Microsoft (MSFT) Amid Rising Cloud Competition and Costs
finance.yahoo.com
-
5 Key Points to Unlock LLM Quantization: https://lnkd.in/eUXuRMDQ
5 Key Points to Unlock LLM Quantization
medium.com
-
Dive into the "Context Window Paradox" 🔍

Every foundational LLM company – OpenAI, Google, Anthropic, and more – proudly touts the size of its context window. But what does that really mean? 🤔 In our latest video, we unravel the mystery behind context windows and uncover a surprising twist: as context windows grow, performance can actually decline. 😮

Curious to know why bigger isn't always better? Join us for a quick yet enlightening exploration into the world of LLMs and the paradox that challenges the norm.

🎥 Watch now and unlock the secrets: https://lnkd.in/gC7vGmAv
LLM Context Window Paradox
https://www.youtube.com/