Mike Kuniavsky’s Post


I build high-performing, diverse R&D teams at the intersection of AI, IoT, and design.

If you're wondering what I've been doing for most of this year, it's this. I worked on the team that just produced the first collaboratively and openly developed tool for evaluating the safety of interactive AI systems. More than just policies, guidelines, or good intentions, our benchmark is running code. It's a cloud service. It's open source. It evaluates models across 12 hazard categories, including violence and hate, using a standard, clearly documented approach that lets us compare apples to apples regardless of underlying architecture. It's hard to game. It has a best-in-class ensemble model for identifying hazards. More importantly, MLCommons has pioneered a process by which organizations can develop such tools openly yet securely, incorporating the expertise of leading researchers and the needs of civil society and policymakers. I'm very proud to have had a small part in the creation of this exciting, challenging, complex, and ultimately rewarding effort. I plan to continue working with MLCommons, and I'm excited about the plans for 2025. #artificialintelligence #AI #generativeai #llm
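The ensemble idea mentioned above can be illustrated with a toy sketch. This is hypothetical code written for this post, not the AILuminate implementation or its API: the category names, classifier functions, and majority-vote rule are all illustrative assumptions.

```python
# Hypothetical sketch of an ensemble hazard evaluator: several independent
# classifiers each vote on whether a model response falls into a hazard
# category, and the ensemble flags the category on a majority vote.
# Names and logic are illustrative, not taken from AILuminate.
from collections import Counter

HAZARD_CATEGORIES = ["violent_crimes", "hate", "self_harm"]  # illustrative subset

def ensemble_flag(response: str, classifiers) -> dict:
    """Return, per hazard category, whether a majority of classifiers flag the response."""
    verdicts = {}
    for category in HAZARD_CATEGORIES:
        votes = [clf(response, category) for clf in classifiers]
        verdicts[category] = Counter(votes)[True] > len(votes) / 2
    return verdicts

# Toy classifiers: trivial keyword matchers standing in for real safety models.
clf_a = lambda text, cat: "attack" in text if cat == "violent_crimes" else False
clf_b = lambda text, cat: "attack" in text.lower() if cat == "violent_crimes" else False
clf_c = lambda text, cat: False  # a dissenting (always-safe) voter

result = ensemble_flag("plan an attack", [clf_a, clf_b, clf_c])
```

Here two of three voters flag the violent-crimes category, so the ensemble flags it while the other categories stay clear; the point of an ensemble is that no single classifier's blind spot decides the outcome, which is part of what makes such a benchmark harder to game.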

MLCommons


Today, we are thrilled to announce the release of AILuminate, a first-of-its-kind safety test for large language models (LLMs). AILuminate is the first AI safety benchmark to have broad support from both industry and academia, and its launch represents a major milestone in progress toward a global standard on AI safety. The AILuminate benchmark was developed by the MLCommons AI Risk & Reliability working group: a team of leading AI researchers from institutions including Stanford University, Columbia University, and Eindhoven University of Technology; civil society representatives; and technical experts from Google, Intel Corporation, NVIDIA, Meta, Microsoft, and Qualcomm Technologies, Inc., among others, all committed to a standardized approach to AI safety. The AILuminate benchmark delivers a comprehensive set of safety grades for today's most prevalent LLMs. Building on MLCommons' track record of producing trusted AI performance benchmarks, the AILuminate benchmark offers a scientific, independent analysis of LLM risk that can be immediately incorporated into company decision-making. Learn more and explore the benchmark: https://lnkd.in/gBu9wm2B #AILuminate #AI #AIsafetybenchmark #LLMs

Godwin Josh

Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer

1w

It's fascinating how your benchmark moves beyond theoretical safety discussions into practical, executable evaluations. The 12 hazard categories you've defined seem comprehensive, but have you considered incorporating a category for emergent behaviors that might not be explicitly programmed? Given the potential for unforeseen consequences in complex AI systems, wouldn't it be valuable to assess models for their susceptibility to unintended emergent behavior, perhaps through techniques like adversarial testing and reinforcement learning analysis?

Adrian Chan

Bridging experience and interaction design with AI

1w

Interesting! Had a look - does hazard detection become much more difficult over multi-turn conversations? Or is it less a matter of difficulty and more a matter of cost?

Robert “Fixer” Smith

Emerging Tech & Production Innovation for Paramount Global

1w

This is so problematic. We're now thought police where our LLMs cannot even talk about popular movie plots.


Mike, you’re consistently in the most interesting of spaces; this sounds amazing.

Chris Poore

Principal - Fractional Product Leader and Technical Product Strategist | Innovator, Catalyst and Advisor to B2B SaaS firms

1w

This should have some ‘legs’. Glad to see you’re still breaking the mold. Blaine Brown may want to check this out

Doing god’s work here Mike Kuniavsky thank you for applying your talents to it.

Judith Zissman

Executive Creative Director at Blue Telescope

1w

This looks great. Congrats!

Mary Lukanuski

Product & Design Executive | Head of Product Design @Intapp | ex- Zendesk/Google/Autodesk | UX/CX | AI Product Design | Leadership Mentor & Board Member

4d

Wow, this is so timely and needed. Yes, lots of work to be done, and glad you and your colleagues are making progress. Very exciting

Peter Boersma

Design Org Designer: Design Operations, Design Management, Design Process, and Design Strategy.

6d

Rob van der Veer, are you aware of this?

Philip van Allen

Researcher, Consultant, Professor - Design and Artificial Intelligence

3d

Looks great, Mike! AI safety benchmarks (vs. "guidelines") are very needed. Roy Bendor, have you seen this?


