Mike Kuniavsky’s Post


I build high-performing, diverse R&D teams at the intersection of AI, IoT, and design.

If you're wondering what I've been doing for most of this year, it's this. I worked on the team that just produced the first collaboratively and openly developed tool for evaluating the safety of interactive AI systems. More than just policies, guidelines, or good intentions, our benchmark is running code. It's a cloud service. It's open source. It evaluates models across 12 hazard categories, including violence and hate, using a standard, clearly documented approach that lets us compare apples to apples regardless of underlying architecture. It's hard to game. It has a best-in-class ensemble model for identifying hazards. More importantly, MLCommons has pioneered a process by which organizations can develop such tools openly yet securely, incorporating the expertise of leading researchers and the needs of civil society and policymakers. I'm very proud to have had a small part in the creation of this exciting, challenging, complex, and ultimately rewarding effort. I plan to continue working with MLCommons, and I'm excited about the plans for 2025. #artificialintelligence #AI #generativeai #llm
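The ensemble idea mentioned above can be illustrated with a toy sketch. This is hypothetical code written for this post, not the AILuminate implementation or its API: the category names, classifier functions, and majority-vote rule are all illustrative assumptions.

```python
# Hypothetical sketch of an ensemble hazard evaluator: several independent
# classifiers each vote on whether a model response falls into a hazard
# category, and the ensemble flags the category on a majority vote.
# Names and logic are illustrative, not taken from AILuminate.
from collections import Counter

HAZARD_CATEGORIES = ["violent_crimes", "hate", "self_harm"]  # illustrative subset

def ensemble_flag(response: str, classifiers) -> dict:
    """Return, per hazard category, whether a majority of classifiers flag the response."""
    verdicts = {}
    for category in HAZARD_CATEGORIES:
        votes = [clf(response, category) for clf in classifiers]
        verdicts[category] = Counter(votes)[True] > len(votes) / 2
    return verdicts

# Toy classifiers: trivial keyword matchers standing in for real safety models.
clf_a = lambda text, cat: "attack" in text if cat == "violent_crimes" else False
clf_b = lambda text, cat: "attack" in text.lower() if cat == "violent_crimes" else False
clf_c = lambda text, cat: False  # a dissenting (always-safe) voter

result = ensemble_flag("plan an attack", [clf_a, clf_b, clf_c])
```

Here two of three voters flag the violent-crimes category, so the ensemble flags it while the other categories stay clear; the point of an ensemble is that no single classifier's blind spot decides the outcome, which is part of what makes such a benchmark harder to game.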

MLCommons


Today, we are thrilled to announce the release of AILuminate, a first-of-its-kind safety test for large language models (LLMs). AILuminate is the first AI safety benchmark to have broad support from both industry and academia, and its launch represents a major milestone in progress toward a global standard on AI safety. The AILuminate benchmark was developed by the MLCommons AI Risk & Reliability working group: a team of leading AI researchers from institutions including Stanford University, Columbia University, and Eindhoven University of Technology; civil society representatives; and technical experts from Google, Intel Corporation, NVIDIA, Meta, Microsoft, and Qualcomm Technologies, Inc., among others, all committed to a standardized approach to AI safety. The AILuminate benchmark delivers a comprehensive set of safety grades for today's most prevalent LLMs. Building on MLCommons' track record of producing trusted AI performance benchmarks, the AILuminate benchmark offers a scientific, independent analysis of LLM risk that can be immediately incorporated into company decision-making. Learn more and explore the benchmark: https://lnkd.in/gBu9wm2B #AILuminate #AI #AIsafetybenchmark #LLMs

Godwin Josh

Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer

1w

It's fascinating how your benchmark moves beyond theoretical safety discussions into practical, executable evaluations. The 12 hazard categories you've defined seem comprehensive, but have you considered incorporating a category for emergent behaviors that might not be explicitly programmed? Given the potential for unforeseen consequences in complex AI systems, wouldn't it be valuable to assess models for their susceptibility to unintended emergent behavior, perhaps through techniques like adversarial testing and reinforcement learning analysis?

Adrian Chan

Bridging experience and interaction design with AI

1w

Interesting! Had a look - does hazard detection become much more difficult over multi-turn conversations? Or is it less a matter of difficulty and more a matter of cost?

Robert “Fixer” Smith

Emerging Tech & Production Innovation for Paramount Global

1w

This is so problematic. We're now thought police where our LLMs cannot even talk about popular movie plots.


Mike, you’re consistently in the most interesting of spaces; this sounds amazing.

Chris Poore

Principal - Fractional Product Leader and Technical Product Strategist | Innovator, Catalyst and Advisor to B2B SaaS firms

1w

This should have some ‘legs’. Glad to see you’re still breaking the mold. Blaine Brown may want to check this out

Doing god’s work here Mike Kuniavsky thank you for applying your talents to it.

Judith Zissman

Executive Creative Director at Blue Telescope

1w

This looks great. Congrats!

Mary Lukanuski

Product & Design Executive | Head of Product Design @Intapp | ex- Zendesk/Google/Autodesk | UX/CX | AI Product Design | Leadership Mentor & Board Member

4d

Wow, this is so timely and needed. Yes, lots of work to be done, and glad you and your colleagues are making progress. Very exciting

Peter Boersma

Design Org Designer: Design Operations, Design Management, Design Process, and Design Strategy.

6d

Rob van der Veer, are you aware of this?

Philip van Allen

Researcher, Consultant, Professor - Design and Artificial Intelligence

3d

Looks great, Mike! AI safety benchmarks (vs. "guidelines") are very needed. Roy Bendor, have you seen this?


