Thorn’s Safety by Design for Generative AI: Progress Reports
December 11, 2024
Safety By Design: Industry Commitments
As part of Thorn and All Tech Is Human’s Safety By Design initiative, some of the world’s leading AI companies have made a significant commitment to protect children from the misuse of generative AI technologies.
The organizations—including Amazon, Anthropic, Civitai, Google, Invoke, Meta, Metaphysic, Microsoft, Mistral AI, OpenAI and Stability AI—have all pledged to adopt the campaign principles, which aim to prevent the creation and spread of AI-generated child sexual abuse material (AIG-CSAM) and other sexual harms against children.
As part of their commitments, these companies will continue to transparently publish and share documentation of their progress in implementing these principles.
This is a critical component of our overall three-pillar strategy for accountability:
- Publishing progress reports with insights from the committed companies (to support public awareness and pressure where necessary)
- Collaborating with standard-setting institutions to scale the reach of these principles and mitigations (opening the door for third-party auditing)
- Engaging with policymakers so that they understand what is technically feasible and impactful in this space, informing necessary legislation.
Three-Month Progress Reports
Some participating companies have committed to reporting their progress on a three-month cadence (Civitai, Invoke, and Metaphysic), while others will report annually. Below are the latest updates from the companies reporting quarterly. You can also download the latest three-month progress report in full here.
October 2024: Civitai
Civitai reports no additional progress since their July 2024 report, citing other work priorities. Their metrics show continued moderation efforts:
- Detected over 120,000 violative prompts, with 100,000 indicating attempts to create AIG-CSAM
- Prevented over 400 attempts to upload models optimized for AIG-CSAM
- Removed approximately 5-10 problematic models per month
- Detected and reported 2 instances of CSAM and over 100 instances of AIG-CSAM to NCMEC
Areas requiring progress remain consistent with July’s report, including the need to retroactively assess third-party models currently hosted on their platform.
October 2024: Metaphysic
Metaphysic reports no additional progress since their July 2024 report, citing competing priorities while in the middle of a funding process. Their metrics show continued maintenance of their existing safeguards:
- 100% of datasets audited and updated
- No CSAM detected in their datasets
- 100% of models include content provenance
- Monthly assessment of mitigations
- Continued use of human moderators for content review
Areas requiring progress remain consistent with July’s report, including the need to implement systematic model assessment and red teaming.
October 2024: Invoke
As a new participant since July 2024, Invoke reports initial progress:
- Implemented prompt monitoring using third-party tools (askvera.io)
- Detected 73 instances of violative prompts, all reported to NCMEC
- Invested $100,000 in R&D for protective tools
- Incorporated prevention messaging directing users to redirection programs
- Utilizes Thorn’s hashlist to block problematic models
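The post doesn't specify how Thorn's hashlist is formatted or matched, but the underlying mechanism is exact cryptographic hash matching: digest each uploaded model file and reject it if the digest appears on the list. A minimal sketch, assuming a hypothetical local file of SHA-256 digests (the filename, format, and function names here are illustrative, not Thorn's actual interface):

```python
import hashlib
from pathlib import Path

# Hypothetical local copy of a hashlist: one lowercase hex SHA-256 digest per
# line. The real hashlist's format and delivery mechanism are not described
# in the post.
BLOCKLIST_PATH = Path("model_hashlist.txt")

def load_blocklist(path: Path) -> set[str]:
    """Load digests into a set for O(1) membership checks."""
    return {line.strip().lower() for line in path.read_text().splitlines() if line.strip()}

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so multi-gigabyte model weights never load into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def is_blocked(model_path: Path, blocklist: set[str]) -> bool:
    """Reject an upload whose bytes exactly match a known-violative model."""
    return sha256_file(model_path) in blocklist
```

Exact-match hashing only catches byte-identical files; a re-quantized or fine-tuned variant would slip past it, which is one reason platforms layer it with prompt monitoring and human review rather than relying on it alone.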
Areas requiring progress include implementing CSAM detection at inputs, incorporating comprehensive output review, and expanding user reporting functionality for their open-source (OSS) offering.
July 2024: Civitai
Civitai, a platform for hosting third-party generative AI models, reports that they have made progress in safeguarding against abusive content and responsible model hosting:
- Uses multi-layered moderation with automated filters and human review for prompts, content, and media inputs.
- Maintains an internal hash database to prevent re-upload of removed images and removed models that violate child safety policies (see the sketch after this list).
- Reports confirmed child sexual abuse material (CSAM) to NCMEC, noting generative AI flags.
- Established terms of service banning exploitative material and models, and created reporting pathways for users.
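The report doesn't say which hashing scheme Civitai's internal database uses. Unlike the exact-match hashing sketched in the Invoke section above, re-upload prevention for images typically relies on perceptual hashing, which tolerates resizing and re-encoding. A minimal sketch using the open-source imagehash library (the distance threshold and function names are illustrative assumptions, not Civitai's implementation):

```python
import imagehash  # pip install ImageHash
from PIL import Image

# Illustrative threshold: perceptual hashes within this Hamming distance are
# treated as the same image, even after resizing or re-compression.
MAX_HAMMING_DISTANCE = 4

def is_reupload(candidate_path: str, removed_hashes: list[imagehash.ImageHash]) -> bool:
    """Compare an upload's perceptual hash against hashes of previously removed images."""
    candidate = imagehash.phash(Image.open(candidate_path))
    # Subtracting two ImageHash objects returns their Hamming distance.
    return any(candidate - removed <= MAX_HAMMING_DISTANCE for removed in removed_hashes)
```

The threshold trades recall against false positives: a larger distance catches more edited re-uploads but risks flagging unrelated images, so borderline matches are usually routed to a human review layer like the one described above.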
However, there remain some areas for Civitai that require more progress to meet their commitments:
- Expand moderation using hashing against verified CSAM lists and prevention messaging.
- Assess output content and incorporate content provenance features.
- Implement pre-hosting assessments for new models and retroactively assess current models for child safety violations.
- Add child safety information to model cards and develop strategies to prevent the use of nudifying services.
July 2024: Metaphysic
Metaphysic reports progress in responsible data sourcing and safeguards during model development:
- Sources data from film studios with legal warranties and required consent from depicted individuals.
- Employs human moderators and AI tools to review data and separate sexual content from depictions of children.
- Adopts the C2PA standard to label AI-generated content (see the sketch after this list).
- Limits model access to employees and has processes for customer feedback on content.
- Updates datasets and model cards to include sections detailing child safety measures during development.
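The report doesn't detail how Metaphysic applies C2PA. In broad terms, the standard embeds a cryptographically signed manifest in the asset, and that manifest carries assertions, including an action whose digitalSourceType marks the content as algorithmically generated. A minimal sketch of that assertion as a Python dict (the claim_generator value is hypothetical, and a real manifest would be signed and embedded by a C2PA SDK, which this sketch omits):

```python
import json

# Illustrative C2PA-style manifest fragment labeling an asset as AI-generated.
# The digitalSourceType URI comes from the IPTC vocabulary referenced by the
# C2PA specification; "example-app/1.0" is a placeholder, not a real tool.
manifest = {
    "claim_generator": "example-app/1.0",
    "assertions": [
        {
            "label": "c2pa.actions",
            "data": {
                "actions": [
                    {
                        "action": "c2pa.created",
                        "digitalSourceType": (
                            "http://cv.iptc.org/newscodes/digitalsourcetype/"
                            "trainedAlgorithmicMedia"
                        ),
                    }
                ]
            },
        }
    ],
}

print(json.dumps(manifest, indent=2))
```

The label only has provenance value once the manifest is signed; anyone can write this JSON, which is why the second improvement area below concerns C2PA's robustness to adversarial misuse.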
However, there remain some areas for Metaphysic that require more progress to meet their commitments:
- Incorporate systematic model assessment and red teaming of their generative AI models for child safety violations.
- Engage with C2PA to understand the ways in which C2PA is and is not robust to adversarial misuse, and – if necessary – support development and adoption of solutions that are sufficiently robust.
Annual Progress Reports
Several companies have committed to reporting on an annual cadence, with their first reports expected in April 2025 – one year after the Safety By Design commitments were launched. These companies include Amazon, Anthropic, Google, Meta, Microsoft, Mistral AI, OpenAI, and Stability AI. Their comprehensive reports will provide insights into how they have implemented and maintained the Safety By Design principles across their organizations and technologies over the first full year of commitment.