Thorn’s Safety by Design for Generative AI: Progress Reports
December 11, 2024
Safety By Design: Industry Commitments
As part of Thorn and All Tech Is Human’s Safety By Design initiative, some of the world’s leading AI companies have made a significant commitment to protect children from the misuse of generative AI technologies.
The organizations—including Amazon, Anthropic, Civitai, Google, Invoke, Meta, Metaphysic, Microsoft, Mistral AI, OpenAI and Stability AI—have all pledged to adopt the campaign principles, which aim to prevent the creation and spread of AI-generated child sexual abuse material (AIG-CSAM) and other sexual harms against children.
As part of their commitments, these companies will continue to transparently publish and share documentation of their progress in implementing these principles.
This is a critical component of our overall three-pillar strategy for accountability:
- Publishing progress reports with insights from the committed companies (to support public awareness and pressure where necessary)
- Collaborating with standard-setting institutions to scale the reach of these principles and mitigations (opening the door for third-party auditing)
- Engaging with policymakers so that they understand what is technically feasible and impactful in this space, informing necessary legislation.
Three-Month Progress Reports
Some participating companies have committed to reporting their progress on a three-month cadence (Civitai, Invoke, and Metaphysic), while others will report annually. Below are the latest updates from the companies reporting quarterly. You can also download the latest three-month progress report in full here.
October 2024: Civitai
Civitai reports no additional progress since their July 2024 report, citing other work priorities. Their metrics show continued moderation efforts:
- Detected over 120,000 violative prompts, with 100,000 indicating attempts to create AIG-CSAM
- Prevented over 400 attempts to upload models optimized for AIG-CSAM
- Removed approximately 5-10 problematic models per month
- Detected and reported 2 instances of CSAM and over 100 instances of AIG-CSAM to NCMEC
Areas requiring progress remain consistent with July’s report, including the need to retroactively assess third-party models currently hosted on their platform.
October 2024: Metaphysic
Metaphysic reports no additional progress since their July 2024 report, citing competing priorities while in the middle of a funding process. Their metrics show continued maintenance of their existing safeguards:
- 100% of datasets audited and updated
- No CSAM detected in their datasets
- 100% of models include content provenance
- Monthly assessment of mitigations
- Continued use of human moderators for content review
Areas requiring progress remain consistent with July’s report, including the need to implement systematic model assessment and red teaming.
October 2024: Invoke
As a new participant since July 2024, Invoke reports initial progress:
- Implemented prompt monitoring using third-party tools (askvera.io)
- Detected 73 instances of violative prompts, all reported to NCMEC
- Invested $100,000 in R&D for protective tools
- Incorporated prevention messaging directing users to redirection programs
- Utilizes Thorn’s hashlist to block problematic models
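The post doesn't specify how Thorn's hashlist is formatted or matched, but the underlying mechanism is exact cryptographic hash matching: digest each uploaded model file and reject it if the digest appears on the list. A minimal sketch, assuming a hypothetical local file of SHA-256 digests (the filename, format, and function names here are illustrative, not Thorn's actual interface):

```python
import hashlib
from pathlib import Path

# Hypothetical local copy of a hashlist: one lowercase hex SHA-256 digest per
# line. The real hashlist's format and delivery mechanism are not described
# in the post.
BLOCKLIST_PATH = Path("model_hashlist.txt")

def load_blocklist(path: Path) -> set[str]:
    """Load digests into a set for O(1) membership checks."""
    return {line.strip().lower() for line in path.read_text().splitlines() if line.strip()}

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so multi-gigabyte model weights never load into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def is_blocked(model_path: Path, blocklist: set[str]) -> bool:
    """Reject an upload whose bytes exactly match a known-violative model."""
    return sha256_file(model_path) in blocklist
```

Exact-match hashing only catches byte-identical files; a re-quantized or fine-tuned variant would slip past it, which is one reason platforms layer it with prompt monitoring and human review rather than relying on it alone.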
Areas requiring progress include implementing CSAM detection at inputs, incorporating comprehensive output review, and expanding user reporting functionality for their open-source (OSS) offering.
July 2024: Civitai
Civitai, a platform for hosting third-party generative AI models, reports that they have made progress in safeguarding against abusive content and responsible model hosting:
- Uses multi-layered moderation with automated filters and human review for prompts, content, and media inputs.
- Maintains an internal hash database to prevent re-upload of removed images and removed models that violate child safety policies (see the sketch after this list).
- Reports confirmed child sexual abuse material (CSAM) to NCMEC, noting generative AI flags.
- Established terms of service banning exploitative material and models, and created reporting pathways for users.
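The report doesn't say which hashing scheme Civitai's internal database uses. Unlike the exact-match hashing sketched in the Invoke section above, re-upload prevention for images typically relies on perceptual hashing, which tolerates resizing and re-encoding. A minimal sketch using the open-source imagehash library (the distance threshold and function names are illustrative assumptions, not Civitai's implementation):

```python
import imagehash  # pip install ImageHash
from PIL import Image

# Illustrative threshold: perceptual hashes within this Hamming distance are
# treated as the same image, even after resizing or re-compression.
MAX_HAMMING_DISTANCE = 4

def is_reupload(candidate_path: str, removed_hashes: list[imagehash.ImageHash]) -> bool:
    """Compare an upload's perceptual hash against hashes of previously removed images."""
    candidate = imagehash.phash(Image.open(candidate_path))
    # Subtracting two ImageHash objects returns their Hamming distance.
    return any(candidate - removed <= MAX_HAMMING_DISTANCE for removed in removed_hashes)
```

The threshold trades recall against false positives: a larger distance catches more edited re-uploads but risks flagging unrelated images, so borderline matches are usually routed to a human review layer like the one described above.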
However, there remain some areas for Civitai that require more progress to meet their commitments:
- Expand moderation using hashing against verified CSAM lists and prevention messaging.
- Assess output content and incorporate content provenance features.
- Implement pre-hosting assessments for new models and retroactively assess current models for child safety violations.
- Add child safety information to model cards and develop strategies to prevent the use of nudifying services.
July 2024: Metaphysic
Metaphysic reports progress in responsible data sourcing and safeguards during model development:
- Sources data from film studios with legal warranties and required consent from depicted individuals.
- Employs human moderators and AI tools to review data and separate sexual content from depictions of children.
- Adopts the C2PA standard to label AI-generated content (see the sketch after this list).
- Limits model access to employees and has processes for customer feedback on content.
- Updates datasets and model cards to include sections detailing child safety measures during development.
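The report doesn't detail how Metaphysic applies C2PA. In broad terms, the standard embeds a cryptographically signed manifest in the asset, and that manifest carries assertions, including an action whose digitalSourceType marks the content as algorithmically generated. A minimal sketch of that assertion as a Python dict (the claim_generator value is hypothetical, and a real manifest would be signed and embedded by a C2PA SDK, which this sketch omits):

```python
import json

# Illustrative C2PA-style manifest fragment labeling an asset as AI-generated.
# The digitalSourceType URI comes from the IPTC vocabulary referenced by the
# C2PA specification; "example-app/1.0" is a placeholder, not a real tool.
manifest = {
    "claim_generator": "example-app/1.0",
    "assertions": [
        {
            "label": "c2pa.actions",
            "data": {
                "actions": [
                    {
                        "action": "c2pa.created",
                        "digitalSourceType": (
                            "http://cv.iptc.org/newscodes/digitalsourcetype/"
                            "trainedAlgorithmicMedia"
                        ),
                    }
                ]
            },
        }
    ],
}

print(json.dumps(manifest, indent=2))
```

The label only has provenance value once the manifest is signed; anyone can write this JSON, which is why the second improvement area below concerns C2PA's robustness to adversarial misuse.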
However, there remain some areas for Metaphysic that require more progress to meet their commitments:
- Incorporate systematic model assessment and red teaming of their generative AI models for child safety violations.
- Engage with C2PA to understand the ways in which C2PA is and is not robust to adversarial misuse, and – if necessary – support development and adoption of solutions that are sufficiently robust.
Annual Progress Reports
Several companies have committed to reporting on an annual cadence, with their first reports expected in April 2025 – one year after the Safety By Design commitments were launched. These companies include Amazon, Anthropic, Google, Meta, Microsoft, Mistral AI, OpenAI, and Stability AI. Their comprehensive reports will provide insights into how they have implemented and maintained the Safety By Design principles across their organizations and technologies over the first full year of commitment.