According to an article, SE has decided to charge for network content when it is used for training AI. This has not been officially announced here, so we do not know the details.

But the main question to me is what the legal basis of this decision is. Here are the relevant parts of the terms of service as far as I can see:

You agree that any and all content, including without limitation any and all text, graphics, logos, tools, photographs, images, illustrations, software or source code, audio and video, animations, and product feedback (collectively, “Content”) that you provide to the public Network (collectively, “Subscriber Content”), is perpetually and irrevocably licensed to Stack Overflow on a worldwide, royalty-free, non-exclusive basis pursuant to Creative Commons licensing terms (CC BY-SA 4.0), and you grant Stack Overflow the perpetual and irrevocable right and license to access, use, process, copy, distribute, export, display and to commercially exploit such Subscriber Content, even if such Subscriber Content has been contributed and subsequently removed by you as reasonably necessary to, for example (without limitation):

  • Provide, maintain, and update the public Network
  • Process lawful requests from law enforcement agencies and government agencies
  • Prevent and address security incidents and data security features, support features, and to provide technical assistance as it may be required
  • Aggregate data to provide product optimization

This means that you cannot revoke permission for Stack Overflow to publish, distribute, store and use such content and to allow others to have derivative rights to publish, distribute, store and use such content. The CC BY-SA 4.0 license terms are explained in further detail by Creative Commons, and the license terms applicable to content are explained in further detail here. You should be aware that all Public Content you contribute is available for public copy and redistribution, and all such Public Content must have appropriate attribution.

This grants SE permission to use the data, but "pursuant to Creative Commons licensing terms (CC BY-SA 4.0)".

The second part looks like a dual-license under a less restrictive license. The limitation of that license by a list that explicitly says it isn't a limitation is a bit odd, but maybe that is some standard legalese. It does not explicitly say "resell", but "commercially exploit" could cover that.

So does SE have the legal right to sell subscriber content under a license different than the CC license it was posted under? Can they essentially remove the attribution requirement whenever they want by relicensing the code?

  • StackOverflow cannot/does not want to give legal advice, so there won't be any official answer here, but it really looks like dual licensing to me and the "as reasonably necessary to" just points to examples and means (almost) nothing in this context. Commented Apr 21, 2023 at 7:04
  • 2
    Why do you claim that they can remove the requirement for attribution? Your quoted text ends with "all such Public Content must have appropriate attribution."
    – PM 2Ring
    Commented Apr 21, 2023 at 10:45
  • 1
    @PM2Ring if they can relicense the code, they can relicense it under terms that don't require attribution. So the main question is whether the ToS allow them to relicense content under a non-CC license. Commented Apr 21, 2023 at 10:58
  • 3
    The second part might look like a dual-license to you, but I don't see it that way. But hopefully a staff member will clarify this.
    – PM 2Ring
    Commented Apr 21, 2023 at 11:04
  • 2
    Question is currently closed (despite this not being addressed in the dupe), so here's what I would've written: I'm not a lawyer, but I don't think this is a dual license. That second "and you grant" phrase is qualified at the end by the "even if such Subscriber Content has been ... removed by you"; in my completely non-legally qualified eye, I read the section in total as this: 1. All subscriber content is licensed under CC-by-SA 4.0, 2. SE can use the content in any way they see fit (pursuant to the license as established in #1), 3. The license isn't voided if content is removed
    – zcoop98
    Commented Apr 21, 2023 at 19:02
  • 2
    Wow. Two staff members have interacted with this question since I made my comments, and they didn't dispute your dual license theory. Now I'm worried...
    – PM 2Ring
    Commented Apr 22, 2023 at 16:29
  • 3
    @PM2Ring staff members can say whatever anyway, e.g., Jeff Atwood wrote "The short answer is that everything you contribute is to our sites is permanently licensed under creative commons, which means we can't put it behind a paywall." and look at what SE is doing now... Commented Apr 22, 2023 at 22:23
  • @FranckDernoncourt What Jeff Atwood said is fully compatible with the dual license. Sure, it is also CC-BY-SA, but also permissive to the company. Maybe Jeff simply forgot to mention that. Commented Apr 22, 2023 at 22:41
  • 1
    @Trilarion "we can't put it behind a paywall" is incorrect. Commented Apr 22, 2023 at 22:43
  • 1
    @FranckDernoncourt The CC-BY-SA part means they can not easily put it behind a paywall because then others could come and provide a similar service. Wasn't that the idea all along? Commented Apr 22, 2023 at 22:44
  • @Trilarion Not sure what the intent is, but if an LLM tells me "we can't put it behind a paywall" then I'd say the LLM is hallucinating. Commented Apr 22, 2023 at 22:48
  • 2
    @PM2 Your worry may be warranted, but it's important to remember that any employee trying to publicly interpret the legal stance of their company is dicey, to say the least. The only SE folks that can definitively answer this are almost certainly SE's legal team.
    – zcoop98
    Commented Apr 24, 2023 at 18:15
  • 1
    @zcoop98 I exactly tried to not give that impression. They speak for the company but the company must know what the TOS means, so anyone within the company should be able to answer a question about the TOS. If they aren't sure they should ask around within the company until they are and then answer. It's just that they don't want to. I mean, what are the alternatives? Customers not sure about the TOS should not use that service. Should we all abandon SO because we don't understand the TOS? I would even say it's the obligation of SO to inform and educate users about the meaning of their TOS. Commented May 5, 2023 at 4:42
  • 1
    on the bright side of things, it seems that the potential concern that motivated this question has been (at least in word) put to rest(?): meta.stackexchange.com/a/399897/997587
    – starball
    Commented May 14, 2024 at 18:27

2 Answers


I'm not a lawyer, but in the end it's only words. So here is my layman's attempt at dissecting this section of the current TOS (the TOS has changed several times since the beginning of Stack Overflow, so a separate analysis of past versions may be necessary):

  1. All content ("all text, graphics, logos, tools, photographs, images, illustrations, software or source code, audio and video, animations, and product feedback") that users provide is called "Subscriber Content".
  2. This content is licensed to the company under CC BY-SA 4.0 ("is perpetually and irrevocably licensed to Stack Overflow on a worldwide, royalty-free, non-exclusive basis pursuant to Creative Commons licensing terms (CC BY-SA 4.0)"). I don't think it really matters to whom it is licensed (since CC BY-SA 4.0 content can be used by anyone under the same terms) so maybe one could simply license to "the public", but anyway this is the license you have in mind.
  3. Additionally ("and"), this content is also licensed very permissively ("you grant Stack Overflow the perpetual and irrevocable right and license to access, use, process, copy, distribute, export, display and to commercially exploit such Subscriber Content") to the company under a second license (roughly as permissive as the MIT license is for code: basically, do whatever you want with it). In particular, no attribution is necessary. This is likely the license that the company intends to rely on for the AI product, and the one you may have missed. Using this second license explains why Stack Overflow wouldn't need to give attribution. Companies other than Stack Overflow, however, do need to give attribution: only CC BY-SA applies to them, unless, for example, they buy access to the data under this second license.
  4. The rest is just safety provisions: the license persists even if the content has been removed, it cannot be revoked, and anyone who uses the content elsewhere must give attribution. These are basically just illustrations of what the above means.

The important part is the word "and", which specifies that there are two licenses. The second of these does not require attribution, but it is granted to the company only.

To directly answer the question: I think they can do that because they reserved that right and users agreed to it when using the service.

As a side remark: if the second license didn't exist and the content were available only under the CC BY-SA license, and someone wanted to train an AI on that data, how, if at all, would the AI (which may not exactly reproduce the work) need to give attribution? I think this would be a very interesting question for law.SE. After all, what do reproduction and derivative work mean in the age of AI?

  • 2
    I think you're likely right, but the limitation clause with the list that explicitly says it doesn't limit anything confused me. That might be legal, but it is a really annoying thing to read and interpret. But I do wonder a bit that reselling isn't explicitly mentioned, my impression was that this is often an explicit part of license agreements Commented Apr 21, 2023 at 7:51
  • 1
    @MadScientist I observed that when in legal texts they give examples they always say "without limitations this can be a, b, c" and typically these examples are only a courtesy for your understanding and do not change the meaning of the rest. Commented Apr 21, 2023 at 7:53
  • The "as reasonably necessary" part is the one I find confusing, if it were only a list of examples that part should not be there. That clause is a limitation, but a meaningless one if the "(without limitation)" part is valid. Commented Apr 21, 2023 at 7:55
  • @MadScientist This is a detail I don't want to comment on because I'm not a lawyer, but my opinion is that likely they only want to say that if they had for some reason to remove content, it doesn't mean that they don't want to have it licensed anymore. It's just a re-iteration of "you agree that all content ... that you provide ... is licensed under...". That already includes any case of later removed content. Commented Apr 21, 2023 at 7:59
  • 12
    Note that this "second license" language was added around 2018, the old TOS is on archive.org and does not mention this "second license", so this may well depend on when the content was posted.
    – Erik A
    Commented Apr 21, 2023 at 9:27
  • 3
    @ErikA: Related question from 2017 about the old ToS: Do Stack Exchange’s ToS mean that the user-generated content is double-licensed to them?
    – unor
    Commented Apr 22, 2023 at 8:35
  • @ErikA According to meta.stackexchange.com/questions/388760/…, the second license was already included in the ToS back in 2010. It is in your 2018 version as well: “You grant Stack Exchange the perpetual and irrevocable right and license to use, copy, cache, publish, display, distribute, modify, create derivative works and store such Subscriber Content and, except as otherwise set forth herein, to allow others to do so in any medium now known or hereinafter developed (“Content License”) in order to provide the Services, ...” Commented May 1, 2023 at 9:26
  • 1
    There was always dual licensing to provide the service itself (in order to provide the Services in the old license). The post-2018 license was far broader and included commercial exploitation for any purpose. It's a far stretch that commercially licensing our content to AI companies is needed in order to provide the services mentioned in the pre-2018 ToS.
    – Erik A
    Commented May 1, 2023 at 9:33
  • @EmilJeřábek "... in order to provide the Services..." seems to be a relevant limitation that's not present anymore now. To me this reads like the company may be allowed to develop their own AI products and offer a service based on that for itself for all content but directly sell to others only content created from 2018 on. Commented May 1, 2023 at 10:13
  • @EmilJeřábek That's not the same agreement, though. That says "in order to provide the Services" and does not say "to commercially exploit such Subscriber Content".
    – endolith
    Commented May 11, 2024 at 18:25

Can SE just resell our data, relicense it and remove the attribution requirement?

About reselling:

A while ago, Jeff Atwood stated:

The short answer is that everything you contribute is to our sites is permanently licensed under creative commons, which means we can't put it behind a paywall.

Also: we're not evil.

I pointed out that this is incorrect, since CC BY-SA doesn't prevent selling data (one can even sell public domain data, as JSTOR used to do). This earned me an insult from a Stack Exchange employee (Shog9), which took the moderators many months to finally remove.

About relicensing: as Trilarion mentioned, the SE ToS indicate that the data is dual-licensed, and the second license gives SE many rights.

That being said, since the SE data was made available under CC BY-SA, my understanding is that it is legal for a firm to train an AI model on a CC BY-SA 4.0 corpus and make commercial use of it without distributing the model under CC BY-SA. To be even safer about the BY requirement, the firm could point to a list of all SE contributors. Therefore I think SE's attempt to milk AI companies is in vain.

  • 1
    is the mention of now-removed-insult necessary to understand your point?
    – starball
    Commented Apr 22, 2023 at 22:31
  • 5
    @starball yes, to demonstrate the reluctance of SE to divulge their stance on data ownership/policy. I believe SE Inc. has intentionally kept SE users in the dark about licensing and their intent to use the data. Commented Apr 22, 2023 at 22:31
  • 1
    oh. it might be worth spelling that out in the post itself. I didn't really come to that understanding since I can't see what was actually said by Shog9
    – starball
    Commented Apr 22, 2023 at 22:32
  • 2
    Franck helpfully linked to an archive of the insult in his previous post, @starball if you're interested in seeing it in all its glory. FWIW, I think time has backed me up on my assertion: folks like Mad here, willing to finely parse legalese and cast a skeptical eye on company actions, have done far more to keep information available than any residual good intentions on the company's part.
    – Shog9
    Commented Apr 27, 2023 at 18:18
  • 3
    @Shog9 Casting a skeptical eye on company actions is good, so I don't know why you felt the need to insult people doing so. i.sstatic.net/EmC3h.png That only contributed toward dismissing our concerns. And as you have noticed, time has backed me up on our concerns. Commented Apr 27, 2023 at 20:45
