AI training on user data should be opt-in by default

Getting a bit tired of all this AI? It seems that every company now wants to use AI, and ideally use our data for free on top of that. While it’s understandable that more (good) data leads to better models and results, hasn’t this trend been going in the wrong direction for a while now?

Companies are increasingly changing their terms and conditions so that, eventually, all user data will be used to train AI. There’s nothing wrong with that if users agree to it.

The problem, as I see it, is that our data is being used without our consent. Companies either act as if their wealth lets them use everything without restriction, or they change their terms and conditions after the fact to say that data will now be used for AI training.

I think AI training on user data should be opt-in: off by default, and enabled only when someone explicitly agrees by changing a setting in the software.

As I said, I understand that data is necessary. Without it, we wouldn’t have interesting and effective AI products. However, the current practice feels like a misappropriation and misuse of data. I suspect that in a large number of cases, this is happening without users’ consent, as they either have no recourse or are poorly informed. Do you think most users who post online can prevent OpenAI, Google, Microsoft, etc. from using their data? Do you think all users of Adobe, Figma, Facebook or any other tool are aware that AI is being trained on their data? Do you think they’ll all opt out in time?

We’ve heard for a long time that “if it’s free, you’re the product”. I’m not sure that saying still quite applies, because even if you pay, you can still be the product: your data can still be used (for free) to train AI that will later be offered to you at an additional cost.

It seems that anything you put on the internet must now be considered public, and free for anyone to use.

The worst part? The vast majority of people seem to be indifferent to this.