JetBrains wants to train AI models on your code snippets
Briefly

"The changed data sharing options are set to land in the 2025.2.4 versions of the JetBrains series of IDEs, expected in around two weeks' time, and including IntelliJ IDEA, PyCharm, Rider, RubyMine, and PhpStorm. The new setting for sharing detailed code-related data specifically states that the data will be used for model training purposes. In some cases, such as for non-commercial users, this data sharing will be enabled by default."
""That sounds like a lot, and it is, but that's where the real value for improvements comes from," said the official post on a new approach to data collection in JetBrains IDEs. JetBrains argues that most AI coding models are trained on public code that do not reflect the "complex, real-world scenarios" of professional development, and insists that it needs data on real usage in order to provide what is needed."
"The company is offering a substantial incentive to organizations that are happy to hand over their data - free All Products Pack subscriptions for one year for employees, currently priced at $979.00 per user/year. There is a waitlist and the offer is described as limited. The changed data sharing options are set to land in the 2025.2.4 versions of the JetBrains series of IDEs, expected in around two weeks' time, and including IntelliJ IDEA, PyCharm, Rider, RubyMine, and PhpStorm."
JetBrains intends to collect detailed code-related usage data (including code snippets, prompt text, AI responses, edit history, and terminal usage) to improve AI coding models. As an incentive, the company offers organizations that agree to share data free one-year All Products Pack subscriptions for their employees; the offer is limited and has a waitlist. The new data-sharing controls will arrive in the 2025.2.4 IDE releases (IntelliJ IDEA, PyCharm, Rider, RubyMine, PhpStorm) and will explicitly state that data may be used for model training. For some non-commercial users the setting is enabled by default; it is opt-in for commercial licenses and off by default for centrally managed organizations. A prior internal trial was promising, but the approach needs to be scaled more broadly. Using code for model training raises intellectual property and code-regurgitation concerns.
Read at The Register