Wednesday, July 05, 2023

Inevitable result of training AI on an increasing volume of AI-generated data.

https://www.schneier.com/blog/archives/2023/07/class-action-lawsuit-for-scraping-data-without-permission.html

Class-Action Lawsuit for Scraping Data without Permission

I have mixed feelings about this class-action lawsuit against OpenAI and Microsoft, claiming that it “scraped 300 billion words from the internet” without either registering as a data broker or obtaining consent. On the one hand, I want this to be a protected fair use of public data. On the other hand, I want us all to be compensated for our uniquely human ability to generate language.

There’s an interesting wrinkle on this. A recent paper showed that using AI-generated text to train another AI invariably “causes irreversible defects.” From a summary:

The tails of the original content distribution disappear. Within a few generations, text becomes garbage, as Gaussian distributions converge and may even become delta functions. We call this effect model collapse.
Just as we’ve strewn the oceans with plastic trash and filled the atmosphere with carbon dioxide, so we’re about to fill the Internet with blah. This will make it harder to train newer models by scraping the web, giving an advantage to firms which already did that, or which control access to human interfaces at scale. Indeed, we already see AI startups hammering the Internet Archive for training data.

This is the same idea that Ted Chiang wrote about: that ChatGPT is a “blurry JPEG of all the text on the Web.” But the paper includes the math that proves the claim.

What this means is that text from before last year, text that is known human-generated, will become increasingly valuable.
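
The collapse described in the quoted summary can be illustrated with a toy experiment. This is only a sketch under my own assumptions, not the paper's method: each "model" is a single Gaussian fitted to a small sample drawn from the previous generation's Gaussian, and the function name and parameters are hypothetical. Because each refit slightly underestimates the spread on average, the standard deviation tends to drift toward zero, which is the "delta function" outcome the summary mentions.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Repeatedly refit a Gaussian to samples drawn from the previous fit.

    Returns the standard deviation after each generation, starting with
    the original (human) distribution at index 0.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the original data distribution
    history = [sigma]
    for _ in range(generations):
        # "Generate text" with the current model.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # "Train" the next model on that output (maximum-likelihood fit).
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for generation in (0, 10, 50, 100, 200):
        print(f"generation {generation:3d}: sigma = {history[generation]:.3e}")
```

The small sample size is deliberate: it exaggerates the effect so that the shrinking spread, and with it the loss of the distribution's tails, shows up within a couple hundred generations.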





Somehow I think Meta will find a way to peek…

https://techcrunch.com/2023/07/04/cjeu-meta-superprofiling-decision/

CJEU ruling on Meta referral could close the chapter on surveillance capitalism

Mark your calendar European friends: July 4th could soon be celebrated as independence-from-Meta’s-surveillance-capitalism-day… A long-anticipated judgement handed down today by the Court of Justice of the European Union (CJEU) looks to have comprehensively crushed the social media giant’s ability to keep flouting EU privacy law by denying users a free choice over its tracking and profiling.

The ruling tracks back to a pioneering order by Germany’s antitrust watchdog, the Federal Cartel Office (FCO), which spent years investigating Facebook’s business — making the case that privacy harm should be treated as an exploitative competition abuse too.





Perspective.

https://www.bespacific.com/artificial-intelligence-in-science/

Artificial Intelligence in Science

Artificial Intelligence in Science – Challenges, Opportunities and the Future of Research [300 page e-book available free via OECD]: “The rapid advances of artificial intelligence (AI) in recent years have led to numerous creative applications in science. Accelerating the productivity of science could be the most economically and socially valuable of all the uses of AI. Utilising AI to accelerate scientific productivity will support the ability of OECD countries to grow, innovate and meet global challenges, from climate change to new contagions. This publication is aimed at a broad readership, including policy makers, the public, and stakeholders in all areas of science. It is written in non-technical language and gathers the perspectives of prominent researchers and practitioners. The book examines various topics, including the current, emerging, and potential future uses of AI in science, where progress is needed to better serve scientific advancements, and changes in scientific productivity. Additionally, it explores measures to expedite the integration of AI into research in developing countries. A distinctive contribution is the book’s examination of policies for AI in science. Policy makers and actors across research systems can do much to deepen AI’s use in science, magnifying its positive effects, while adapting to the fast-changing implications of AI for research governance.”


