Tuesday, August 13, 2024

Isn’t “out into the world” the same as “public?” If someone saw an individual at a crime scene when the crime was committed, that would be admissible. “Time shifting” by relying on geolocation data (or videos or even fingerprints) seems to me to be the same.

https://www.eff.org/deeplinks/2024/08/federal-appeals-court-finds-geofence-warrants-are-categorically-unconstitutional

Federal Appeals Court Finds Geofence Warrants Are “Categorically” Unconstitutional

In a major decision on Friday, the federal Fifth Circuit Court of Appeals held that geofence warrants are “categorically prohibited by the Fourth Amendment.” Closely following arguments EFF has made in a number of cases, the court found that geofence warrants constitute the sort of “general, exploratory rummaging” that the drafters of the Fourth Amendment intended to outlaw. EFF applauds this decision because it is essential that every person feels like they can simply take their cell phone out into the world without the fear that they might end up a criminal suspect because their location data was swept up in an open-ended digital dragnet.





This may be important later…

https://www.bespacific.com/the-files-are-in-the-computer-on-copyright-memorization-and-generative-ai/

The Files are in the Computer: On Copyright, Memorization, and Generative AI

Cooper, A. Feder and Grimmelmann, James, The Files are in the Computer: On Copyright, Memorization, and Generative AI (April 22, 2024). Cornell Legal Studies Research Paper; forthcoming, Chicago-Kent Law Review. Available at SSRN: https://ssrn.com/abstract=4803118 – “The New York Times’s copyright lawsuit against OpenAI and Microsoft alleges that OpenAI’s GPT models have “memorized” Times articles. Other lawsuits make similar claims. But parties, courts, and scholars disagree on what memorization is, whether it is taking place, and what its copyright implications are. Unfortunately, these debates are clouded by deep ambiguities over the nature of “memorization,” leading participants to talk past one another. In this Essay, we attempt to bring clarity to the conversation over memorization and its relationship to copyright law. Memorization is a highly active area of research in machine learning, and we draw on that literature to provide a firm technical foundation for legal discussions. The core of the Essay is a precise definition of memorization for a legal audience. We say that a model has “memorized” a piece of training data when (1) it is possible to reconstruct from the model (2) a near-exact copy of (3) a substantial portion of (4) that specific piece of training data. We distinguish memorization from “extraction” (in which a user intentionally causes a model to generate a near-exact copy), from “regurgitation” (in which a model generates a near-exact copy, regardless of the user’s intentions), and from “reconstruction” (in which the near-exact copy can be obtained from the model by any means, not necessarily the ordinary generation process). Several important consequences follow from these definitions. First, not all learning is memorization: much of what generative-AI models do involves generalizing from large amounts of training data, not just memorizing individual pieces of it.
Second, memorization occurs when a model is trained; it is not something that happens when a model generates a regurgitated output. Regurgitation is a symptom of memorization in the model, not its cause. Third, when a model has memorized training data, the model is a “copy” of that training data in the sense used by copyright law. Fourth, a model is not like a VCR or other general-purpose copying technology; it is better at generating some types of outputs (possibly including regurgitated ones) than others. Fifth, memorization is not just a phenomenon that is caused by “adversarial” users bent on extraction; it is a capability that is latent in the model itself. Sixth, the amount of training data that a model memorizes is a consequence of choices made in the training process; different decisions about what data to train on and how to train on it can affect what the model memorizes. Seventh, system design choices also matter at generation time. Whether or not a model that has memorized training data actually regurgitates that data depends on the design of the overall system: developers can use other guardrails to prevent extraction and regurgitation. In a very real sense, memorized training data is in the model—to quote Zoolander, the files are in the computer.”
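The Essay defines regurgitation operationally: an output that is a near-exact copy of a substantial portion of a specific training item. As a rough illustration of that test (my own sketch, not the authors’ method; the 70% threshold is a made-up parameter), a check for a long shared near-exact span might look like:

```python
from difflib import SequenceMatcher

def near_exact_copy(output: str, training_doc: str,
                    min_fraction: float = 0.7) -> bool:
    """Flag a regurgitation-style match: does the generated output
    contain a contiguous verbatim span covering at least
    `min_fraction` of the training document?

    `min_fraction` is a hypothetical threshold standing in for the
    legal notion of a "substantial portion."
    """
    matcher = SequenceMatcher(None, output, training_doc)
    match = matcher.find_longest_match(0, len(output),
                                       0, len(training_doc))
    return match.size >= min_fraction * len(training_doc)
```

Real memorization audits in the ML literature are more involved (token-level overlap, canonicalization, prompting strategies), but the shape of the question is the same: how much of the original reappears, and how exactly.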



