Friday, May 24, 2024

It can’t be just the legal field.

https://hai.stanford.edu/news/ai-trial-legal-models-hallucinate-1-out-6-queries

AI on Trial: Legal Models Hallucinate in 1 out of 6 Queries

Artificial intelligence (AI) tools are rapidly transforming the practice of law. Nearly three quarters of lawyers plan on using generative AI for their work, from sifting through mountains of case law to drafting contracts to reviewing documents to writing legal memoranda. But are these tools reliable enough for real-world use?

Large language models have a documented tendency to “hallucinate,” or make up false information. In one highly publicized case, a New York lawyer faced sanctions for citing fictional cases invented by ChatGPT in a legal brief; many similar incidents have since been reported. And our previous study of general-purpose chatbots found that they hallucinated between 58% and 82% of the time on legal queries, highlighting the risks of incorporating AI into legal practice. In his 2023 annual report on the judiciary, Chief Justice Roberts took note and warned lawyers about hallucinations.

Across all areas of industry, retrieval-augmented generation (RAG) is seen and promoted as the solution for reducing hallucinations in domain-specific contexts. Relying on RAG, leading legal research services have released AI-powered legal research products that they claim “avoid” hallucinations and guarantee “hallucination-free” legal citations. RAG systems promise to deliver more accurate and trustworthy legal information by integrating a language model with a database of legal documents. Yet providers have not provided hard evidence for such claims or even precisely defined “hallucination,” making it difficult to assess their real-world reliability.
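
For context, here is a bare-bones sketch of the retrieve-then-generate pattern these products rely on. Everything in it (the toy corpus, the term-overlap scorer, the stubbed generate() call) is an illustrative placeholder of mine, not anything from LexisNexis or Westlaw.

```python
# Bare-bones retrieval-augmented generation (RAG) sketch.
# The corpus entries are fictional and generate() is a stub standing in
# for a real LLM API call; this is an illustration, not vendor code.

from collections import Counter

CORPUS = {
    "fictional_case.txt": "Doe v. Roe (fictional) held that consent may be revoked in writing.",
    "fictional_statute.txt": "Section 42 (fictional) requires written consent before biometric data is collected.",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank corpus documents by naive term overlap with the query."""
    q_terms = Counter(query.lower().split())
    scored = []
    for name, text in CORPUS.items():
        overlap = sum((Counter(text.lower().split()) & q_terms).values())
        scored.append((overlap, name, text))
    scored.sort(reverse=True)
    return [text for _, _, text in scored[:k]]

def generate(prompt: str) -> str:
    """Stub for an LLM call; a real system would send the prompt to a model."""
    return "[model output would go here]\n" + prompt

def answer(query: str) -> str:
    """Ground the model by pasting retrieved passages into the prompt."""
    context = "\n\n".join(retrieve(query))
    prompt = (
        "Answer using ONLY the passages below, and cite them.\n\n"
        f"Passages:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)

if __name__ == "__main__":
    print(answer("Does the statute require written consent for biometric data?"))
```

Even the toy makes the limitation visible: the model only sees what the retriever hands it, so a bad retrieval, or a model that ignores its context, still yields a confident wrong answer. That is exactly the failure mode the Stanford study measures.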

In a new preprint study by Stanford RegLab and HAI researchers, we put the claims of two providers, LexisNexis and Thomson Reuters (the parent company of Westlaw), to the test. We show that their tools do reduce errors compared to general-purpose AI models like GPT-4, a substantial improvement, and we document instances where these tools can spot mistaken premises. But even these bespoke legal AI tools still hallucinate an alarming amount: they produced incorrect information more than 17% of the time, or one in every six queries.





No surprise. It’s a mess.

https://pogowasright.org/resource-biometric-privacy-as-a-case-study-for-us-privacy-overall/

Resource: Biometric Privacy as a Case Study for US Privacy Overall

WilmerHale lawyers Kirk Nahra, Ali Jessani, Amy Olivero and Samuel Kane authored an article in the April 2024 issue of the CPI TechREG Chronicle that outlines how the rules governing biometric data reflect US privacy at large.
Excerpt: “Privacy law in the United States is best described as a patchwork of rules and regulations at both the state and federal level. This development is perhaps no better exemplified than by how the US regulates biometric information. From competing definitions to (sometimes) contradictory compliance obligations, the rules surrounding the processing of biometric information are myriad and complex, creating meaningful challenges for companies that wish to take advantage of the benefits associated with processing it (which include increased security and more convenience for consumers). This article outlines how the rules governing biometric data reflect US privacy at large and how this approach negatively impacts both consumers and businesses.”






I believe that an explanation is always possible. Getting there is complex, but possible.

https://www.bespacific.com/heres-whats-really-going-on-inside-an-llms-neural-network/

Here’s what’s really going on inside an LLM’s neural network

Ars Technica: “With most computer programs—even complex ones—you can meticulously trace through the code and memory usage to figure out why that program generates any specific behavior or output. That’s generally not true in the field of generative AI, where the non-interpretable neural networks underlying these models make it hard for even experts to figure out precisely why they often confabulate information, for instance. Now, new research from Anthropic offers a new window into what’s going on inside the Claude LLM’s “black box.” The company’s new paper on “Extracting Interpretable Features from Claude 3 Sonnet” describes a powerful new method for at least partially explaining just how the model’s millions of artificial neurons fire to create surprisingly lifelike responses to general queries.”
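
For what it’s worth, the method behind the paper is dictionary learning: train a sparse autoencoder on the model’s internal activations so that each dense activation vector decomposes into a small number of more interpretable “features.” The sketch below is my own toy version of that idea in plain NumPy, with random data and tiny dimensions; it is nothing like Anthropic’s actual setup, just the shape of the technique.

```python
# Toy sparse autoencoder in NumPy: decompose dense "activation" vectors into
# a sparse, overcomplete set of learned features. Purely illustrative: the
# data is random noise and the dimensions are tiny compared to a real LLM.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_feat, n = 16, 64, 2048      # activation dim, dictionary size, samples
lr, l1 = 3e-2, 0.1                     # step size; l1 trades reconstruction vs sparsity

X = rng.normal(size=(n, d_model))      # stand-in for residual-stream activations
W_enc = rng.normal(scale=0.1, size=(d_model, d_feat))
W_dec = rng.normal(scale=0.1, size=(d_feat, d_model))
b_enc = np.zeros(d_feat)

for step in range(1000):
    pre = X @ W_enc + b_enc            # (n, d_feat)
    f = np.maximum(pre, 0.0)           # ReLU -> sparse feature activations
    X_hat = f @ W_dec                  # reconstruction from the feature dictionary
    err = X_hat - X

    # Gradients of  (1/n)*||X - X_hat||^2  +  (l1/n)*sum|f|
    g_Xhat = 2.0 * err / n
    g_Wdec = f.T @ g_Xhat
    g_f = g_Xhat @ W_dec.T + (l1 / n) * np.sign(f)
    g_pre = g_f * (pre > 0)
    g_Wenc = X.T @ g_pre
    g_benc = g_pre.sum(axis=0)

    W_enc -= lr * g_Wenc
    W_dec -= lr * g_Wdec
    b_enc -= lr * g_benc

print(f"recon MSE: {(err ** 2).mean():.3f}  active features: {(f > 0).mean():.1%}")
```

The interpretability work starts after training, when you look at which inputs make a given feature fire; that is where the paper’s examples come from, at the scale of Claude 3 Sonnet.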


