
Study of ChatGPT citations makes dismal reading for publishers

November 30, 2024

Written by Natasha Lomas


As more publishers cut content licensing deals with ChatGPT-maker OpenAI, a study put out this week by the Tow Center for Digital Journalism — looking at how the AI chatbot produces citations (i.e. sources) for publishers’ content — makes for interesting, or, well, concerning, reading.

In a nutshell, the findings suggest publishers remain at the mercy of the generative AI tool’s tendency to invent or otherwise misrepresent information, regardless of whether or not they’re allowing OpenAI to crawl their content.

The research, conducted at Columbia Journalism School, examined citations produced by ChatGPT after it was asked to identify the source of sample quotations plucked from a mix of publishers — some of which had inked deals with OpenAI and some of which had not.

The Center took block quotes from 10 stories apiece produced by a total of 20 randomly selected publishers (so 200 different quotes in all) — including content from The New York Times (which is currently suing OpenAI in a copyright claim); The Washington Post (which is unaffiliated with the ChatGPT maker); The Financial Times (which has inked a licensing deal); and others.

“We chose quotes that, if pasted into Google or Bing, would return the source article among the top three results and evaluated whether OpenAI’s new search tool would correctly identify the article that was the source of each quote,” wrote Tow researchers Klaudia Jaźwińska and Aisvarya Chandrasekar in a blog post explaining their approach and summarizing their findings.

“What we found was not promising for news publishers,” they go on. “Though OpenAI emphasizes its ability to provide users ‘timely answers with links to relevant web sources,’ the company makes no explicit commitment to ensuring the accuracy of those citations. This is a notable omission for publishers who expect their content to be referenced and represented faithfully.” 

“Our tests found that no publisher — regardless of degree of affiliation with OpenAI — was spared inaccurate representations of its content in ChatGPT,” they added.
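To make the methodology concrete, here is a minimal sketch of the kind of quote-attribution check the researchers describe. It is not the Tow Center's actual harness: the quotes.csv file, the ask_chatbot_for_source() helper, and the scoring buckets are hypothetical stand-ins, assuming each sampled quote is stored alongside the URL of the article it came from.

```python
# Hypothetical sketch of the quote-attribution check described above.
# Assumes a quotes.csv with columns: quote, publisher, expected_url, and an
# ask_chatbot_for_source() helper wrapping whichever chat/search tool is under
# test. Neither exists in the study's materials; both are illustrative.
import csv
from urllib.parse import urlparse


def ask_chatbot_for_source(quote: str) -> str | None:
    """Placeholder: submit the quote to the chatbot's search tool and return
    the URL it cites as the source, or None if it admits it cannot find one."""
    raise NotImplementedError  # no real API call is made in this sketch


def score(cited_url: str | None, expected_url: str) -> str:
    """Bucket a response roughly the way the study does: fully correct,
    partially correct (same site, wrong article), wrong, or no answer."""
    if cited_url is None:
        return "no_answer"
    if cited_url == expected_url:
        return "correct"
    if urlparse(cited_url).netloc == urlparse(expected_url).netloc:
        return "partial"
    return "wrong"


results = {"correct": 0, "partial": 0, "wrong": 0, "no_answer": 0}
with open("quotes.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        cited = ask_chatbot_for_source(row["quote"])
        results[score(cited, row["expected_url"])] += 1

print(results)  # tallies across the sampled quotes
```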

Unreliable sourcing

The researchers say they found “numerous” instances where publishers’ content was inaccurately cited by ChatGPT — also finding what they dub “a spectrum of accuracy in the responses”. So while they found “some” entirely correct citations (i.e. meaning ChatGPT accurately returned the publisher, date, and URL of the block quote shared with it), there were “many” citations that were entirely wrong; and “some” that fell somewhere in between.

In short, ChatGPT’s citations appear to be an unreliable mixed bag. The researchers also found very few instances where the chatbot didn’t project total confidence in its (wrong) answers.

Some of the quotes were sourced from publishers that have actively blocked OpenAI’s search crawlers. In those cases, the researchers say they were anticipating that it would have issues producing correct citations. But they found this scenario raised another issue — as the bot “rarely” ‘fessed up to being unable to produce an answer. Instead, it fell back on confabulation in order to generate some sourcing (albeit, incorrect sourcing).

“In total, ChatGPT returned partially or entirely incorrect responses on 153 occasions, though it only acknowledged an inability to accurately respond to a query seven times,” said the researchers. “Only in those seven outputs did the chatbot use qualifying words and phrases like ‘appears,’ ‘it’s possible,’ or ‘might,’ or statements like ‘I couldn’t locate the exact article’.”

They compare this unhappy situation with a standard internet search where a search engine like Google or Bing would typically either locate an exact quote, and point the user to the website/s where they found it, or state they found no results with an exact match.

ChatGPT’s “lack of transparency about its confidence in an answer can make it difficult for users to assess the validity of a claim and understand which parts of an answer they can or cannot trust,” they argue.

For publishers, there could also be reputation risks flowing from incorrect citations, they suggest, as well as the commercial risk of readers being pointed elsewhere.

Decontextualized data

The study also highlights another issue. It suggests ChatGPT could essentially be rewarding plagiarism. The researchers recount an instance where ChatGPT erroneously cited a website that had plagiarized a piece of “deeply reported” New York Times journalism — i.e. copy-pasted the text without attribution — as the source of the NYT story, speculating that the bot may have generated this false response to fill an information gap created by its inability to crawl the NYT’s website.

“This raises serious questions about OpenAI’s ability to filter and validate the quality and authenticity of its data sources, especially when dealing with unlicensed or plagiarized content,” they suggest.

In further findings likely to concern publishers that have inked deals with OpenAI, the study found ChatGPT’s citations were not always reliable in their cases either — so letting its crawlers in doesn’t appear to guarantee accuracy.

The researchers argue that the fundamental issue is OpenAI’s technology is treating journalism “as decontextualized content”, with apparently little regard for the circumstances of its original production.

Another issue the study flags is the variation of ChatGPT’s responses. The researchers tested asking the bot the same query multiple times and found it “typically returned a different answer each time”. While that’s typical of GenAI tools, generally, in a citation context such inconsistency is obviously suboptimal if it’s accuracy you’re after.

While the Tow study is small scale — the researchers acknowledge that “more rigorous” testing is needed — it’s nonetheless notable given the high-level deals that major publishers are busy cutting with OpenAI.

If media businesses were hoping these arrangements would lead to special treatment for their content vs competitors, at least in terms of producing accurate sourcing, this study suggests OpenAI has yet to offer any such consistency.

For publishers that don’t have licensing deals but also haven’t outright blocked OpenAI’s crawlers — perhaps in the hopes of at least picking up some traffic when ChatGPT returns content about their stories — the study makes dismal reading too, since citations may not be accurate in their cases either.

In other words, there is no guaranteed “visibility” for publishers in OpenAI’s search engine even when they do allow its crawlers in.

Nor does completely blocking crawlers mean publishers can save themselves from reputational damage risks by avoiding any mention of their stories in ChatGPT. The study found the bot still incorrectly attributed articles to the New York Times despite the ongoing lawsuit, for example.

‘Little meaningful agency’

The researchers conclude that as it stands, publishers have “little meaningful agency” over what happens with and to their content when ChatGPT gets its hands on it (directly or, well, indirectly).

The blog post includes a response from OpenAI to the research findings — which accuses the researchers of running an “atypical test of our product”.

“We support publishers and creators by helping 250 million weekly ChatGPT users discover quality content through summaries, quotes, clear links, and attribution,” OpenAI also told them, adding: “We’ve collaborated with partners to improve in-line citation accuracy and respect publisher preferences, including enabling how they appear in search by managing OAI-SearchBot in their robots.txt. We’ll keep enhancing search results.”
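On that last point, robots.txt is the mechanism in question: a publisher can address OpenAI’s search crawler by its OAI-SearchBot user agent. The snippet below is an illustrative check using Python’s standard-library robots.txt parser; the rules and the example.com URLs are hypothetical, and OpenAI’s own crawler documentation is the authority on which user agents it honors.

```python
# Illustrative only: a publisher-style robots.txt addressing OAI-SearchBot,
# checked with Python's standard-library parser. The paths and URLs below are
# made up for the example.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: OAI-SearchBot
Disallow: /premium/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("OAI-SearchBot", "https://example.com/news/a-story"))
# True: open section, eligible to surface in search citations
print(parser.can_fetch("OAI-SearchBot", "https://example.com/premium/a-story"))
# False: the crawler is asked to stay out of the paywalled section
```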
