web crawler – pressnews

Earlier this month, in response to mounting criticism around how OpenAI scoops up data to train ChatGPT, its groundbreaking chatbot, the company made it possible for websites to block it from scraping their content. A short piece of code would tell OpenAI to go away (and it would kindly obey). Since then, hundreds of sites have shut the door. A Google search reveals many of them: Major online properties such as Amazon, Airbnb, Glassdoor and Quora have added the code to their “robots.txt” file, a kind of rules of engagement for the many bots (or spiders, as they are also known) that scour the internet. When I got in touch with the companies, none were willing to discuss their reasoning, but it's quite obvious: They want to put a stop to OpenAI taking content that doesn't belong to...
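For readers curious what that "short piece of code" looks like in practice, here is a minimal sketch. It assumes the rule takes the form of a `User-agent: GPTBot` line followed by `Disallow: /`, and it uses Python's standard `urllib.robotparser` to show how a well-behaved crawler would read it. The helper `is_allowed`, the bot name `SomeOtherBot` and the `example.com` URL are illustrative placeholders, not taken from any of the sites mentioned.

```python
from urllib.robotparser import RobotFileParser

# Assumed form of the blocking rule; a real robots.txt usually holds more entries.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URL and second bot name, used only for illustration.
gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/article")
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/article")

print("GPTBot allowed:", gptbot_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

The rule only names one crawler, so other bots are unaffected, and compliance is voluntary: the file states the site's wishes, and it is up to each crawler to honor them.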

Last week, reports emerged that the New York Times may take legal action against the ChatGPT maker OpenAI, whose AI models are allegedly trained on content published on the NYT website, which is the publisher's intellectual property. While that may not have happened so far, the major news publisher has now decided to ban OpenAI's web crawler from viewing content on its website. The move means that the website's content cannot be used to train any of OpenAI's foundational AI models. As per a report by The Verge, NYT has blocked OpenAI's web crawler GPTBot from searching and indexing the contents of the website. The report highlights the robots.txt page of the publication, which clearly shows that the bot has been disallowed. Using the Internet Archive's Wayback Machine that lets users...

Tag: web crawler

OpenAI’s offer to stop web crawler comes too late

New York Times bans OpenAI’s GPTbot from using its content to train AI