“OpenAI and Anthropic also ignore robots.txt requests not to scrape sites” – IT Pro – News

If he doesn’t want his content copied and has set up a robots.txt file for that, that’s his prerogative.
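For what it’s worth, here is a minimal sketch of what that opt-out looks like in practice, and of why it only works if the crawler chooses to honour it: the robots.txt rules below use the GPTBot and ClaudeBot user-agent tokens that OpenAI and Anthropic publish for their crawlers, checked with Python’s standard urllib.robotparser. The domain and paths are placeholders, and nothing in the file itself enforces anything; a scraper that ignores robots.txt simply never runs a check like this.

    # Sketch: robots.txt rules asking AI crawlers to stay away, plus a
    # compliance check with Python's standard urllib.robotparser.
    # Whether a crawler performs such a check is entirely its own choice.
    from urllib import robotparser

    ROBOTS_LINES = [
        "User-agent: GPTBot",
        "Disallow: /",
        "",
        "User-agent: ClaudeBot",
        "Disallow: /",
        "",
        "User-agent: *",
        "Allow: /",
    ]

    rp = robotparser.RobotFileParser()
    rp.parse(ROBOTS_LINES)

    for agent in ("GPTBot", "ClaudeBot", "Mozilla/5.0"):
        allowed = rp.can_fetch(agent, "https://example.com/article")
        print(f"{agent}: {'allowed' if allowed else 'disallowed'}")

Run as-is, this prints “disallowed” for the two AI agents and “allowed” for an ordinary browser string; the file is a request, and honouring it is a decision the client makes.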

If you don’t want your content scraped, put it behind a login with clear terms of service. The moment you post content publicly online, it is public. If you walk around naked in your yard, in full view of the street, you shouldn’t complain when people take pictures of your bare behind. Yes, it’s your house, yes, it’s private property, but if you put something where the public can see it, you give up a measure of control over it.

Your argument of “decency and respect towards content owners”… well, that’s a two-way street, isn’t it? I assume, then, that you’re not running an ad blocker, because otherwise you don’t respect the content owner either!

A content owner can also be disrespectful towards people, by plastering the page with flashing ads or 1,001 other annoyances. People sometimes forget that the reason ad blockers became so popular is the nuisance content owners caused years ago (and still cause today) by not respecting their visitors.

All this logic about respect and the Internet, guys, how naive… You assume the content owner also respects you by not reselling your data? Just watch: a multi-million-dollar deal between Google and Reddit (and other sites) to sell their users’ data for AI. Facebook, …

Let’s face it: the only reason content owners are now complaining about scrapers ignoring robots.txt is that they don’t want “their” content taken without $$$$, and they want to resell it themselves (and that “their” content is, often enough, actually content from third parties).

So spare me the respect argument… And yes, I run many sites, and I can assure you that within 5 seconds of a server going live, the log is already filling up with requests for /xxx not found, /yyy not found. That is what happens now, and it happened 20 years ago too. The only difference is that people are now complaining about AI scrapers making money off “your content”. Newsflash: that was already happening before we had AI. And the biggest culprits were often the biggest companies!
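For anyone who has never watched this happen, here is a rough sketch of how you might pull those probes out of an access log by counting 404s per requested path. The file name access.log and the Apache/nginx combined log format are assumptions; adjust the regex to whatever your server actually writes.

    # Sketch: count 404 "not found" probes per path in a combined-format
    # access log. Assumes lines roughly like:
    #   1.2.3.4 - - [date] "GET /xxx HTTP/1.1" 404 153 "-" "SomeBot/1.0"
    import re
    from collections import Counter

    REQUEST = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3})')

    probes = Counter()
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for line in log:
            m = REQUEST.search(line)
            if m and m.group("status") == "404":
                probes[m.group("path")] += 1

    # Show the ten most-probed paths
    for path, hits in probes.most_common(10):
        print(f"{hits:6d}  {path}")

On a freshly exposed server, the top of that list tends to be exactly the /xxx, /yyy noise described above, long before any AI crawler shows up.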

Then I suggest we ban Google too, because I can assure you that Google has never respected robots.txt either. Just google “Google does not respect robots.txt” :)