Web Application Gotchas: Modifying data with links (GET requests)
So, you’ve built your nice little web application, and all is well. Users are using it and filling your nice little database with data. Then suddenly, things start disappearing. Everything that the users have spent hours creating is deleted, one row at a time. What’s happening? You start debugging… Someone is actually clicking the links to delete content. Sabotage, perhaps? You search through the logs and find the IP address of the offending user. After tracing the activity of that user, you track it down to Bob, one of the web developers.
You make your way down to his office and start accusing him of messing with the application, but Bob claims he is innocent. While you’re yelling at him, Sandy comes in and tells you that something she created recently was just deleted. Bob has an alibi for this one, so what’s going on? You go back to your desk and start searching the logs again, and quickly locate the culprit. His user agent string is … gsa-crawler. Wait, what?!
gsa-crawler is the user agent string used by the Google Search Appliance, which is a search engine in a box that can be used to index and search intranets. John in Sysadmin set it up a few days ago to crawl all internal websites. It turns out it crawled your new web application, found the links to delete content, and followed them.
It also turns out that Bob was running Firefox with the Fasterfox plugin installed, which can be set up to automatically prefetch web pages. When Bob visited a page in your web application that contained links that delete content, the plugin followed those links, exactly as it is designed to, and deleted content without Bob ever knowing.
So, perhaps you see now why modifying data with links can be a bad idea. Sure, it might work in a lot of cases… The web application can have access control, preventing crawlers from following links they shouldn’t follow, and users might not be using browsers that do aggressive link prefetching. But why not be on the safe side, and use GET requests to get data (aha!) and POST requests to send data to the application, and never have to worry about these problems?
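To make that concrete, here is a minimal sketch in Python of what "GET reads, POST writes" could look like on the server side. Everything in it is made up for illustration: the `handle_request` function, the `/items/<id>` and `/items/<id>/delete` paths, and the in-memory `ITEMS` dictionary standing in for your database. A real application would do this through its framework's routing, but the idea is the same: a request that changes data is refused unless it arrives as a POST.

```python
from http import HTTPStatus

# Hypothetical in-memory "database" standing in for the application's storage.
ITEMS = {1: "first post", 2: "second post"}


def handle_request(method, path, items):
    """Return (status, body). Only a POST is ever allowed to change `items`."""
    parts = path.strip("/").split("/")
    is_item_path = len(parts) >= 2 and parts[0] == "items" and parts[1].isdigit()

    # /items/<id> only reads data, so GET is the right method for it.
    if is_item_path and len(parts) == 2:
        if method != "GET":
            return HTTPStatus.METHOD_NOT_ALLOWED, "use GET to read"
        item_id = int(parts[1])
        if item_id not in items:
            return HTTPStatus.NOT_FOUND, "no such item"
        return HTTPStatus.OK, items[item_id]

    # /items/<id>/delete changes data. A crawler or prefetcher following a
    # plain link sends GET, so it is turned away before anything is deleted.
    if is_item_path and len(parts) == 3 and parts[2] == "delete":
        if method != "POST":
            return HTTPStatus.METHOD_NOT_ALLOWED, "use POST to delete"
        item_id = int(parts[1])
        if item_id not in items:
            return HTTPStatus.NOT_FOUND, "no such item"
        del items[item_id]
        return HTTPStatus.OK, "deleted"

    return HTTPStatus.NOT_FOUND, "unknown path"
```

With this in place, `handle_request("GET", "/items/1/delete", ITEMS)` answers with 405 Method Not Allowed and leaves the data alone, while the same path sent as a POST removes the item. On the page itself, the delete link would become a small form with `method="post"` and a submit button, which crawlers and prefetchers do not normally submit.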
Update: I forgot to write about CSRF, which is easier to pull off when GET requests modify data. I’ll leave that for another installment of Web Application Gotchas.