Handling Erroneous Dynamic URLs
When creating dynamic pages, make sure you handle weird URLs so that malicious linking doesn’t lead to the indexing of non-existent URLs. Use an HTTP header checker to see what the server actually returns. Catch all invalid requests with a custom 404 page. Try to let .htaccess handle as many of the 404s as possible, and reserve PHP for checking query values. You don’t want a chain of 301s, where invalid URLs return 301s instead of 404s. Which URLs should you check? The list below covers the ones I test. If the bad URLs already exist, I would use 301s in place of the 404s listed below.
- /index.html => 404
- /index.htm => 404
- /?junksdso => 404
- // => 404
- /index => 404
- /%20 => 404 (Someone once linked to me with this and the URL got indexed.)
- /scriptfile.html => 404. Direct access to the file that generates the dynamic content should return a 404. Better yet, rename the file to something difficult to guess. Don’t forget to delete unencrypted versions of files on your server.
- 404 any invalid query values. For example, if a script produces unique pages for values 1-5, a value of 6 should 404.
- If you’re not using unencrypted files:
- /validurl.html?junk=query => 404
- /validurl.html?junk= should also 404.
- /path => 301 to /path/. When a URL is supposed to end in /, the version missing the / must 301 to the version with it. Yahoo will drop the /, so I can’t have that URL 404.
- If a page is not dynamic, make sure page.html?bogusquery=string => 404
- Capitalization should not matter. Use the [NC] flag on RewriteCond statements so that matching is case-insensitive.
- URLs containing +, _ or - should be handled correctly.
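
Here is a minimal Python sketch of the checklist above, written as a table of expected responses that you could compare against what the header checker reports. It is not the .htaccess or PHP implementation itself. The site layout (`/validurl.html`, `/path/`, and a dynamic `/item.html` that takes an `id` of 1-5) and the function name `expected_status` are made up for illustration.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qsl

# Hypothetical site layout, for illustration only.
STATIC_PAGES = {"/", "/validurl.html"}   # pages that accept no query string
DIRECTORIES = {"/path/"}                 # canonical URLs that end in /
DYNAMIC_PAGES = {                        # page -> allowed values per query key
    "/item.html": {"id": {"1", "2", "3", "4", "5"}},
}


def expected_status(url: str) -> Tuple[int, Optional[str]]:
    """Return (status, redirect_target) for a request path plus optional query."""
    # Split by hand: urlsplit() would read a leading "//" as a host separator.
    path, _, query = url.partition("?")
    path = path.lower()  # capitalization should not matter
    # keep_blank_values=True so that "?junk=" and "?junk" still count as queries.
    # A bare trailing "?" parses as no query here; the post leaves that case open.
    params = parse_qsl(query, keep_blank_values=True)

    if path in DYNAMIC_PAGES:
        allowed = DYNAMIC_PAGES[path]
        given = dict(params)
        valid = (
            len(params) == len(allowed)          # no extra or repeated keys
            and given.keys() == allowed.keys()   # exactly the expected keys
            and all(given[k] in allowed[k] for k in allowed)
        )
        return (200, None) if valid else (404, None)

    if params:
        return 404, None  # any query string on a non-dynamic URL

    if path in STATIC_PAGES or path in DIRECTORIES:
        return 200, None

    if path + "/" in DIRECTORIES:
        return 301, path + "/"  # a single 301 to the trailing-slash URL

    return 404, None  # everything else, including /index.html, // and /%20
```

For example, `expected_status("/path")` returns `(301, "/path/")`, `expected_status("/item.html?id=6")` returns `(404, None)`, and `expected_status("/validurl.html?junk=")` returns `(404, None)`. Junk queries are rejected before the trailing-slash check, so an invalid URL never gets a 301.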
Question: If I link to a URL like this: index.html? (with nothing following the question mark), will Google index it?
What's Your Take?