A while ago I read about an idea for avoiding common PHP programming mistakes in the handling of strings. There are dozens of attacks to watch out for when working with strings: you have to escape a string one way when you embed it in an SQL statement (SQL injection), a different way when you output it as part of a web page (XSS attacks), and a third way when you output it as part of an HTTP header. It’s not surprising that, sooner or later, something somewhere will not be escaped the right way.
Mike Wells suggests a SafeString class to encapsulate all Strings in a class with different access methods that automatically escape your string the right way. So if you were to output the string back to the user, you’d call a toHTML() method that properly escapes any HTML-tags and special characters embedded in the string. A method to access the raw string would be called “UnsafeRawString” to remind the programmer that the string contains “tainted” user-input. While it is still possible to do something wrong, these parts stick out in the code (for example, one might use String->toHTML() when using it in an SQL statement - obviously wrong, but much easier to find). See “Making Wrong Code look Wrong” for the underlying philosophy.
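To make the idea concrete, here is a minimal sketch of what such a class could look like. It is not Mike Wells' code; it assumes PHP 5 or later, and the choice of htmlspecialchars() with ENT_QUOTES and UTF-8 is my own assumption about what "properly escapes" should mean for HTML output.

```php
<?php
// Minimal sketch of the SafeString idea, not the original class.
final class SafeString
{
    private $raw;

    public function __construct($raw)
    {
        $this->raw = (string) $raw;
    }

    // Escape for embedding in HTML text or inside a quoted attribute value.
    // Assumption: the page is served as UTF-8.
    public function toHTML()
    {
        return htmlspecialchars($this->raw, ENT_QUOTES, 'UTF-8');
    }

    // Deliberately awkward name: every call site is a place to review.
    public function UnsafeRawString()
    {
        return $this->raw;
    }
}

$name = new SafeString('<script>alert("hi")</script>');
echo $name->toHTML(), "\n";
// prints: &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```

Because the raw value is private, the only ways to get text out are the escaping method and the one method whose name says "unsafe", which is the whole point.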
I really like the idea, but I saw a couple of practical problems with it. So I took it upon myself to build something that addresses the following:
- All strings, including server variables and super-globals, should be converted to the new string class automatically. Otherwise the programmer constantly has to figure out whether he/she is dealing with an encapsulated string or not. I implemented SafeString as an include file that you include at the top of each of your PHP scripts. It automatically converts all the super-globals like _GET, _POST, _SERVER etc. into SafeStrings.
- You’d need a database abstraction layer that returns these kinds of strings as query results. This can be done easily as well, using an XYZ-Function call in your abstraction-layer code to convert an entire associative array into SafeStrings.
- All the existing PHP string operations (from strcmp to soundex) must remain usable. This can be tricky, but interestingly PHP5 offers a way, with __call, to overload the object with arbitrarily named functions (see the overload() function in PHP4). With some eval-magic this could be doable. Technically you wouldn’t want anybody ever to work with the UnsafeRawString…. My solution for this is explained in the following.
| strcmp($foo->UnsafeRawString(), $bar->UnsafeRawString()) |
| $foo->strstr($bar) |
| $foo->eval("strcmp(THIS,$bar)") |
| $foo->strcmp($THIS_SAFESTRING, $bar); |
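The last form above, together with the automatic conversion of arrays from the list, can be sketched roughly as follows. This is an illustration under my own assumptions, not the downloadable class: it targets PHP 5 or later, the marker object returned by the hypothetical placeholder() method stands in for $THIS_SAFESTRING, the allowlist of forwarded functions is invented for the example, and wrapSafeStrings() is a hypothetical helper name.

```php
<?php
// Sketch only: forwards an allowlisted set of PHP string functions via __call.
class SafeString
{
    private $raw;
    private static $placeholder = null;
    // Hypothetical allowlist. Forwarding arbitrary names (system, exec, ...)
    // would turn the wrapper itself into a security hole.
    private static $allowed = array('strcmp', 'strstr', 'strlen', 'strtoupper', 'soundex');

    public function __construct($raw)
    {
        $this->raw = (string) $raw;
    }

    // Unique marker object meaning "the wrapped string goes here".
    public static function placeholder()
    {
        if (self::$placeholder === null) {
            self::$placeholder = new stdClass();
        }
        return self::$placeholder;
    }

    public function toHTML()
    {
        return htmlspecialchars($this->raw, ENT_QUOTES, 'UTF-8');
    }

    public function UnsafeRawString()
    {
        return $this->raw;
    }

    // Called by PHP for any method that is not defined on the class.
    public function __call($name, $args)
    {
        if (!in_array(strtolower($name), self::$allowed, true)) {
            throw new BadMethodCallException("SafeString: '$name' is not forwarded");
        }
        foreach ($args as $i => $arg) {
            if ($arg === self::placeholder()) {
                $args[$i] = $this->raw;          // substitute our own value
            } elseif ($arg instanceof SafeString) {
                $args[$i] = $arg->raw;           // unwrap other SafeStrings
            }
        }
        $result = call_user_func_array($name, $args);
        // String results are still tainted, so wrap them again.
        return is_string($result) ? new SafeString($result) : $result;
    }
}

// Hypothetical helper: recursively wrap every string in an array.
// The same call could be applied to $_GET, $_POST or a database row.
function wrapSafeStrings(array $data)
{
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            $data[$key] = wrapSafeStrings($value);
        } elseif (is_string($value)) {
            $data[$key] = new SafeString($value);
        }
    }
    return $data;
}

$THIS_SAFESTRING = SafeString::placeholder();
$input = wrapSafeStrings(array('foo' => 'apple', 'bar' => 'apple'));

var_dump($input['foo']->strcmp($THIS_SAFESTRING, $input['bar']));   // int(0)
echo $input['foo']->strtoupper($THIS_SAFESTRING)->toHTML(), "\n";   // APPLE
```

The raw string never leaves the object in the calling code: __call unwraps the arguments internally and re-wraps any string result, so the programmer only sees plain values for harmless results such as the integer from strcmp.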
So the entire class now looks like this:
| Download |
I haven't tested it on PHP5 yet, but it works on PHP4. It's a bit painful to make it work on both PHP versions. It should work... :-)
So, is this for real, and will it be used? I'm not sure. It would probably require "fixing" various database abstraction layers and template engines to get all of this into a usable state. Until then, this code is more of a request for comments :-)