A while ago I read about an idea to make it easier to avoid common
programming mistakes in PHP regarding the handling of strings. There
are dozens of attacks that one must pay attention to when using
strings: you have to escape your string one way when you embed it in
an SQL statement, escape it in a different way when outputting it
as part of a web page (XSS attacks), and escape it in a third
way when you output it as part of an HTTP header. It’s not
surprising that eventually something, somewhere, will not be escaped in
the right way.
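To make the different contexts concrete, here is a small sketch that escapes one and the same input for two of them using only built-in PHP functions. The sample value in `$input` is made up for illustration, and the SQL case is left out because it needs a live database connection.

```php
<?php
// One hypothetical piece of user input, containing markup and a CR/LF pair.
$input = "<b>Tom & \"Jerry\"</b>\r\nSet-Cookie: x=1";

// HTML context: turn markup characters into entities.
$forHtml = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// HTTP header context: strip NUL, CR and LF so no extra header can be injected.
$forHeader = str_replace(array("\0", "\r", "\n"), '', $input);
```

Neither result is safe in the other context: `$forHtml` still contains the line break, and `$forHeader` still contains the raw tags.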
Mike Wells suggests a SafeString class
to encapsulate all Strings in a class with different access methods that
automatically escape your string the right way. So if you were to output
the string back to the user, you’d call a toHTML() method that properly
escapes any HTML-tags and special characters embedded in the string. A method
to access the raw string would be called “UnsafeRawString” to
remind the programmer that the string contains “tainted” user-input.
While it is still possible to do something wrong, these parts stick out in the
code (for example, one might use String->toHTML() when using it in an SQL
statement - obviously wrong, but much easier to find). See
“Making Wrong Code Look Wrong” for the underlying philosophy.
I really like the idea, but I saw a couple of practical problems with it.
So I took it upon myself to build something that fixes some of the following
problems:
- All strings, including Server variables and Super-Globals, should be automatically converted to the new String class. Otherwise the programmer has to constantly figure out if he/she is dealing with an encapsulated string or not.
I implemented SafeString as an include file that you include at the top of each
of your PHP scripts. It will automatically convert all the Super-Globals
like _GET, _POST, _SERVER etc. into SafeStrings.
- You’d need a database abstraction layer that returns this
kind of string as the result of queries. This can be done easily as well,
using an XYZ-Function call in your Abstraction-Layer code to convert an entire
associative array into SafeStrings.
- All the existing PHP string operations (from strcmp to soundex) must
be usable. This can be tricky, but interestingly PHP5 offers a way with __call
to overload the object with arbitrarily named functions (see the overload() function
in PHP4). With some eval-magic this could be doable.
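The first two points boil down to walking an array and wrapping every leaf value. A minimal sketch of that idea follows. `WrappedString` and `wrapAll()` are hypothetical names used only here, and `WrappedString` is a stripped-down stand-in, not the full class.

```php
<?php
// Hypothetical minimal wrapper, standing in for the full string class.
class WrappedString
{
    private $raw;

    public function __construct($raw)
    {
        $this->raw = (string) $raw;
    }

    public function toHTML()
    {
        return htmlspecialchars($this->raw, ENT_QUOTES, 'UTF-8');
    }
}

// Recursively wrap every non-array value; nested arrays keep their shape.
function wrapAll($value)
{
    if (is_array($value)) {
        $out = array();
        foreach ($value as $key => $item) {
            $out[$key] = wrapAll($item);
        }
        return $out;
    }
    return new WrappedString($value);
}

// Works the same for a request array or a row returned by a database layer.
$row  = array('name' => '<i>Ann</i>', 'tags' => array('a&b', 'c'));
$safe = wrapAll($row);

echo $safe['name']->toHTML(), "\n";    // prints &lt;i&gt;Ann&lt;/i&gt;
```

The same function could be applied to a super-global at the top of a script or to each result row inside an abstraction layer, so the calling code only ever sees wrapped values.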
Technically you wouldn’t want anybody ever to work with the
UnsafeRawString… My solution for this is explained in the following.
The main problem is to make all the PHP function-calls that support strings
available to the user of the class. One thing I want to avoid is having them
do something like this:
strcmp($foo->UnsafeRawString(), $bar->UnsafeRawString())
This would completely annihilate the advantage of making the code searchable
for the parts where the unsafe string is handled, which are the parts that
require careful code inspection.
Another problem that has to be addressed is that the PHP functions
do not work in an object-oriented manner, i.e. we can't simply call
them as methods on the object and leave the object's string out of the
argument list, as we would not know which parameter has to be the object's own string.
We would have to make some arbitrary decision and I'm sure it would bite
somebody somewhere.
Another option I considered was to have users pass a string with some magic
marker where the object's string should be placed, i.e.
$foo->eval("strcmp(THIS,$bar)")
I didn't like that idea, because I would have to parse strings and all
that. Finally I decided on a solution that allows calls like
the following:
$foo->strcmp($THIS_SAFESTRING, $bar);
where $THIS_SAFESTRING is a global variable that is a placeholder for the $foo-object's
encapsulated string. I implemented this solution with __call in the SafeString-class
such that this placeholder will be replaced before the function is evaluated.
The function-wrapper takes normal strings as well as other SafeString-objects
and handles them accordingly.
The only downside to this is that inside functions you will have to
put a "global $THIS_SAFESTRING;" as I haven't figured out how to make
it a super-global...
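For comparison, here is a rough sketch of the same placeholder trick in the PHP 5+ form of __call, which takes two parameters and returns the result. `PlaceholderString` is a hypothetical cut-down class used only for this illustration. It also swaps the global variable for a constant named `THIS_SAFESTRING`, which is an assumption on my part: constants are visible in every scope, so that might be one way around the `global` declaration.

```php
<?php
// Placeholder as a constant: visible inside functions without "global".
define('THIS_SAFESTRING', 'XXX_SOME_MAGIC_VALUE');

// Hypothetical cut-down class, only to show the call forwarding.
class PlaceholderString
{
    private $raw;

    public function __construct($raw)
    {
        $this->raw = (string) $raw;
    }

    public function toUnsafeRawString()
    {
        return $this->raw;
    }

    // PHP 5+ form of __call: two parameters, and the result is returned.
    public function __call($function, $arguments)
    {
        foreach ($arguments as $key => $val) {
            if ($val === THIS_SAFESTRING) {
                // Placeholder: substitute this object's own string.
                $arguments[$key] = $this->raw;
            } elseif ($val instanceof self) {
                // Other wrapped strings are unwrapped before the call.
                $arguments[$key] = $val->toUnsafeRawString();
            }
        }
        $r = call_user_func_array($function, $arguments);
        // String results are wrapped again so they stay tainted.
        return is_string($r) ? new self($r) : $r;
    }
}

$foo = new PlaceholderString('apple');
$bar = new PlaceholderString('banana');

$cmp   = $foo->strcmp(THIS_SAFESTRING, $bar);   // negative: "apple" sorts before "banana"
$upper = $foo->strtoupper(THIS_SAFESTRING);     // a PlaceholderString wrapping "APPLE"
```

The placeholder still has to be a value that never occurs in real input, since any argument equal to it gets replaced.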
So the entire class now looks like this:
<?PHP

// Translation tables for toJavascriptString()
$SAFESTRING_javascriptTR = array(
    '\\' => '\\\\', "'" => "\\'", '"' => '\\"',
    "\r" => '\\r', "\n" => '\\n',
    '<' => '\\074', '>' => '\\076', '&' => '\\046',
    '--' => '\\055\\055'
);

// Same as above, plus control characters.
// JavaScript \x escapes need exactly two hex digits, hence \x00 .. \x0f.
$SAFESTRING_javascriptTRbinary = array(
    "\x0" => '\\x00', "\x1" => '\\x01', "\x2" => '\\x02', "\x3" => '\\x03',
    "\x4" => '\\x04', "\x5" => '\\x05', "\x6" => '\\x06', "\x7" => '\\x07',
    "\x8" => '\\x08', "\x9" => '\\x09', "\xb" => '\\x0b', "\xc" => '\\x0c',
    "\xe" => '\\x0e', "\xf" => '\\x0f',
    "\x10" => '\\x10', "\x11" => '\\x11', "\x12" => '\\x12', "\x13" => '\\x13',
    "\x14" => '\\x14', "\x15" => '\\x15', "\x16" => '\\x16', "\x17" => '\\x17',
    "\x18" => '\\x18', "\x19" => '\\x19', "\x1a" => '\\x1a', "\x1b" => '\\x1b',
    "\x1c" => '\\x1c', "\x1d" => '\\x1d', "\x1e" => '\\x1e', "\x1f" => '\\x1f',
    "\x7f" => '\\x7f', "\xff" => '\\xff',
    '\\' => '\\\\', "'" => "\\'", '"' => '\\"',
    "\r" => '\\r', "\n" => '\\n',
    '<' => '\\074', '>' => '\\076', '&' => '\\046',
    '--' => '\\055\\055'
);

$THIS_SAFESTRING = strval("XXX_SOME_MAGIC_VALUE"); # the magic for __call

function is_SafeString( &$object ) {
    if (is_object($object)) {
        $object_name = get_class($object);
        // get_class() returns a lowercase name in PHP4, so compare case-insensitively
        return ( strtolower($object_name) == 'safestring' );
    }
    return false;
}

class SafeString {
    var $UnsafeRawString = false; # PRIVATE :-)

    // constructor, takes the raw string
    function __construct( $RawString ) {
        $this->UnsafeRawString = (string)$RawString;
    }

    # PHP4 constructor
    function SafeString( $RawString ) {
        $this->UnsafeRawString = (string)$RawString;
    }

    // returns the string safe for html output
    function toHTML() {
        return htmlentities( $this->UnsafeRawString );
    }

    // returns the string safe for SQL
    function toSQL() {
        return mysql_real_escape_string( $this->UnsafeRawString );
    }

    // return string escaped for use in Javascript (assumes no escapes in string)
    // This can come handy ;-)
    function toJavascriptString($escapeBinary=false) {
        global $SAFESTRING_javascriptTR, $SAFESTRING_javascriptTRbinary;
        if ($escapeBinary) {
            $res = strtr( $this->UnsafeRawString, $SAFESTRING_javascriptTRbinary );
        } else {
            $res = strtr( $this->UnsafeRawString, $SAFESTRING_javascriptTR );
        }
        return $res;
    }

    // returns the string suitable for usage in HTTP-headers
    // result must not contain \0,\x0d,\x0a -- prevent HTTP-response splitting attacks
    function toHeader() {
        return str_replace("\0", "", str_replace("\x0d", "", str_replace("\x0a", "", $this->UnsafeRawString)));
    }

    // returns the string suitable for a file-name
    function toFilename() {
        $result = $this->toHeader(); # no \0 etc. in filenames!
        $result = preg_replace("/[^\.\-\s_a-zA-Z\d]/", "", $result); # remove everything bad: / \ | > < etc.
        return $result;
    }

    // returns the string suitable for usage in Cookies
    // FIXME: delete stuff unsuitable for cookies such as ;
    function toCookie() {
        return $this->toHeader();
    }

    // returns the string safe for Regular Expressions
    // such that all meta-characters are escaped (i.e. the user can NOT specify
    // a pattern; "(abc)*" will become "\(abc\)\*" )
    function toRegEx($delim="/") {
        return preg_quote( str_replace("\0", "", $this->UnsafeRawString), $delim );
    }

    // returns the string safe for Regular Expressions
    // same as above, but the user can have patterns; escapes delimiters.
    function toRegExIsRegEx($delim="/") {
        return str_replace($delim, "\\".$delim, str_replace("\0", "", $this->UnsafeRawString));
    }

    // returns the string safe for Shell Arguments
    function toShellArg() {
        return escapeshellarg( $this->toHeader() );
    }

    // returns the string safe for Shell Commands
    function toShellCmd() {
        return escapeshellcmd( $this->toHeader() );
    }

    function toInt() {
        return (int) intval( $this->UnsafeRawString );
    }

    // returns the raw (unescaped) string
    function toUnsafeRawString() {
        return $this->UnsafeRawString;
    }

    // make this call an arbitrary function for the string,
    // say $foo->strcmp would call __call
    // deal with params in array;
    // mixed __call ( string $name, array $arguments )
    // Caller, applied when $function isn't defined
    // NOTE: the third, by-reference $result parameter is the PHP4 overload()
    // convention; PHP5 expects the two-parameter form and uses the return value.
    function __call($function, $arguments, &$result) {
        global $THIS_SAFESTRING;
        // Constructor called in PHP version < 5
        if ($function != __CLASS__) {
            foreach ($arguments as $key => $val) { ## WAS &$val; only in PHP5!
                if (is_string($val)) {
                    // replace the placeholder with this object's own string
                    if (strcmp($val, $THIS_SAFESTRING) == 0) {
                        $arguments[$key] = $this->toUnsafeRawString();
                    }
                } else {
                    // unwrap other SafeString objects
                    if (is_SafeString($val)) {
                        $arguments[$key] = $arguments[$key]->toUnsafeRawString();
                    }
                }
            }
            $r = call_user_func_array($function, $arguments);
            // string results are wrapped again
            if (is_string($r)) {
                $result = new SafeString($r);
            } else {
                $result = $r;
            }
        }
        if (phpversion() < 5) return true;
    }
}

// Call the overload() function when appropriate
if (function_exists("overload") && phpversion() < 5) {
    overload("SafeString");
}

function cleanKey($key) {
    # by convention we now assume, that all array keys (that come from user-input)
    # may only contain letters and numbers
    # otherwise, people might do sneaky things like:
    # /url?www[][<SCRIPT>]=42
    return preg_replace("/[^\w\d]+/i", "", $key);
}

function convertToSaveString($var, $depth=0) {
    if (is_array($var)) {
        foreach ($var as $key => $value) {
            $key2 = cleanKey($key);
            if (strcmp($key2, $key) != 0) {
                $var[$key2] = convertToSaveString($var[$key], $depth+1);
                unset($var[$key]);
            } else {
                $var[$key] = convertToSaveString($var[$key], $depth+1);
            }
        }
        return $var;
    } else {
        if (is_string($var)) {
            return new SafeString(strval($var));
        } else {
            echo "SafeString Warning: NOT A STRING: $var<BR>\n";
            return new SafeString(strval($var));
        }
    }
}

#echo "_REQUEST BEFORE: <PRE>".print_r($_REQUEST,true)."</PRE><BR>\n\n";
#$_REQUEST = convertToSaveString($_REQUEST);
#echo "_REQUEST AFTER: <PRE>".print_r($_REQUEST,true)."</PRE><BR>\n\n";
?>
I haven't tested it on PHP5 yet, but this works on PHP4. It's a bit
painful to try and make it work on both PHP versions. It should work... :-)
So, is this for real and will it be used? Not sure about that.
Probably it would require "fixing" various database abstraction
layers and template-engines to get this all into some usable state.
Until then this code is more of a request for comments :-)