PHP Cross Reference of MantisBT

/library/utf8/utils/ -> specials.php (summary)

Utilities for processing "special" characters in UTF-8. "Special" largely means anything that would be regarded as a non-word character, such as ASCII control characters and punctuation. The list has a "Roman" bias: it is unaware of modern Chinese punctuation characters, for example. Note: requires utils/unicode.php to be loaded.

Version: $Id: specials.php,v 1.2 2006/10/16 21:13:59 harryf Exp $
File Size: 131 lines (7 kb)
Included or required: 0 times
Referenced: 0 times
Includes or requires: 0 files

Defines 3 functions

  utf8_specials_pattern()
  utf8_is_word_chars()
  utf8_strip_specials()

Functions
Functions that are not part of a class:

utf8_specials_pattern()   X-Ref
Used internally. Builds a PCRE pattern from the $UTF8_SPECIAL_CHARS
array defined in this file.
$UTF8_SPECIAL_CHARS should contain all special characters (non-letter/non-digit)
defined in the various local charsets. It is not a complete list of
non-alphanumeric characters in UTF-8, so it is not perfect, but it should
match most cases of special characters.
This function also adds the control characters 0x00 to 0x19 to the pattern
(they are not included in $UTF8_SPECIAL_CHARS).

return: string
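
The pattern-building step described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: $demoSpecialChars is a hypothetical, heavily shortened stand-in for $UTF8_SPECIAL_CHARS, and demo_codepoint_to_utf8() and demo_specials_pattern() are made-up helper names. It assumes the array holds Unicode code points that are encoded to UTF-8 and placed in a PCRE character class after the 0x00 to 0x19 control range.

```php
<?php
// Hypothetical, heavily shortened stand-in for $UTF8_SPECIAL_CHARS:
// space ! , . ? and three non-ASCII punctuation marks (U+00A1, U+00BF, U+2026).
$demoSpecialChars = [0x0020, 0x0021, 0x002C, 0x002E, 0x003F, 0x00A1, 0x00BF, 0x2026];

// Hypothetical helper: encode one Unicode code point as a UTF-8 byte string.
function demo_codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical sketch of the pattern builder: a character class made of
// the control range 0x00-0x19 plus every listed special character.
function demo_specials_pattern(array $codePoints): string
{
    $chars = '';
    foreach ($codePoints as $cp) {
        $chars .= demo_codepoint_to_utf8($cp);
    }
    // preg_quote() escapes characters that are meaningful to PCRE;
    // the /u modifier makes PCRE treat pattern and subject as UTF-8.
    return '/[\x00-\x19' . preg_quote($chars, '/') . ']/u';
}

$pattern = demo_specials_pattern($demoSpecialChars);
var_dump(preg_match($pattern, 'abc'));  // int(0): nothing from the list
var_dump(preg_match($pattern, 'a!b'));  // int(1): "!" is in the list
```

Because the result is a single character class, the same pattern string can be reused for both matching and replacing.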

utf8_is_word_chars($str)   X-Ref
Checks whether a string contains only word characters. This is
intended as a logical equivalent of the \w PCRE meta character. Note
that it does not guarantee that the string contains only alphanumeric
characters, just that common non-alphanumeric characters, including
ASCII device control characters, are not in the string.

param: string to check
return: boolean TRUE if the string only contains word characters
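
A check of this kind can be sketched as a negated match against the specials pattern. The code below is an assumption-laden illustration: DEMO_WORD_CHECK_PATTERN is a hard-coded, hypothetical stand-in for the pattern the library builds at run time, and demo_is_word_chars() is a made-up name. It also shows the caveat from the description: a character that is missing from the list passes the check.

```php
<?php
// Hypothetical stand-in for the pattern returned by utf8_specials_pattern():
// control range 0x00-0x19, space, a few ASCII punctuation marks,
// and U+00A1, U+00BF, U+2026. The real list is much longer.
const DEMO_WORD_CHECK_PATTERN = '/[\x00-\x19 !,.?\x{00A1}\x{00BF}\x{2026}]/u';

// Hypothetical sketch: TRUE when no listed special character is found.
function demo_is_word_chars(string $str): bool
{
    return !(bool) preg_match(DEMO_WORD_CHECK_PATTERN, $str);
}

var_dump(demo_is_word_chars('MantisBT'));   // bool(true)
var_dump(demo_is_word_chars('Hi there!'));  // bool(false): space and "!"

// U+3002 (ideographic full stop) is punctuation, but it is not in this
// illustrative list, so the string is still reported as word characters.
var_dump(demo_is_word_chars("\u{4F60}\u{597D}\u{3002}"));  // bool(true)
```

The last call is why the result should be read as "no common special characters found" and not as "strictly alphanumeric".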

utf8_strip_specials($string, $repl='')   X-Ref
Removes special (non-alphanumeric) characters from a UTF-8 string.

This can be useful as a helper for sanitizing a string for use as
something like a file name or a unique identifier. Be warned, though:
it does not handle all possible non-alphanumeric characters and is
not intended as any kind of security / injection filter.

author: Andreas Gohr <andi@splitbrain.org>
param: string $string The UTF8 string to strip of special chars
param: string (optional) $repl   Replace special characters with this string
return: string with common non-alphanumeric characters removed
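
The stripping behaviour, including the optional replacement string, can be sketched with preg_replace(). As before, this is only an illustration under stated assumptions: DEMO_STRIP_PATTERN is a hypothetical, shortened stand-in for the library's pattern, demo_strip_specials() is a made-up name, and returning the input unchanged when PCRE reports an error is a choice made for this sketch, not documented library behaviour.

```php
<?php
// Hypothetical stand-in for the pattern returned by utf8_specials_pattern().
const DEMO_STRIP_PATTERN = '/[\x00-\x19 !,.?\x{00A1}\x{00BF}\x{2026}]/u';

// Hypothetical sketch: replace every listed special character with $repl.
function demo_strip_specials(string $string, string $repl = ''): string
{
    $result = preg_replace(DEMO_STRIP_PATTERN, $repl, $string);
    // preg_replace() returns null on error (for example, invalid UTF-8
    // under /u); this sketch then hands back the input unchanged.
    return $result === null ? $string : $result;
}

echo demo_strip_specials('Hello, world!'), "\n";       // Helloworld
echo demo_strip_specials('Hello, world!', '_'), "\n";  // Hello__world_
```

Each special character is replaced individually, so the comma and the space in the second call produce two consecutive underscores. Characters outside the list pass through untouched, which is why the output should not be treated as safe for security-sensitive contexts.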



Generated: Thu Jul 28 15:48:31 2011 Cross-referenced by PHPXref 0.7