Zend PHP 7 Certification – Strings – Encoding

This post covers the Encoding section of the Strings chapter when studying for the Zend PHP 7 Certification.

There are a variety of PHP extensions that support character encoding. The Multibyte String extension (mbstring) provides string functions that help you deal with multibyte encodings. mbstring is not enabled by default: when compiling PHP, you must explicitly enable it by passing --enable-mbstring as a configure option.
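
Because the extension is optional, it can be worth confirming at runtime that it is loaded before calling any mb_* function. The following is a minimal sketch; hasMbstring() is a hypothetical helper name used only for illustration, not part of PHP.

```php
<?php
// Hypothetical helper: reports whether the mbstring extension is available.
function hasMbstring(): bool
{
    return extension_loaded('mbstring');
}

if (hasMbstring()) {
    // Safe to call mb_* functions from here on.
    echo 'mbstring is loaded; internal encoding: ', mb_internal_encoding(), PHP_EOL;
} else {
    echo 'mbstring is not loaded', PHP_EOL;
}
```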

The encoding-related functions that mbstring provides are described below.

The mb_check_encoding() function checks if the string is valid for the specified encoding. It can take two parameters.

  • var – The byte stream to check. If it is omitted, this function checks all the input from the beginning of the request.
  • encoding – The expected encoding.
mb_check_encoding($string, 'UTF-8');
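
To make the return value concrete, here is a small sketch that checks the same word stored in two different encodings. The byte strings are written with hex escapes so that the result does not depend on how the source file itself is saved; the variable names are illustrative.

```php
<?php
$utf8Bytes   = "caf\xC3\xA9"; // "café" encoded as UTF-8
$latin1Bytes = "caf\xE9";     // "café" encoded as ISO-8859-1

// 0xC3 0xA9 is a well-formed two-byte UTF-8 sequence.
$isValid = mb_check_encoding($utf8Bytes, 'UTF-8');

// A lone 0xE9 starts a multibyte sequence that is never completed.
$isInvalid = mb_check_encoding($latin1Bytes, 'UTF-8');

var_dump($isValid);   // bool(true)
var_dump($isInvalid); // bool(false)
```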

The mb_convert_encoding() function converts character encoding. It can take three parameters.

  • str – The string being encoded.
  • to_encoding – The type of encoding that str is being converted to.
  • from_encoding – The encoding of str before conversion, specified by character code name. It is either an array or a comma-separated list. If from_encoding is not specified, the internal encoding will be used.
/* Convert internal character encoding to SJIS */
$str = mb_convert_encoding($str, "SJIS");

/* Convert EUC-JP to UTF-7 */
$str = mb_convert_encoding($str, "UTF-7", "EUC-JP");
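
The effect of a conversion is easiest to see at the byte level. The sketch below converts an ISO-8859-1 string to UTF-8 and back again, using bin2hex() to show the bytes; the variable names are illustrative.

```php
<?php
$latin1 = "caf\xE9"; // "café" encoded as ISO-8859-1 (4 bytes)

// Always pass from_encoding explicitly when the source encoding is known,
// rather than relying on the internal encoding.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo bin2hex($latin1), PHP_EOL; // 636166e9
echo bin2hex($utf8), PHP_EOL;   // 636166c3a9

// Converting back yields the original bytes.
$roundTrip = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
var_dump($roundTrip === $latin1); // bool(true)
```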

The mb_detect_encoding() function detects character encoding. This function can also take three parameters.

  • str – The string being detected.
  • encoding_list – A list of character encodings. The encoding order may be specified as an array or as a comma-separated string. If this is omitted, the order set by mb_detect_order() is used.
  • strict – Specifies whether to use strict encoding detection. Default is false.

Note that if you use mb_detect_encoding() to check whether a string is valid UTF-8, you should enable strict mode; without it, the result is unreliable.

$str = "\xE1\xE9\xF3\xFA"; // 'áéóú' encoded as ISO-8859-1
mb_detect_encoding($str, 'UTF-8'); // 'UTF-8'
mb_detect_encoding($str, 'UTF-8', true); // false
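
One way to apply this advice is to validate UTF-8 explicitly first and only then fall back to strict detection over other candidates. The sketch below assumes that approach; guessEncoding() and its default fallback list are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical helper: returns the encoding name, or false if nothing matches.
function guessEncoding(string $bytes, array $fallbacks = ['ISO-8859-1'])
{
    // An explicit validity check avoids the non-strict detection pitfall.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return 'UTF-8';
    }

    // Strict mode rejects candidates for which the bytes are invalid.
    return mb_detect_encoding($bytes, $fallbacks, true);
}

$fromUtf8   = guessEncoding("caf\xC3\xA9"); // 'UTF-8'
$fromLatin1 = guessEncoding("caf\xE9");     // 'ISO-8859-1'

var_dump($fromUtf8, $fromLatin1);
```

Keep in mind that detection is still a guess: single-byte encodings such as ISO-8859-1 accept almost any byte sequence, so a match only means the bytes are valid in that encoding, not that it is the one the author used.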

mb_detect_order() sets or gets the character encoding detection order. It takes one optional parameter, encoding_list, which is an array or comma-separated list of character encodings. If encoding_list is omitted, it returns the current character encoding detection order as an array.

/* Set detection order by enumerated list */
mb_detect_order("eucjp-win,sjis-win,UTF-8");

/* Set detection order by array */
$ary[] = "ASCII";
$ary[] = "JIS";
$ary[] = "EUC-JP";
mb_detect_order($ary);

/* Display current detection order */
echo implode(", ", mb_detect_order());
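
Since the detection order is a global setting, code that changes it temporarily may want to restore the previous value afterwards. A minimal sketch of that pattern, with illustrative variable names:

```php
<?php
// Remember the current order so it can be restored later.
$previous = mb_detect_order();

// Set a new order; the call returns true on success.
$changed = mb_detect_order(['UTF-8', 'ISO-8859-1']);
$current = mb_detect_order();

echo implode(', ', $current), PHP_EOL; // UTF-8, ISO-8859-1

// Restore the original order.
mb_detect_order($previous);
```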

Note: This article is based on PHP version 7.1.