GeSHi Filter, Filtered HTML, and XML

Victor, 21 April 2007, 0:15

I'm trying to set up syntax highlighting on my site and have run into the following problem.

I can't get XML content to display.

If I enable "Strip disallowed tags" in the Filtered HTML settings, no content reaches GeSHi Filter at all: the HTML filter swallows it, because it treats every tag in the XML as disallowed (and they can't be whitelisted, since in general they can be anything).

If I instead enable "Keep all tags" in the Filtered HTML settings, GeSHi Filter receives the XML after it has been run through check_plain(), i.e. all special characters have already been replaced with their HTML entities, and GeSHi Filter is once again left with nothing to highlight.
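
To make the second case concrete, here is a minimal sketch of what escaping does to an XML fragment. It assumes that check_plain() behaves essentially like PHP's htmlspecialchars() with ENT_QUOTES; the sample string is made up for illustration.

```php
<?php
// Made-up XML fragment standing in for the content of a blockcode tag.
$xml = '<item id="1">Tom & Jerry</item>';

// Assumption: roughly what check_plain() does, i.e. escape &, <, > and quotes.
$escaped = htmlspecialchars($xml, ENT_QUOTES);
echo $escaped, "\n";

// Decoding with the same flags gives back the original markup.
$restored = htmlspecialchars_decode($escaped, ENT_QUOTES);
echo $restored, "\n";
```

By the time a later filter sees the text, the angle brackets are gone, which is why a highlighter that expects raw XML finds nothing to work with.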

Putting GeSHi Filter before Filtered HTML doesn't give anything good either.

Does anyone know a way to make Filtered HTML strip all disallowed tags from the content except for content wrapped in a blockcode tag? The HTML filter shouldn't touch anything inside that tag at all; GeSHi Filter will sort it out.

Or is there perhaps some other syntax-highlighting solution besides GeSHi Filter in which this problem is already solved?
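
The behaviour asked for above (filter everything except blockcode sections) can be sketched in isolation as follows. This is only an illustration under stated assumptions: filter_outside_blockcode() is a hypothetical helper, not part of Drupal or GeSHi Filter, and strip_tags() merely stands in for the real Filtered HTML filter.

```php
<?php
// Hypothetical helper: apply $filter to everything outside
// <blockcode>...</blockcode> sections and pass those sections through as-is.
function filter_outside_blockcode($text, $filter) {
  // One capture group around a complete blockcode section. With
  // PREG_SPLIT_DELIM_CAPTURE the result alternates between ordinary text
  // (even indexes) and captured blockcode sections (odd indexes).
  $parts = preg_split('%(<blockcode\b[^>]*>.*?</blockcode>)%is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
  $out = '';
  foreach ($parts as $index => $part) {
    $out .= ($index % 2 == 1) ? $part : call_user_func($filter, $part);
  }
  return $out;
}

// Made-up input; strip_tags() is a stand-in for the real HTML filter.
$input = '<b>Config:</b> <blockcode type="xml"><item id="1"/></blockcode> <script>x</script>';
echo filter_outside_blockcode($input, 'strip_tags'), "\n";
```

An unclosed blockcode tag does not match the pattern, so in that case the whole text goes through the filter. The raw content of each section reaches the output unfiltered, so this only makes sense if a later filter escapes it before display.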

<?php
function filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd')) {
  // Store the input format
  _filter_xss_split($allowed_tags, TRUE);
  // Remove NUL characters (ignored by some browsers)
  $string = str_replace(chr(0), '', $string);
  // Remove Netscape 4 JS entities
  $string = preg_replace('%&\s*\{[^}]*(\}\s*;?|$)%', '', $string);

  // Defuse all HTML entities
  $string = str_replace('&', '&amp;', $string);
  // Change back only well-formed entities in our whitelist
  // Named entities
  $string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\1', $string);
  // Decimal numeric entities
  $string = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $string);
  // Hexadecimal numeric entities
  $string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\1', $string);

  return preg_replace_callback('%
    (
    <(?=[^a-zA-Z!/])  # a lone <
    |                 # or
    <[^>]*.(>|$)      # a string that starts with a <, up until the > or the end of the string
    |                 # or
    >                 # just a >
    )%x', '_filter_xss_split', $string);
}
?>

---------------------------------

Here is the new version:

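<?php
function filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd')) {
  // Store the input format
  _filter_xss_split($allowed_tags, TRUE);

  // Split the input on opening and closing blockcode tags, keeping the tags
  // themselves as separate pieces.
  $matches = preg_split('%(<\s*blockcode[^>]*>?)|(<\s*/?\s*blockcode\s*>?)%', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
  $count = count($matches);
  $ret_string = '';
  for ($i = 0; $i < $count; $i++) {
    // If this piece is an allowed tag, _filter_xss_split() returns it cleaned up.
    $word1 = _filter_xss_split(array('', $matches[$i]));
    // Assumption: the '<blockcode' and '</blockcode>' literals and the
    // $word2/$word3 assignments are reconstructed from how the variables are
    // used. 'blockcode' must be listed in $allowed_tags for this to match.
    if (strstr($word1, '<blockcode')) {
      // Raw code between the tags, passed through untouched for GeSHi Filter.
      $word2 = isset($matches[$i + 1]) ? $matches[$i + 1] : '';
      $word3 = isset($matches[$i + 2]) ? _filter_xss_split(array('', $matches[$i + 2])) : '';
      if (strstr($word3, '</blockcode>')) {
        $ret_string .= $word1 . $word2 . $word3;
        $i += 2;
      }
      else {
        // No closing tag follows the code: close the block ourselves.
        $ret_string .= $word1 . $word2 . '</blockcode>';
        $i += 1;
      }
    }
    else {
      $word = $matches[$i];
      // Remove NUL characters (ignored by some browsers)
      $word = str_replace(chr(0), '', $word);
      // Remove Netscape 4 JS entities
      $word = preg_replace('%&\s*\{[^}]*(\}\s*;?|$)%', '', $word);

      // Defuse all HTML entities
      $word = str_replace('&', '&amp;', $word);
      // Change back only well-formed entities in our whitelist
      // Named entities
      $word = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\1', $word);
      // Decimal numeric entities
      $word = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $word);
      // Hexadecimal numeric entities
      $word = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\1', $word);

      $ret_string .= preg_replace_callback('%
        (
        <(?=[^a-zA-Z!/])  # a lone <
        |                 # or
        <[^>]*.(>|$)      # a string that starts with a <, up until the > or the end of the string
        |                 # or
        >                 # just a >
        )%x', '_filter_xss_split', $word);
    }
  }

  return $ret_string;
}
?>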
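
To see what the preg_split() call in the function above hands to the loop, the short sketch below runs the same pattern and flags on a made-up sample string. Only the pattern and flags come from the post; the sample input and variable names are illustrative.

```php
<?php
// Same pattern and flags as in the function above.
$pattern = '%(<\s*blockcode[^>]*>?)|(<\s*/?\s*blockcode\s*>?)%';
// Made-up sample input.
$sample = 'intro <blockcode type="xml"><item id="1"/></blockcode> outro';

// PREG_SPLIT_DELIM_CAPTURE keeps the matched tags in the result;
// PREG_SPLIT_NO_EMPTY drops empty pieces, including the capture group
// of whichever alternative did not take part in a match.
$pieces = preg_split($pattern, $sample, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($pieces);
```

For this input the result is five pieces in document order: the text before the block, the opening tag, the raw XML, the closing tag, and the text after it. The loop depends on that order when it reads the two pieces following an opening tag.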

---------------------------

Opinions are welcome; maybe someone can see how to do this more elegantly.
