/*
htmLawed_TESTCASE.txt, 27 February 2016
htmLawed 1.1.22, 5 March 2016
Copyright Santosh Patnaik
Dual licensed with LGPL 3 and GPL 2+
A PHP Labware internal utility - http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed
*/

This file has UTF-8-encoded text with both correct and incorrect/malformed HTML/XHTML code snippets to test htmLawed (test cases/samples). The entire text may also be used as a unit.

************************************************
when viewing this file in a web browser, set the
character encoding to Unicode/UTF-8
************************************************

--------------------- start --------------------

Try different $config and $spec values. Some text even when filtered in will not be displayed in a rendered web-page
Attributes
Xml:lang:, ,
Standard, predefined value, or empty attribute: , ,
Required: image, image
Quote & space variation: a, a, a
Invalid: a
Duplicated: a
Deprecated: a,

Casing:
Custom: image
Data-*: a
Admin-restricted?:
Attribute values
Duplicate ID value:, ,
(try 'my_' for prefix)
Double-quotes in value:, ,
(try filter for CSS expression)
CSS expression:

Other: ,
(try 'maxlen', 'maxval', etc., for 'input' in '$spec')
Blockquotes
abc

abc
def

abc
def

abc
def
ghi

abc
def
ghi
QQQ
x
<!-- comment -->

x
<!-- comment -->QQQ

<!-- comment -->
x
QQQ
x

x<!-- comment -->
QQQ

x



(try with blockquote parent)
CDATA sections
Special characters inside: <![CDATA[ ]]> ]]>, <![CDATA[ 3 < 4 > 3.5, & 4 > 4 ]]>
Normal: <![CDATA[ check ]]>, CDATA follows:<![CDATA[ check ]]>
Malformed: <![cdata check ]]>, < ![CDATA check ]]>, <![CDATA check ]]>, < ![CDATA check ] ]>
Invalid: >CDATA in tag content, <![CDATA[ check ]]>
text not allowed
Complex-1: deprecated elements
The PHP software script used for this web-page webpage is htmLawedTest.php, from PHP Labware.
Complex-2: deprecated attributes
aa

image

Section

Para

  1. First item
  1. First item

Complex-3: embed, object, area
<object width="425" height="350"><param name="movie" value="http://www.youtube.com/v/ls7gi1VwdIQ" /></param><embed src="http://www.youtube.com/v/ls7gi1VwdIQ" type="application/x-shockwave-flash" width="425" height="350"></embed></object>
<embed src="http://www.youtube.com/v/ls7gi1VwdIQ" type="application/x-shockwave-flash" width="425" height="350"></embed>
<object data="1.gif" type="image/gif" usemap="#map1">

navigate the site: 1 | 3 | 4

area
</object> <param name="name" />value</param> <object id="obj1"> <param name="param1" /> <object id="obj2"> <param name="param2" /> </object> </object>
Complex-4: nested and other tables
Cell
Cell
Cell
Cell Cell Cell
Cell
Cell Cell Cell

PCDATA wrong: Well
Hello

Missing tr: <td>Well</td>

Complex-5: pseudo, disallowed or non-HTML tags
(Try different 'keep_bad' values) <*> Pseudotags <*> <xml>Non-HTML tag xml</xml>

Disallowed tag p

Elements
Unbalanced: check</em>
Non-XHTML:

Malformed: < a href=""></a>, , , , < /a>, < a href="">, a, a, <imgsrc="s" alt="a" />
Invalid: <image src="s" alt="a" />
Empty: a, a</img>, atext</img>
Content invalid: 12</a>
Content invalid?:

(try setting 'form' as parent)
Casing:
Check for tidy:

</div>
</div>
</div>
hi
Entities
Special: & 3 < 2 & 5>4 and j >i >a & i<j>a
Padding: B B f f &#x003; &#0003;
Malformed: & #x27;, &x27;, ' &TILDE;, &tilde
Invalid: &#x3;, &#55296;, &#03;, &#1114112;, &#xffff, &bad;
Discouraged characters: &#x7f;, &#132;, ﷠, 􏿾
Context: '>', <?
Casing: ', ', &TILDE;, ˜
(also check named-to-numeric and hexdec-to-decimal, and vice versa, conversions)
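(for comparing filter output against expected values, the conversions named in the note above can be sketched independently of htmLawed. The Python below is an illustrative assumption, not htmLawed's code: the helper names are hypothetical, only entities terminated by a semicolon are rewritten, and named entities are looked up in Python's HTML 4 table, so wrongly cased or unknown names such as &TILDE; and &bad; are left untouched.)

```python
import re
from html.entities import name2codepoint  # HTML 4 names -> code points


def hex_to_decimal(text):
    """Rewrite hexadecimal numeric entities as decimal ones."""
    return re.sub(
        r"&#[xX]([0-9a-fA-F]+);",
        lambda m: "&#%d;" % int(m.group(1), 16),
        text,
    )


def decimal_to_hex(text):
    """Rewrite decimal numeric entities as hexadecimal ones."""
    return re.sub(
        r"&#([0-9]+);",
        lambda m: "&#x%x;" % int(m.group(1)),
        text,
    )


def named_to_numeric(text):
    """Rewrite known named entities as decimal numeric ones."""
    def repl(m):
        name = m.group(1)
        if name in name2codepoint:
            return "&#%d;" % name2codepoint[name]
        return m.group(0)  # unknown or wrongly cased name: leave as-is

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)
```

(padded values lose their leading zeros in this sketch, and a reference missing its semicolon is not treated as an entity at all.)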
Format
Valid but ill-formatted: text <!-- comment --> text <!-- A c o m m e n t --> <script> <![CDATA[ code ]]> </script><!-- comment --><![CDATA[ cdata ]]> text</b> text<pre id="none">p r e</pre> text text
<hr /> text none text text none t e x t
text none t e x t text none t e x t <script>script</script>
p r e <!-- comment --> 
				pre
		
Cell
Cell
Cell
CellCellCell
Cell
CellCellCell
(try to compact or beautify)
Forms
(note nesting of 'form', missing required attributes, etc.)
<script type="text/javascript">s</script>
pl
h
</form>


B:C:

(try each of these lines separately)
what
what (try with container as div and as form)
c a b<script>s</script>
HTML comments (also CDATA)
Script inside: <!--[if gte IE 4]> <SCRIPT>alert('XSS');</SCRIPT> <![endif]-->
Special characters inside: <!-- <![CDATA check ]]> -->, <!-- 3 < 4 > 3.5, & 4 > 4 -->, <!-- che--ck -->, <!--[if !IE]> <-->c<!--> <![endif]-->
Normal: <!-- check -->, <!--check -->, comment:<!-- check --><!-- check -->, <table><!-- check --><tr><td>text not allowed</td></tr></table>
Malformed: <![cdata check ]]>, < ![CDATA check ]]>, < ![CDATA check ] ]>
Invalid:
>comment in tag content, <!--check-->
HTML5
figure and figcaption: <figure>picture<figcaption>Caption for the awesome picture</figcaption></figure> article:

A

B

<article>

C

</article><article>

E

F

G

</article> meter:

Heat <meter min="100" max="200" value="150">150</meter>.

datalist: <datalist id="b"><option value="c"><option value="d"></datalist>
Ins-Del
(depending on context, these elements can be of either block or inline type)

<div>block

</ins></p>

d


d

<div>d

</del></p></ins>
d
Lists
Invalid character data:
Definition list:
a
bad
first one
b
second

Definition list, close-tags omitted:
a
bad
first one
b
second

Definition lists, nested:
T1
D1
T2
D2
t1
d1
t2
d2
T3
D3
T4
D4
t1
d1

Definition lists, nested, close-tags omitted:
T1
D1
T2
D2
t1
d1
t2
d2
T3
D3
T4
D4
t1
d1

Nested:
Nested, directly:
Nested, close-tags omitted:
Complex:
    <script></script>

</li></ul> </td></tr></table></li></ol> Menu:
Microdata
I am X but people call me Y. Find me at www.xy.com
Microsoft Word
Proprietary tag:

<o:p> </o:p>


XML declaration: <?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" />
XML-invalid character code-point (may not replicate):

“Where is he?” asked both Mary – the one so lovely – and Jane.

Nesting
Block or inline a:

text

<div>hi</div>
Non-English text-1
Inscrieţi-vă acum la a Zecea Conferinţă Internaţională
გთხოვთ ახლავე გაიაროთ რეგისტრაცია
večjezično računalništvo
อ.อ่าง
Зарегистрируйтесь сейчас на Десятую Международную Конференцию по
(this file should have utf-8 encoding; some characters may not be displayed because of missing fonts, etc.)
Non-English text-2: entities
用统一码
გთხოვთ
Inscreva-se agora para a Décima Conferência Internacional Sobre O Unicode, realizada entre os dias 10 e 12 de março de 1997 em Mainz na Alemanha.
Ruby
(need compatible browser)
さい とう のぶ W3C Associate Chairman
WWW (World Wide Web)
A (aaa)
Tables
Omitted closing tags:
h1c1h1c2
r1c1r1c2
r2c1r2c2

Nested, omitted closing tags:
h1c1h1c2
r1c1r1c2
h1c1h1c2
r1c1r1c2
r2c1r2c2
r2c1r2c2

Tag transformation
Font element intended as 'inline' element:

hi


Font element intended as 'block' element:
<div>hi
</span></div>
Font element intended as 'block' element:
<div>hi
QQQ
</span></div>
Tidy
White-space handling: abc def ghi abc def ghi
URLs
Relative and absolute: , , , , , ,
(try base URL value of 'http://a.com/b/')
CSS URLs:
,
,
,
,

Double URLs: b
Anti-spam: (try regex for 'http://a.com', etc.) , , , , , , ,
Soft-hyphen: ídis­c
XSS
<img onmouseover=confirm(1)// '';!--"<xss>=&{()}
image
image
image
image
test <div style="javascript:alert('xss');"></div>
<div style="background-image:url(denied:javascript:alert('xss'));"></div>
<div style="background-image:url("denied:javascript:alert('xss')" );"></div>
<!--[if gte IE 4]><script>alert('xss');</script><![endif]-->
<script a=">" src="http://ha.ckers.org/xss.js"></script>
<div style="background-image: url('denied:js:xss')"></div>
test
Bad IE7: x
Opera: link Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: x
Bad IE7: x
Bad IE7: x
Bad IE7: x
Bad IE7: exp/*x
Bad IE7: hi
Bad IE7: hi
Bad IE7: test
Bad IE7: hi
Bad IE7: hi
<h6>Other</h6> 3 < 4
3 > 4
> 3
<._.> hi!
<<< ALERT >>>
<![if !vml]> some stuff <![endif]>
<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" />
<uml:ns ns = "urn:www">
<uml:ns ns = 'urn:www'>
if(13<age AND 21>age){say 'teen'}
age >51 and a smoking history of >51 pack-years was
age > 51 and a smoking history of >51 pack-years was
age <51 and a smoking history of <51 pack-years <b>was</b>
age < 51 and a smoking history of < 51 pack-years was
age >51 and a smoking history of >51 pack-years
age > 51 and a smoking history of >51 pack-years
age <51 and a smoking history of <51 pack-years</b>
age < 51 and a smoking history of < 51 pack-years