/*
htmLawed_TESTCASE.txt, 11 February 2017
To test htmLawed
Copyright Santosh Patnaik
Dual licensed with LGPL 3 and GPL 2+
A PHP Labware internal utility - www.bioinformatics.org/phplabware/internal_utilities/htmLawed
*/

This file has UTF-8-encoded text with both correct and incorrect/malformed HTML/XHTML code snippets to test htmLawed (test cases/samples). The entire text may also be used as a unit.

************************************************
when viewing this file in a web browser, set the
character encoding to Unicode/UTF-8
************************************************

--------------------- start --------------------

Try different $config and $spec values. Some text even when filtered in will not be displayed in a rendered web-page
Attributes
Xml:lang:, ,
Standard, predefined value, or empty attribute: , ,
Required: image, image
Quote & space variation: a, a, a
Invalid: a
Duplicated: a
Deprecated: a,

Casing:
Custom: image
Data-*: a
Admin-restricted?:
Attribute values
Duplicate ID value:, ,
(try 'my_' for prefix)
Double-quotes in value:, ,
(try filter for CSS expression)
CSS expression:

Other: ,
(try 'maxlen', 'maxval', etc., for 'input' in '$spec')
Blockquotes
abc

abc
def

abc
def

abc
def
ghi

abc
def
ghi
QQQ
x
<!-- comment -->

x
<!-- comment -->QQQ

<!-- comment -->
x
QQQ
x

x<!-- comment -->
QQQ

x



(try with blockquote parent)
CDATA sections
Special characters inside: <![CDATA[ ]]> ]]>, <![CDATA[ 3 < 4 > 3.5, & 4 > 4 ]]>
Normal: <![CDATA[ check ]]>, CDATA follows:<![CDATA[ check ]]>
Malformed: <![cdata check ]]>, < ![CDATA check ]]>, <![CDATA check ]]>, < ![CDATA check ] ]>
Invalid: >CDATA in tag content, <![CDATA[ check ]]>
text not allowed
Complex-1: deprecated elements
The PHP software script used for this web-page webpage is htmLawedTest.php, from PHP Labware.
Complex-2: deprecated attributes
aa

image

Section

Para

  1. First item
  1. First item

Complex-3: embed, object, area
<object width="425" height="350"><param name="movie" value="http://www.youtube.com/v/ls7gi1VwdIQ" /></param><embed src="http://www.youtube.com/v/ls7gi1VwdIQ" type="application/x-shockwave-flash" width="425" height="350"></embed></object>
<embed src="http://www.youtube.com/v/ls7gi1VwdIQ" type="application/x-shockwave-flash" width="425" height="350"></embed>
<object data="1.gif" type="image/gif" usemap="#map1">

navigate the site: 1 | 3 | 4

area
</object> <param name="name" />value</param> <object id="obj1"> <param name="param1" /> <object id="obj2"> <param name="param2" /> </object> </object>
Complex-4: nested and other tables
Cell
Cell
Cell
Cell Cell Cell
Cell
Cell Cell Cell

PCDATA wrong: Well
Hello

Missing tr: <td>Well</td>

Complex-5: pseudo, disallowed or non-HTML tags
(Try different 'keep_bad' values) <*> Pseudotags <*> <xml>Non-HTML tag xml</xml>

Disallowed tag p

Elements
Unbalanced: check</em>
Non-XHTML:

Malformed: < a href=""></a>, , , , < /a>, < a href="">, a, a, <imgsrc="s" alt="a" />
Invalid: <image src="s" alt="a" />
Empty: a, a</img>, atext</img>
Content invalid: 12</a>
Content invalid?:

(try setting 'form' as parent)
Casing:
Check for tidy:

</div>
</div>
</div>
hi
Entities
Special: & 3 < 2 & 5>4 and j >i >a & i<j>a
Padding: B B f f &#x003; &#0003;
Malformed: & #x27;, &x27;, ' &TILDE;, &tilde
Invalid: &#x3;, &#55296;, &#03;, &#1114112;, &#xffff, &bad;
Discouraged characters: &#x7f;, &#132;, ﷠, 􏿾
Context: '>', <?
Casing: ', ', &TILDE;, ˜
(also check named-to-numeric and hexdec-to-decimal, and vice versa, conversions)
Format
Valid but ill-formatted: text <!-- comment --> text <!-- A c o m m e n t --> <script> <![CDATA[ code ]]> </script><!-- comment --><![CDATA[ cdata ]]> text</b> text
p r e
text text

text none text text none t e x t text none t e x t text none t e x t <script>script</script>
p r e <!-- comment --> 
				pre
		
Cell
Cell
Cell
CellCellCell
Cell
CellCellCell
(try to compact or beautify)
Forms
(note nesting of 'form', missing required attributes, etc.)
<script type="text/javascript">s</script>
pl
h
</form>


B:C:

(try each of these lines separately)
what
what (try with container as div and as form)
c a b<script>s</script>
HTML comments (also CDATA)
Script inside: <!--[if gte IE 4]> <SCRIPT>alert('XSS');</SCRIPT> <![endif]-->
Special characters inside: <!-- <![CDATA check ]]> -->, <!-- 3 < 4 > 3.5, & 4 > 4 -->, <!-- che--ck -->, <!--[if !IE]> <-->c<!--> <![endif]-->
Normal: <!-- check -->, <!--check -->, comment:<!-- check --><!-- check -->, <table><!-- check --><tr><td>text not allowed</td></tr></table>
Malformed: <![cdata check ]]>, < ![CDATA check ]]>, < ![CDATA check ] ]>
Invalid:
>comment in tag content, <!--check-->
HTML5
figure and figcaption:
picture
Caption for the awesome picture
article:

A

B

C

E

F

G

meter:

Heat 150.

datalist:
Ins-Del
(depending on context, these elements can be of either block or inline type)

<div>block

</ins></p>

d


d

<div>d

</del></p></ins>
d
Lists
Invalid character data:
Definition list:
a
bad
first one
b
second

Definition list, close-tags omitted:
a
bad
first one
b
second

Definition lists, nested:
T1
D1
T2
D2
t1
d1
t2
d2
T3
D3
T4
D4
t1
d1

Definition lists, nested, close-tags omitted:
T1
D1
T2
D2
t1
d1
t2
d2
T3
D3
T4
D4
t1
d1

Nested:
Nested, directly:
Nested, close-tags omitted:
Complex:
    <script></script>

</li></ul> </td></tr></table></li></ol>
Menu:
  • <button type="button">New...</button>
  • <button type="button">Cut...</button>
Microdata
I am X but people call me Y. Find me at
Microsoft Word
Proprietary tag:

<o:p> </o:p>


XML declaration: <?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" />
XML-invalid character code-point (may not replicate):

“Where is he?” asked both Mary – the one so lovely – and Jane.

Nesting
Block or inline a:

text

hi

Non-English text-1
Inscrieţi-vă acum la a Zecea Conferinţă Internaţională
გთხოვთ ახლავე გაიაროთ რეგისტრაცია
večjezično računalništvo
อ.อ่าง
Зарегистрируйтесь сейчас на Десятую Международную Конференцию по
(this file should have utf-8 encoding; some characters may not be displayed because of missing fonts, etc.)
Non-English text-2: entities
用统一码
გთხოვთ
Inscreva-se agora para a Décima Conferência Internacional Sobre O Unicode, realizada entre os dias 10 e 12 de março de 1997 em Mainz na Alemanha.
Ruby
(need compatible browser)
さい とう のぶ W3C Associate Chairman
WWW (World Wide Web)
A (aaa)
Tables
Omitted closing tags:
h1c1h1c2
r1c1r1c2
r2c1r2c2

Nested, omitted closing tags:
h1c1h1c2
r1c1r1c2
h1c1h1c2
r1c1r1c2
r2c1r2c2
r2c1r2c2

Tag transformation
Font element intended as 'inline' element:

hi


Font element intended as 'block' element:
<div>hi
</span></div>
Font element intended as 'block' element:
<div>hi
QQQ
</span></div>
Tidy
White-space handling: abc def ghi abc def ghi
URLs
Relative and absolute: , , , , , ,
(try base URL value of 'http://a.com/b/')
CSS URLs:
,
,
,
,

Double URLs: b
Anti-spam: (try regex for 'http://a.com', etc.) , , , , , , ,
Soft-hyphen: ídis­c
XSS
<img onmouseover=confirm(1)// '';!--"<xss>=&{()}
image
image
image
image
test <div style="javascript:alert('xss');"></div>
<div style="background-image:url(denied:javascript:alert('xss'));"></div>
<div style="background-image:url("denied:javascript:alert('xss')" );"></div>
<!--[if gte IE 4]><script>alert('xss');</script><![endif]-->
<script a=">" src="http://ha.ckers.org/xss.js"></script>
<div style="background-image: url('denied:js:xss')"></div>
test
Bad IE7: x
Opera: link Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: x
Bad IE7: x
Bad IE7: x
Bad IE7: x
Bad IE7: exp/*x
Bad IE7: hi
Bad IE7: hi
Bad IE7: test
Bad IE7: hi
Bad IE7: hi
<h6>Other</h6>
3 < 4
3 > 4
> 3
<._.> hi!
<<< ALERT >>>
<![if !vml]> some stuff <![endif]>
<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" />
<uml:ns ns = "urn:www">
<uml:ns ns = 'urn:www'>
if(13<age AND 21>age){say 'teen'}
age >51 and a smoking history of >51 pack-years was
age > 51 and a smoking history of >51 pack-years was
age <51 and a smoking history of <51 pack-years <b>was</b>
age < 51 and a smoking history of < 51 pack-years was
age >51 and a smoking history of >51 pack-years
age > 51 and a smoking history of >51 pack-years
age <51 and a smoking history of <51 pack-years</b>
age < 51 and a smoking history of < 51 pack-years