Paul Talaga

Fuzzpault Technologies LLC


CHXHtml

Note: CHXHtml 0.2.0 WILL compile with GHC 7.0, though it takes some time (about an hour for me). GHC 6.10 compiles it much faster, but the resulting code is slower. The fastest compilation is with version 0.1.4 and GHC 6.10.3, though you won't have HTML 4.01 support.

CHXHtml (Compliant Haskell XHTML) is a Haskell library for static or dynamic (X)HTML production. What sets it apart from all other production libraries:

Requirements & Download

Currently tested on GHC 6.10.3 and 7.0.3 with CHXHtml 0.2.0

Requirements:

Exported Modules:

Hackage: 0.2.0, 0.1.4, 0.1.3, 0.1.2, 0.1.1, 0.1.0


Cabal Install from Hackage

$ cabal install CHXHtml

Documentation

Generalized Haddock documentation: 0.1.1

Syntax

Simple HTML-like syntax. If you know even a little Haskell, the syntax is easy to pick up: no fancy combinators or monads, just nested data structures. For example...

import Text.CHXHtml.XHtml1_strict

helloWorld = 
    _html [
        _head [
            _title [pcdata "Hello World Page"]
        ],
        _body [_h1 [pcdata "Hello World"],
            _hr,
            _div [a_ [href_att "http://google.com"] [pcdata "Search!"]]
        ]
    ]

This creates the structure. To render it to a string, use the render function (putStrLn is used here so the newlines display)...

ghci>putStrLn (render helloWorld)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html  xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content Type" content="text/html;charset=UTF-8" />
<title>Hello World Page</title>
</head>
<body><h1>Hello World</h1>
<div><a href="http://google.com">Search!</a>
</div>
</body>
</html>

Each tag is represented by a function (two, actually) that produces the appropriate constructor for the context it appears in. The _<tag> function does not allow attributes, while the <tag>_ function does. Non-attribute versions accept a list of child tags, which may have children themselves as long as the DTD allows it. Attribute versions accept a list of attributes and values before the list of children.
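To make the two-functions-per-tag convention concrete, here is a minimal, untyped sketch of the idea. It is not CHXHtml's implementation: Node, Attr and renderNode are hypothetical names, the tag functions are redefined locally just for illustration, and no DTD checking or escaping is performed.

```haskell
-- A simplified, untyped model of the "_tag" / "tag_" convention.
-- Node, Attr and renderNode are hypothetical; CHXHtml's real types are
-- far richer and enforce the DTD, which this sketch does not.

data Attr = Attr String String

data Node
  = Element String [Attr] [Node]  -- tag name, attributes, children
  | Text String                   -- character data (not escaped here)

-- Attribute-free form: only a list of children.
_div :: [Node] -> Node
_div = Element "div" []

-- Attribute-taking form: attributes first, then children.
div_ :: [Attr] -> [Node] -> Node
div_ = Element "div"

a_ :: [Attr] -> [Node] -> Node
a_ = Element "a"

href_att :: String -> Attr
href_att = Attr "href"

pcdata :: String -> Node
pcdata = Text

-- Walk the nested structure and emit markup.
renderNode :: Node -> String
renderNode (Text s) = s
renderNode (Element name attrs kids) =
  "<" ++ name ++ concatMap renderAttr attrs ++ ">"
    ++ concatMap renderNode kids
    ++ "</" ++ name ++ ">"
  where
    renderAttr (Attr k v) = " " ++ k ++ "=\"" ++ v ++ "\""
```

With these definitions loaded, renderNode (_div [a_ [href_att "http://google.com"] [pcdata "Search!"]]) yields <div><a href="http://google.com">Search!</a></div>.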

In general there are nine functions, with type signatures:

_<tag> :: [Ent] -> Ent
<tag>_ :: [Attribute] -> [Ent] -> Ent
render :: Ent -> String
render_bs :: Ent -> Data.ByteString
pcdata :: String -> Ent
pcdata_bs :: Data.ByteString -> Ent
<attribute>_att :: String -> Attribute
<attribute>_att_bs :: Data.ByteString -> Attribute
htmlHelp :: [String] -> [[String]]

Here <tag> is any tag and <attribute> is any attribute name, but only where the DTD allows it! Yes, Ent and Attribute are gross simplifications, but you get the idea. Nasty compile-time type errors will occur if you attempt an invalid structure. pcdata provides a way to get regular text in, and that text is HTML-escaped for safety. Use pcdata_bs for JavaScript, since it isn't escaped. Again, only where allowed.
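One way to see how an invalid structure can become a type error is to give every parent context its own child type. The sketch below does this for a tiny, made-up fragment of a DTD. All type and function names here are hypothetical, and CHXHtml's own encoding may well differ; the point is only that the compiler, not a runtime check, rejects bad nesting. It also shows escaped text next to raw, unescaped text.

```haskell
-- Hypothetical sketch: one child type per parent context, so a tag in
-- the wrong place simply does not type-check.

data Document  = Html HeadNode BodyNode
data HeadNode  = Head [HeadChild]
data BodyNode  = Body [BodyChild]
data HeadChild = Title String
data BodyChild
  = H1 String
  | Div [BodyChild]
  | PCData String   -- escaped on output, in the spirit of pcdata
  | Raw String      -- emitted verbatim, in the spirit of pcdata_bs

-- Replace the characters that are significant in markup.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

renderDoc :: Document -> String
renderDoc (Html (Head hs) (Body bs)) =
  "<html><head>" ++ concatMap renderHead hs ++ "</head><body>"
    ++ concatMap renderBody bs ++ "</body></html>"

renderHead :: HeadChild -> String
renderHead (Title t) = "<title>" ++ escape t ++ "</title>"

renderBody :: BodyChild -> String
renderBody (H1 t)     = "<h1>" ++ escape t ++ "</h1>"
renderBody (Div kids) = "<div>" ++ concatMap renderBody kids ++ "</div>"
renderBody (PCData t) = escape t
renderBody (Raw t)    = t

-- Accepted:  Html (Head [Title "Hi"]) (Body [Div [PCData "a < b"]])
-- Rejected:  Html (Head [H1 "oops"]) (Body [])
--            (H1 is a BodyChild, but Head expects a list of HeadChild)
```

The trade-off in this simple encoding is that a tag valid in several contexts needs a constructor in each child type, which hints at why the real signatures are more involved than Ent and Attribute suggest.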

Help! What tags are valid?

So you may be wondering if you have to constantly look at the DTD to find what tags and attributes are allowed. Fear no more, htmlHelp to the rescue! Provide a list of parent tags ([String]), and htmlHelp gives a list of allowed children and attributes, or an error. For example, had I wanted to know what children or attributes the inner div takes in the example above, I would run the following in the interpreter after loading the library:

*Main> htmlHelp ["html","body","div"]               
[["a","abbr","acronym","address","b","bdo","big","blockquote","br","button","cite","code","del","dfn",
"div","dl","em","fieldset","form","h1","h2","h3","h4","h5","h6","hr","i","img","input","ins","kbd",
"label","map","noscript","object","ol","p","pcdata","pre","q","samp","script","select","small","span",
"strong","sub","sup","table","textarea","tt","ul","var"],
["class_att","dir_att","id_att","lang_att",
"onclick_att","ondblclick_att","onkeydown_att","onkeypress_att","onkeyup_att","onmousedown_att",
"onmousemove_att","onmouseout_att","onmouseover_att","onmouseup_att","style_att","title_att"]]
    

So this nesting is valid, and I get the possible child tags and node attributes. Note that all queries should start with html.
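The lookup behaviour described here can be pictured as a table keyed by the path of parent tags. The following is only a sketch under that assumption: helpSketch and rules are hypothetical names, the table holds a hand-picked subset of the div entry shown above plus one illustrative html entry, and nothing is implied about how htmlHelp is actually implemented.

```haskell
-- Hypothetical sketch of a path-keyed lookup in the spirit of htmlHelp.
-- The table is a tiny illustrative subset, not the real DTD data.

type Path = [String]

-- For each path of parent tags: (allowed children, allowed attributes).
rules :: [(Path, ([String], [String]))]
rules =
  [ ( ["html"]
    , (["head", "body"], ["dir_att", "lang_att"]) )   -- illustrative entry
  , ( ["html", "body", "div"]
    , (["a", "div", "h1", "pcdata"], ["class_att", "id_att", "style_att"]) )
  ]

-- Return the entry for a nesting, or an error message if it is unknown.
helpSketch :: Path -> Either String ([String], [String])
helpSketch path =
  case lookup path rules of
    Just entry -> Right entry
    Nothing    -> Left ("no rule for nesting: " ++ show path)
```

A query such as helpSketch ["div"] returns a Left value because, as with the real function, the path has to start at html.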

To play with htmlHelp without touching Haskell, visit htmlHelp for an online interface to the function, which is running the actual htmlHelp function!

Speed and Efficiency

Strings are painfully slow in Haskell due to their linked-list nature. We use Data.ByteString.Lazy internally to store and manipulate character data efficiently, and we expose that representation through the _bs-suffixed input and output functions. As of right now, pcdata_bs does NOT HTML-escape the character data, unlike the pcdata version. We trust you to respect this feature.
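For readers who have not worked with lazy ByteStrings, here is a small sketch of the difference between escaped and unescaped character data. It uses only the bytestring package that ships with GHC; escapeBS, textEscaped and textRaw are hypothetical helpers written for this example and are not part of CHXHtml.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical helper: HTML-escape a lazy ByteString.
-- Note: the Char8 interface truncates characters above code point 255,
-- so this sketch is only suitable for ASCII or Latin-1 input.
escapeBS :: BL.ByteString -> BL.ByteString
escapeBS = BL.concatMap esc
  where
    esc '<' = BL.pack "&lt;"
    esc '>' = BL.pack "&gt;"
    esc '&' = BL.pack "&amp;"
    esc '"' = BL.pack "&quot;"
    esc c   = BL.singleton c

-- Escaped path, in the spirit of pcdata: take a String, pack and escape it.
textEscaped :: String -> BL.ByteString
textEscaped = escapeBS . BL.pack

-- Unescaped path, in the spirit of pcdata_bs: the bytes pass through as-is,
-- so the caller is responsible for their safety.
textRaw :: BL.ByteString -> BL.ByteString
textRaw = id
```

Passing "if (a < b) go();" through textEscaped turns the < into &lt;, while textRaw leaves script text untouched, which is why the unescaped variant suits inline JavaScript and nothing user-supplied.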

Memoization tricks to come...