module Xml_lexer: sig .. end
This module provides an ocamllex lexer for XML files. It only supports the most basic features of the XML specification.
The lexer ignores the following 'events' altogether: comments, processing instructions, the XML prolog and the doctype declaration.
The predefined entities (&amp;, &lt;, etc.) are supported. The replacement text for other entities whose entity value consists of character data can be provided to the lexer (see Xml_lexer.entities). Internal entity declarations are not taken into account (the lexer just skips the doctype declaration).
CDATA sections and character references are supported.
See Xml_lexer.strip_ws about whitespace handling.
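As a rough illustration of how replacement text for extra entities might be supplied, the sketch below uses a local stand-in for Xml_lexer.entities so that it runs without the library. The `copy` entity and the `replacement` helper are hypothetical names chosen for the example; only the association-list shape comes from the signature of Xml_lexer.entities.

```ocaml
(* Local stand-in for Xml_lexer.entities, so the sketch runs on its own.
   The five names are the XML predefined entities. *)
let entities : (string * string) list ref =
  ref ["amp", "&"; "lt", "<"; "gt", ">"; "apos", "'"; "quot", "\""]

(* Register an extra entity by consing onto the association list.
   "copy" and its UTF-8 replacement text are illustrative choices. *)
let () = entities := ("copy", "\xC2\xA9") :: !entities

(* Hypothetical helper: find the replacement text for an entity name. *)
let replacement (name : string) : string option =
  List.assoc_opt name !entities
```

With the real module, the same update would presumably be written against Xml_lexer.entities before the first call to the lexer.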
type error =
  | Illegal_character of ...
  | Bad_entity of ...
  | Unterminated of ...
  | Tag_expected
  | Attribute_expected
  | Other of ...
val error_string : error -> string
exception Error of error * int
The int argument indicates the character position in the buffer. Note that some non-conforming XML documents might not trigger an error.
type token =
  | Tag of ...
      (* Tag (name, attributes, empty) denotes an opening tag with the specified name and attributes. If empty, then the tag ended in "/>", meaning that it has no sub-elements. *)
  | Chars of ...
      (* Some text between the tags *)
  | Endtag of ...
      (* A closing tag *)
  | EOF
      (* End of input *)
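To show how a stream of these tokens might be consumed, here is a minimal sketch that checks whether opening and closing tags nest properly. It declares a local mirror of the token type so that it is self-contained; the argument types are assumptions inferred from the constructor descriptions, and `well_nested` is a hypothetical helper, not part of the module.

```ocaml
(* Local mirror of the documented token type. The argument types are
   assumptions inferred from the constructor descriptions. *)
type token =
  | Tag of string * (string * string) list * bool
  | Chars of string
  | Endtag of string
  | EOF

(* Hypothetical helper: check that opening and closing tags nest properly.
   [stack] holds the names of the elements that are still open. *)
let well_nested (tokens : token list) : bool =
  let rec go stack = function
    | [] | EOF :: _ -> stack = []
    | Tag (_, _, true) :: rest -> go stack rest (* "<a/>" opens nothing *)
    | Tag (name, _, false) :: rest -> go (name :: stack) rest
    | Chars _ :: rest -> go stack rest
    | Endtag name :: rest ->
      (match stack with
       | top :: below when top = name -> go below rest
       | _ -> false)
  in
  go [] tokens
```

A check of this kind is left to the caller here because the lexer itself only produces tokens and, as noted for the Error exception, may accept some non-conforming documents.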
val strip_ws : bool Pervasives.ref
If strip_ws is true (the default), whitespace next to a tag is ignored. Character data consisting only of whitespace is thus suppressed (i.e. Chars "" tokens are skipped).
val entities : (string * string) list Pervasives.ref
An association list of entity definitions (initially the predefined entities: ["amp", "&"; "lt", "<" ...]).
val token : Lexing.lexbuf -> token
Raises Error in case of an invalid XML document.
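A typical way to drive a lexer with this signature is to call it repeatedly until it returns its end-of-input token. The sketch below is written generically so that it runs without the library: `tokens_of` and its `is_eof` parameter are hypothetical names, and the lexer function is passed in as an argument.

```ocaml
(* Hypothetical helper: call [next] on [lexbuf] until [is_eof] holds for the
   returned token, and collect the earlier tokens in document order. *)
let tokens_of ~(is_eof : 'tok -> bool) (next : Lexing.lexbuf -> 'tok)
    (lexbuf : Lexing.lexbuf) : 'tok list =
  let rec loop acc =
    let tok = next lexbuf in
    if is_eof tok then List.rev acc else loop (tok :: acc)
  in
  loop []
```

With the real module this would presumably be called as `tokens_of ~is_eof:(fun t -> t = Xml_lexer.EOF) Xml_lexer.token (Lexing.from_string "<a>hi</a>")`, wrapped in a handler for `Xml_lexer.Error (e, pos)` that reports `Xml_lexer.error_string e` together with the position.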