
Module html_parse


This module implements a VERY limited parser that finds <link> tags in the head of HTML or XHTML documents and parses out their attributes according to the OpenID spec. It is a liberal parser, but it requires these things from the data in order to work:

From http://openid.net/specs.bml#linkrel:

The parser ignores SGML comments and <![CDATA[blocks]]>. Both kinds of quoting are allowed for attributes.

The parser deals with invalid markup in these ways:

Functions

tagMatcher(tag_name, *close_tags)

replaceEnt(mo)
Replace the entities that are specified by OpenID.

parseLinkAttrs(html)
Find all link tags in a string representing an HTML document and return a list of their attributes.
Returns: [[(type(html), type(html))]]

relMatches(rel_attr, target_rel)
Does this target_rel appear in rel_attr?

linkHasRel(link_attrs, target_rel)
Does this link have target_rel as a relationship?

findLinksRel(link_attrs_list, target_rel)
Filter the list of link attributes, keeping those that have target_rel as a relationship.

findFirstHref(link_attrs_list, target_rel)
Return the value of the href attribute for the first link tag in the list that has target_rel as a relationship.

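The four rel helpers summarized above can be illustrated with a small stand-alone sketch. It is not the module's source: the snake_case names are hypothetical, and it assumes that a rel attribute is a whitespace-separated list of tokens compared case-insensitively against a lowercase target_rel.

```python
def rel_matches(rel_attr, target_rel):
    """Return True if target_rel is one of the whitespace-separated rel tokens.

    Assumption: tokens are lowercased before comparison, so target_rel
    is expected to be lowercase already.
    """
    return target_rel in (token.lower() for token in rel_attr.split())


def link_has_rel(link_attrs, target_rel):
    """Return True if the attribute dict has a rel containing target_rel."""
    rel_attr = link_attrs.get("rel")
    return bool(rel_attr) and rel_matches(rel_attr, target_rel)


def find_links_rel(link_attrs_list, target_rel):
    """Keep only the attribute dicts whose rel contains target_rel."""
    return [attrs for attrs in link_attrs_list if link_has_rel(attrs, target_rel)]


def find_first_href(link_attrs_list, target_rel):
    """Return the href of the first matching link, or None if none matches."""
    for attrs in link_attrs_list:
        if link_has_rel(attrs, target_rel):
            return attrs.get("href")
    return None


# Example input in the shape described for parseLinkAttrs: one dict per link tag.
links = [
    {"rel": "stylesheet", "href": "/style.css"},
    {"rel": "OpenID.Server openid.delegate", "href": "https://example.com/server"},
    {"href": "https://example.com/no-rel"},
]

server_href = find_first_href(links, "openid.server")
stylesheets = find_links_rel(links, "stylesheet")
```

A link without a rel attribute never matches, and a lookup with no matching link yields None rather than raising.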
Variables
  flags = 114
  removed_re = re.compile(r'(?isux)<!--.*?-->|<!\[CDATA\[.*?\]\]>...
  tag_expr = '\n# Starts with the tag name at a word boundary, w...
  html_find = re.compile(r'(?isux)<html\b(?!:)(?P<attrs>[^>]*?)(...
  head_find = re.compile(r'(?isux)<head\b(?!:)(?P<attrs>[^>]*?)(...
  link_find = re.compile(r'(?isux)<link\b(?!:)')
  attr_find = re.compile(r'(?isux)(?P<attr_name>\w+)=(?:(?P<qope...
  replacements = {'amp': '&', 'gt': '>', 'lt': '<', 'quot': '"'}
  ent_replace = re.compile(r'&(amp|lt|gt|quot);')
  __package__ = 'openid.consumer'
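
The replacements table and the ent_replace pattern listed above are enough to show how the four OpenID entities can be decoded. The following is a minimal sketch. The callback name replace_ent and the sample URL are illustrative only, and the sketch does not claim to reproduce replaceEnt exactly.

```python
import re

# Values as listed in the Variables table.
replacements = {"amp": "&", "gt": ">", "lt": "<", "quot": '"'}
ent_replace = re.compile(r"&(amp|lt|gt|quot);")


def replace_ent(mo):
    """Map a matched entity name (group 1) to its literal character."""
    return replacements[mo.group(1)]


# Only the four listed entities are decoded; "&nbsp;" is left untouched.
raw_href = "https://example.com/server?a=1&amp;b=&quot;x&quot;&nbsp;"
decoded = ent_replace.sub(replace_ent, raw_href)
```

Because the pattern only recognizes amp, lt, gt and quot, any other entity passes through unchanged instead of being resolved as an HTML parser would.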
Function Details

parseLinkAttrs(html)


Find all link tags in a string representing an HTML document and return a list of their attributes.

Parameters:
  • html (str or unicode) - the text to parse
Returns: [[(type(html), type(html))]]
A list of dictionaries of attributes, one for each link tag
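
To make the return shape concrete, here is a self-contained sketch of the same idea built from the regular expressions listed on this page. It is an approximation under stated assumptions, not the library's implementation: the function name parse_link_attrs_sketch is hypothetical, and removed_re is written with a literal "<" before "![CDATA[" and "script".

```python
import re

# Patterns modeled on the values shown in the Variables sections.
# Assumption: removed_re has a "<" before "![CDATA[" and "script".
removed_re = re.compile(
    r'(?isux)<!--.*?-->|<!\[CDATA\[.*?\]\]>|<script\b(?!:)[^>]*>.*?</script>')
html_find = re.compile(
    r'(?isux)<html\b(?!:)(?P<attrs>[^>]*?)'
    r'(?:/>|>(?P<contents>.*?)(?:</?html\s*>|\Z))')
head_find = re.compile(
    r'(?isux)<head\b(?!:)(?P<attrs>[^>]*?)'
    r'(?:/>|>(?P<contents>.*?)(?:</?(?:head|body)\s*>|\Z))')
link_find = re.compile(r'(?isux)<link\b(?!:)')
attr_find = re.compile(
    r'(?isux)(?P<attr_name>\w+)=(?:(?P<qopen>["\'])(?P<q_val>.*?)\2'
    r'|(?P<unq_val>(?:[^\s<>/]|/(?!>))+))|(?P<end_link>[<>])')
replacements = {'amp': '&', 'gt': '>', 'lt': '<', 'quot': '"'}
ent_replace = re.compile(r'&(amp|lt|gt|quot);')


def parse_link_attrs_sketch(html):
    """Return one attribute dict per <link> found inside the first <head>."""
    stripped = removed_re.sub('', html)  # drop comments, CDATA, scripts
    html_mo = html_find.search(stripped)
    if html_mo is None or html_mo.group('contents') is None:
        return []
    head_mo = head_find.search(html_mo.group('contents'))
    if head_mo is None or head_mo.group('contents') is None:
        return []
    head = head_mo.group('contents')

    results = []
    for link_mo in link_find.finditer(head):
        attrs = {}
        # Scan attributes from just after "<link" until the next "<" or ">".
        for attr_mo in attr_find.finditer(head, link_mo.end()):
            if attr_mo.group('end_link') is not None:
                break
            if attr_mo.group('qopen'):
                raw = attr_mo.group('q_val')
            else:
                raw = attr_mo.group('unq_val')
            attrs[attr_mo.group('attr_name')] = ent_replace.sub(
                lambda mo: replacements[mo.group(1)], raw)
        results.append(attrs)
    return results


document = """<html><head>
<!-- <link rel="commented" href="nope"> -->
<link rel="openid.server" href="https://example.com/server?a=1&amp;b=2">
<link rel=openid.delegate href=https://example.com/id />
</head><body><link rel="body" href="ignored"></body></html>"""

links = parse_link_attrs_sketch(document)
```

In this sketch the commented-out link and the link in the body are both skipped, the quoted and unquoted attribute forms are both accepted, and "&amp;" in the href is decoded while the URL is otherwise left as written.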

Variables Details

removed_re

Value:
re.compile(r'(?isux)<!--.*?-->|<!\[CDATA\[.*?\]\]>|<script\b(?!:)[^>]*>.\
*?</script>')
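
A short sketch of what this expression is for: everything it matches is deleted before any tag search, so links hidden in comments, CDATA blocks or scripts never reach the parser. The pattern below is written with a literal "<" in front of "![CDATA[" and "script", which is an assumption about the intended expression, and the sample markup is made up for illustration.

```python
import re

# Assumed form of the pattern: each alternative starts with "<".
removed_re = re.compile(
    r'(?isux)<!--.*?-->|<!\[CDATA\[.*?\]\]>|<script\b(?!:)[^>]*>.*?</script>')

html = (
    "<html><head><!-- <link rel='hidden' href='x'> -->"
    "<script>var s = \"<link rel='x'>\";</script>"
    "<link rel='openid.server' href='https://example.com/server'>"
    "<![CDATA[<link rel='cdata'>]]></head></html>"
)

# Remove every comment, CDATA block and closed script element.
stripped = removed_re.sub("", html)
```

All three alternatives use a non-greedy ".*?" with the s (DOTALL) flag, so each match stops at the first closing delimiter even when the block spans several lines.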

tag_expr

Value:
'''
# Starts with the tag name at a word boundary, where the tag name is
# not a namespace
<%(tag_name)s\\b(?!:)

# All of the stuff up to a ">", hopefully attributes.
(?P<attrs>[^>]*?)

...

html_find

Value:
re.compile(r'(?isux)<html\b(?!:)(?P<attrs>[^>]*?)(?:/>|>(?P<contents>.\
*?)(?:</?html\s*>|\Z))')

head_find

Value:
re.compile(r'(?isux)<head\b(?!:)(?P<attrs>[^>]*?)(?:/>|>(?P<contents>.\
*?)(?:</?(?:head|body)\s*>|\Z))')
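
The html_find and head_find values above differ only in the tag name and in the set of tags that terminate the contents group. The sketch below, using made-up sample markup, shows the two behaviors visible in the patterns: an unclosed tag extends to the end of the input (or to a terminating tag), and a self-closing tag has no contents at all.

```python
import re

html_find = re.compile(
    r'(?isux)<html\b(?!:)(?P<attrs>[^>]*?)'
    r'(?:/>|>(?P<contents>.*?)(?:</?html\s*>|\Z))')
head_find = re.compile(
    r'(?isux)<head\b(?!:)(?P<attrs>[^>]*?)'
    r'(?:/>|>(?P<contents>.*?)(?:</?(?:head|body)\s*>|\Z))')

# Neither <html> nor <head> is closed; tag names use mixed case.
doc = "<HTML lang='en'><Head><link rel='a' href='b'><body><link rel='c'>"

html_mo = html_find.search(doc)
# The <head> contents stop at the opening <body> tag.
head_mo = head_find.search(html_mo.group("contents"))

# A self-closing <head/> matches, but its contents group is unset.
empty_head_mo = head_find.search("<head/>")
```

Here html_mo.group("attrs") is the raw attribute text of the opening tag, and head_mo.group("contents") holds only the first link, because the opening body tag ends the head.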

attr_find

Value:
re.compile(r'(?isux)(?P<attr_name>\w+)=(?:(?P<qopen>["\'])(?P<q_val>.*\
?)\2|(?P<unq_val>(?:[^\s<>/]|/(?!>))+))|(?P<end_link>[<>])')
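
The named groups in attr_find can be read as follows: q_val holds a quoted value, unq_val an unquoted one, and end_link signals that a "<" or ">" was reached. The sketch below applies the pattern to the text after a link tag name. The helper name parse_attrs and the sample text are hypothetical, and entity replacement is deliberately left out.

```python
import re

attr_find = re.compile(
    r'(?isux)(?P<attr_name>\w+)=(?:(?P<qopen>["\'])(?P<q_val>.*?)\2'
    r'|(?P<unq_val>(?:[^\s<>/]|/(?!>))+))|(?P<end_link>[<>])')


def parse_attrs(tag_text):
    """Collect name=value pairs until the first "<" or ">" is reached."""
    attrs = {}
    for mo in attr_find.finditer(tag_text):
        if mo.group("end_link") is not None:
            break  # end of the tag: stop scanning
        if mo.group("qopen"):
            value = mo.group("q_val")
        else:
            value = mo.group("unq_val")
        attrs[mo.group("attr_name")] = value  # a later duplicate overwrites
    return attrs


# Text following "<link": a bare word, quoted and unquoted values, and
# trailing text after the closing "/>".
tag_text = " pumpkin rel='openid.server' href=https://example.com/a/b rel=\"x y\" />late=1"
attrs = parse_attrs(tag_text)
```

The bare word pumpkin is skipped because it has no "=", the unquoted href keeps its slashes because only "/>" ends a value, the second rel overwrites the first, and late=1 is never seen because scanning stops at ">".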