[-Q@sldZddlZddlZddlZddlmZdgZejdZejdZ ejdZ ejdZ ejd Z ejd Z ejd Zejd Zejd ZejdZejdZejdejZejdejZejd ZejdZGdddeZeZGdddejZdS)zA parser for HTML and XHTML.N)unescape HTMLParserz[&<]z &[a-zA-Z#]z%&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]z)&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]z <[a-zA-Z]>z--\s*>z(([a-zA-Z][-.a-zA-Z0-9:_]*)(?:\s|/(?!>))*z$([a-zA-Z][^ />]*)(?:\s|/(?!>))*zJ\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*(\'[^\']*\'|"[^"]*"|[^\s"\'=<>`]*))?z]((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?(?:\s|/(?!>))*a <[a-zA-Z][-.a-zA-Z0-9:_]* # tag name (?:\s+ # whitespace before attribute name (?:[a-zA-Z_][-.:a-zA-Z0-9_]* # attribute name (?:\s*=\s* # value indicator (?:'[^']*' # LITA-enclosed value |\"[^\"]*\" # LIT-enclosed value |[^'\">\s]+ # bare value ) )? ) )* \s* # trailing whitespace aF <[a-zA-Z][^\t\n\r\f />\x00]* # tag name (?:[\s/]* # optional whitespace before attribute name (?:(?<=['"\s/])[^\s/>][^\s/=>]* # attribute name (?:\s*=+\s* # value indicator (?:'[^']*' # LITA-enclosed value |"[^"]*" # LIT-enclosed value |(?!['"])[^>\s]* # bare value ) (?:\s*,)* # possibly followed by a comma )?(?:\s|/(?!>))* )* )? \s* # trailing whitespace z#c@s1eZdZdZdddZddZdS)HTMLParseErrorz&Exception raised for all parse errors.NcCs3|s t||_|d|_|d|_dS)Nr)AssertionErrormsglinenooffset)selfrZpositionr !/usr/lib/python3.4/html/parser.py__init__Us   zHTMLParseError.__init__cCsW|j}|jdk r,|d|j}n|jdk rS|d|jd}n|S)Nz , at line %dz , column %dr)rr r )r resultr r r __str__[s  zHTMLParseError.__str__)NN)__name__ __module__ __qualname____doc__rrr r r r rRs rc@sfeZdZdZd;ZededdZddZd d Zd d Z d dZ dZ ddZ ddZ ddZddZddZdddZddZdd Zd!d"Zd#d$Zd%d&Zd'd(Zd)d*Zd+d,Zd-d.Zd/d0Zd1d2Zd3d4Zd5d6Zd7d8Zd9d:Z dS)'.)_HTMLParser__starttag_text)r r r r get_starttag_textszHTMLParser.get_starttag_textcCs2|j|_tjd|jtj|_dS)Nz )lowerr%recompileIr$)r elemr r r set_cdata_modeszHTMLParser.set_cdata_modecCst|_d|_dS)N)r#r$r%)r r r r clear_cdata_modes zHTMLParser.clear_cdata_modec Cs|j}d}t|}xD||kra|jrq|j rq|jd|}|dkr|sePn|}qn=|jj||}|r|j}n|jrPn|}||kr|jr|j r|jt |||q|j|||n|j ||}||kr)Pn|j }|d|rt j ||re|j|}n|d|r|j|}n|d|r|j|}n|d|r|j|}ng|d|r|jr|j|}q/|j|}n+|d|kr.|jd|d}nP|dkr |sEPn|jr^|jdn|jd |d}|dkr|jd|d}|dkr|d}qn |d7}|jr|j r|jt |||q |j|||n|j ||}q|d |rtj ||}|r|jd d} |j| |j}|d |ds|d}n|j ||}qq^d ||dkr|j|||d |j ||d }nPq|d |rLtj ||}|r|jd} |j| |j}|d |dsi|d}n|j ||}qntj ||}|r|r|j||dkr|jr|jdq|j}||kr|}n|j ||d}nPq^|d|krH|jd |j ||d}q^PqdstdqW|r||kr|j r|jr|j r|jt |||n|j||||j ||}n||d|_dS)Nrr? handle_pirM)r rQr!rDrSr r r rH\s &  zHTMLParser.parse_picCsd|_|j|}|dkr(|S|j}||||_g}|jrltj||d}ntj||d}|std|j}|j dj |_ }x$||kr|jrt j||}nt j||}|sPn|j ddd\} } } | s2d} ns| dddko]| d dkns| dddko| ddknr| dd} n| rt| } n|j| j | f|j}qW|||j} | dkr|j\} }d |jkr^| |jjd } t|j|jjd }n|t|j}|jr|jd |||dd fn|j||||S| jd r|j||n/|j||||jkr|j|n|S)Nrrz#unexpected call to parse_starttag()rrW'"r/> z junk characters in start tag: %rr;r;r;)rrd)r/check_for_whole_start_tagr!rtagfindrDtagfind_tolerantrrMrKr1r"attrfindattrfind_tolerantrappendstripr,countr<rfindr.r@endswithhandle_startendtaghandle_starttagCDATA_CONTENT_ELEMENTSr6)r rQendposr!attrsrDrTtagmZattrnamerestZ attrvaluerMr r r r r rEhs`       00    "zHTMLParser.parse_starttagcCsk|j}|jr'tj||}ntj||}|r[|j}|||d}|dkrs|dS|dkr|jd|r|dS|jd|rd S|jr|j||d|jdn||kr|S|dSn|dkrd S|dkrd S|jr@|j|||jd n||krP|S|dSnt d dS)Nrr/z/>rzmalformed empty start tagr z6abcdefghijklmnopqrstuvwxyz=/ABCDEFGHIJKLMNOPQRSTUVWXYZzmalformed start tagzwe should not get here!r;r;r;) r!rlocatestarttagendrDlocatestarttagend_tolerantrMrBrAr.r)r rQr!rwrSnextr r r rgs>             z$HTMLParser.check_for_whole_start_tagcCs|j}|||ddks/tdtj||d}|sOd S|j}tj||}|sW|jdk r|j||||S|j r|j d|||fnt j||d}|s|||ddkr|dS|j |Sn|j dj}|jd|j}|j||dS|j dj}|jdk r||jkr|j||||Sn|j|j|j|S) Nrzrr;)r!r endendtagr>rM endtagfindrDr%r@rr.rirZrKr1r= handle_endtagr7)r rQr!rDr[Z namematchZtagnamer5r r r rFs< &  !  zHTMLParser.parse_endtagcCs!|j|||j|dS)N)rrr)r rvrur r r rqszHTMLParser.handle_startendtagcCsdS)Nr )r rvrur r r rrszHTMLParser.handle_starttagcCsdS)Nr )r rvr r r rszHTMLParser.handle_endtagcCsdS)Nr )r rUr r r rLszHTMLParser.handle_charrefcCsdS)Nr )r rUr r r rO szHTMLParser.handle_entityrefcCsdS)Nr )r r)r r r r@szHTMLParser.handle_datacCsdS)Nr )r r)r r r r^szHTMLParser.handle_commentcCsdS)Nr )r Zdeclr r r rYszHTMLParser.handle_declcCsdS)Nr )r r)r r r raszHTMLParser.handle_picCs$|jr |jd|fndS)Nzunknown declaration: %r)rr.)r r)r r r unknown_decls zHTMLParser.unknown_declcCs tjdtddt|S)NzZThe unescape method is deprecated and will be removed in 3.5, use html.unescape() instead.rr)rrrr)r sr r r r"s  zHTMLParser.unescape)rr)!rrrrrsrrrr*r+r.r/r0r6r7r(rIrZrHrErgrFrqrrrrLrOr@r^rYrarrr r r r rfs<        z  < + *          )rr2rr&Zhtmlr__all__r3r#rPrNrJrCr`Z commentcloserhrirjrkVERBOSErzr{r}r~ Exceptionrobjectrr'rr r r r s6