Can anyone give me an example of how to use streamhtmlparser to parse out all the `A` tag hrefs from an HTML document? (Either C++ or Python code is OK, but I would prefer an example using the Python bindings.)
I can see how it works in the Python tests, but they expect special tokens to already be in the HTML, and they check state values at those points. I don't see how to get the proper callbacks during state changes when feeding the parser plain HTML.
I can get some of the information I am looking for with the following code, but I need to feed it blocks of HTML, not just one character at a time, and I need to know when it has finished with a tag, attribute, etc., not just whether it is currently in a tag, attribute, or value.
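
    import py_streamhtmlparser

    parser = py_streamhtmlparser.HtmlParser()
    html = """link"""
    for index, character in enumerate(html):
        # Feed one character, then inspect the parser state after it.
        parser.Parse(character)
        print index, character, parser.Tag(), parser.Attribute(), parser.Value(), parser.ValueIndex()
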
You can see a sample run of this code [here](http://pastebin.com/fdc63fda).
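
For comparison, this is roughly the callback behavior I am after. It is a minimal sketch that uses Python 3's standard-library `html.parser.HTMLParser` rather than streamhtmlparser, so it says nothing about how streamhtmlparser itself should be driven. The names `HrefCollector` and `extract_hrefs` are made up for illustration. The parser accepts arbitrary chunks, buffers a tag that is split across chunks, and only fires the callback once the whole start tag has been seen.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects href values from <a> tags as complete start tags are reported."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Called once per fully parsed start tag. Tag and attribute names
        # arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(chunks):
    """Feed an iterable of HTML string chunks and return the hrefs found."""
    collector = HrefCollector()
    for chunk in chunks:
        collector.feed(chunk)  # incomplete tags are buffered until the next chunk
    collector.close()
    return collector.hrefs


if __name__ == "__main__":
    # The first tag is deliberately split in the middle of its attribute name.
    sample = ['<p><a hr', 'ef="http://example.com/">x</a>', '<A HREF="/two">y</A></p>']
    print(extract_hrefs(sample))
```

What I cannot work out is the streamhtmlparser equivalent of `handle_starttag`, that is, a point where the tag and its attribute value are known to be complete.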