How does a parser (for example, HTML) work?

For argument's sake, let's assume an HTML parser. I've read that it *tokenizes* everything first and then parses it. What does "tokenize" mean? Does the parser read the input one character at a time, building up a multidimensional array to store the structure? For example, does it read a `<` and begin capturing the element, and then, once it meets a closing `>` (outside of an attribute), push the element onto an array stack somewhere? I'm asking purely out of curiosity. If I were to read through the source of something like [HTML Purifier][1], would that give me a good idea of how HTML is parsed?

[1]: http://htmlpurifier.org/
For a very brief intro, look at en.wikipedia.org/wiki/Lexical_parser; the Parsing article there is also worth checking out. And yes, at some point HTML Purifier does exactly what you describe.
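
To make the two stages concrete, here is a minimal sketch in Python of what "tokenize, then parse" could look like. It is a toy, not how HTML Purifier or any browser does it: the names `Token`, `Node`, `tokenize`, and `parse` are made up for illustration, attributes are thrown away, and comments, void elements such as `<br>`, entities, and the error-recovery rules of real HTML are all ignored. The tokenizer walks the input character by character and treats a `>` inside a quoted attribute value as ordinary text; the parser then uses a stack of open elements to turn the flat token list into a tree.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    kind: str   # "open", "close", or "text"
    value: str  # tag name for tags, raw content for text


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)  # Node objects or plain strings


def tokenize(html):
    """Split the input into a flat list of open-tag, close-tag, and text tokens."""
    tokens = []
    i, n = 0, len(html)
    while i < n:
        if html[i] == "<":
            # Scan to the first '>' that is not inside a quoted attribute value.
            j = i + 1
            quote = None
            while j < n and (quote is not None or html[j] != ">"):
                ch = html[j]
                if quote is None and ch in "\"'":
                    quote = ch       # entering a quoted value
                elif ch == quote:
                    quote = None     # leaving the quoted value
                j += 1
            body = html[i + 1:j]
            if body.startswith("/"):
                tokens.append(Token("close", body[1:].strip().lower()))
            else:
                parts = body.rstrip("/").split()
                # Attributes (parts[1:]) are discarded in this sketch.
                tokens.append(Token("open", parts[0].lower() if parts else ""))
            i = j + 1
        else:
            # Everything up to the next '<' is a single text token.
            j = html.find("<", i)
            if j == -1:
                j = n
            tokens.append(Token("text", html[i:j]))
            i = j
    return tokens


def parse(tokens):
    """Build a tree from the tokens using a stack of currently open elements."""
    root = Node("#root")
    stack = [root]
    for tok in tokens:
        if tok.kind == "text":
            if tok.value.strip():  # skip whitespace-only text
                stack[-1].children.append(tok.value)
        elif tok.kind == "open":
            node = Node(tok.value)
            stack[-1].children.append(node)
            stack.append(node)
        elif tok.kind == "close":
            # Pop back to the nearest matching open element; ignore stray close tags.
            for depth in range(len(stack) - 1, 0, -1):
                if stack[depth].tag == tok.value:
                    del stack[depth:]
                    break
    return root


tokens = tokenize('<p title="a > b">Hi <b>there</b></p>')
tree = parse(tokens)
```

For that input, `tokens` is a flat sequence (open `p`, text, open `b`, text, close `b`, close `p`), and `tree` has a single `p` node whose children are the string `"Hi "` and a `b` node. So the structure ends up in a tree of nodes rather than a multidimensional array, and the stack exists only while parsing to remember which elements are still open.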
