How does a parser (for example, HTML) work?

For argument's sake, let's assume an HTML parser. I've read that it *tokenizes* everything first, and then parses it. What does tokenize mean? Does the parser read each character one at a time, building up a multidimensional array to store the structure? For example, does it read a `<` and begin to capture the element, and then, once it meets a closing `>` (outside of an attribute), push the element onto an array stack somewhere? I'm asking purely out of curiosity. If I were to read through the source of something like [HTML Purifier][1], would that give me a good idea of how HTML is parsed? [1]:
Look at for a very brief intro; also check out the Parsing article there. And HTML Purifier, at some point, does exactly that.
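
To make the two stages concrete, here is a minimal sketch in Python. It is not how HTML Purifier or any browser actually does it. The names `TOKEN_RE`, `tokenize`, and `parse` are made up for illustration, and the grammar is deliberately tiny: it has no comments, doctype, entities, or void elements such as `<br>`. The tokenizer turns the character stream into a flat list of tokens, treating a `>` inside a quoted attribute value as part of the tag. The parser then walks those tokens and uses a stack of open elements to build a nested structure, which is roughly the "array stack" idea from the question.

```python
import re

# Hypothetical, simplified token grammar: a closing tag, an opening tag
# (whose quoted attribute values may contain '>'), or a run of plain text.
TOKEN_RE = re.compile(
    r"""</(?P<close>\w+)\s*>"""
    r"""|<(?P<open>\w+)(?P<attrs>(?:"[^"]*"|'[^']*'|[^'">])*)>"""
    r"""|(?P<text>[^<]+)"""
)


def tokenize(html):
    """Stage 1: scan left to right and yield flat (kind, value) tokens."""
    for match in TOKEN_RE.finditer(html):
        if match.group("close") is not None:
            yield ("close", match.group("close"))
        elif match.group("open") is not None:
            # Attributes are matched but dropped to keep the sketch short.
            yield ("open", match.group("open"))
        else:
            yield ("text", match.group("text"))


def parse(html):
    """Stage 2: turn the token stream into a tree using a stack."""
    root = {"tag": "#root", "children": []}
    stack = [root]  # the top of the stack is the innermost open element
    for kind, value in tokenize(html):
        if kind == "open":
            node = {"tag": value, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == "close":
            # Only pop when the closing tag matches the innermost open one;
            # a real HTML parser has much more elaborate recovery rules.
            if len(stack) > 1 and stack[-1]["tag"] == value:
                stack.pop()
        else:
            stack[-1]["children"].append(value)
    return root


if __name__ == "__main__":
    sample = '<p title="a>b">Hi <b>there</b></p>'
    print(list(tokenize(sample)))
    print(parse(sample))
```

So "tokenizing" just means chopping the raw characters into meaningful units (open tag, close tag, text) without caring about nesting. The nesting is worked out afterwards by the parser. Real HTML parsers also have to cope with malformed markup, which is where most of their complexity comes from.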
