Let's say I allow users to enter raw HTML in a web page. Let's say users are stupid and don't know how to write valid HTML. How can I clean up and simplify their code?
For example, … is simplified into …, and
<b>lorem <i>ipsum</b> dolor</i>
is cleaned up into
<b>lorem <i>ipsum</i></b> dolor
(or something similar).
Is there any Ext JS function or plugin to do that? Or any external JS library?
I've been trying to write my own algorithm, but it's not really trivial...
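As a rough illustration of the kind of algorithm described here, below is a minimal stack-based sketch in JavaScript. It is not the full HTML5 parsing algorithm. It assumes only attribute-less inline formatting tags from a hypothetical whitelist (FORMATTING_TAGS), shows every other tag as escaped text, and uses the illustrative name fixNesting. When a closing tag arrives out of order, it closes the tags opened after it and then reopens them.

```javascript
// Hypothetical whitelist: only these attribute-less inline tags are kept.
const FORMATTING_TAGS = new Set(["b", "i", "u", "s", "em", "strong"]);

// Matches an attribute-less open/close tag, a run of text, or a stray "<".
const TOKEN = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\s*>|[^<]+|</g;

function fixNesting(html) {
  const out = [];
  const open = []; // stack of currently open tag names

  for (const match of html.matchAll(TOKEN)) {
    const [token, slash, rawName] = match;
    if (rawName === undefined) {
      // Plain text, or a stray "<" that does not start a tag.
      out.push(token === "<" ? "&lt;" : token);
      continue;
    }
    const name = rawName.toLowerCase();
    if (!FORMATTING_TAGS.has(name)) {
      // Tags outside the whitelist are shown as text rather than trusted.
      out.push(token.replace("<", "&lt;"));
      continue;
    }
    if (!slash) {
      open.push(name);
      out.push(`<${name}>`);
      continue;
    }
    const idx = open.lastIndexOf(name);
    if (idx === -1) continue; // closing tag with no matching open tag: drop it
    // Close every tag opened after `name`, close `name`, then reopen the others.
    const reopened = open.splice(idx + 1);
    for (let k = reopened.length - 1; k >= 0; k--) out.push(`</${reopened[k]}>`);
    out.push(`</${name}>`);
    open.pop(); // remove `name` itself
    for (const tag of reopened) out.push(`<${tag}>`);
    open.push(...reopened);
  }
  // Close anything still open at the end of the input.
  while (open.length > 0) out.push(`</${open.pop()}>`);
  return out.join("");
}

console.log(fixNesting("<b>lorem <i>ipsum</b> dolor</i>"));
```

For the example above this prints <b>lorem <i>ipsum</i></b><i> dolor</i>, so "dolor" stays in italics instead of losing its formatting.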
Please give your threads meaningful titles. 'mp' doesn't give much of a hint what the thread is about.
What about just setting it as the innerHTML of a DOM node then reading it back out again?
Sorry... I did type a proper title, but something must have gone wrong with my keyboard... ^^ (and now I can't edit the post title).
It seems to work for cleaning up the code. At least, when I type <b>lorem <i>ipsum</b> dolor</i>, the Elements panel of Chrome's inspector shows a well-formed HTML tree. So I guess I could read it back from the DOM and get the tags in the right order.
But I also (and mostly) want to simplify the code. Stuff like <b><b>OK</b></b> shouldn't exist...
Technically there's nothing wrong with nested <b> tags. Given suitable CSS the inner tag could easily be styled differently from the outer tag. While I understand where you're coming from, the requirement not to have nested <b> tags isn't really part of HTML, that's a requirement you've layered on top. As such I suspect you'll struggle to find a library to do it.
My first thought for how I'd implement this is also to use DOM nodes. That avoids any issues with invalid HTML, and you can then navigate the tree and make your modifications before reading back the contents.
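A minimal sketch of that tree walk in JavaScript. To keep it runnable outside a browser, it uses a hypothetical plain-object tree (a text node is a string, an element is { tag, children }) instead of real DOM nodes; the names simplify and toHtml are illustrative. It unwraps any element that already sits inside an element with the same tag, which is exactly the extra rule layered on top of HTML, so any distinction CSS could have drawn between the inner and outer tag is lost.

```javascript
// Hypothetical tree format: a text node is a string,
// an element is { tag: "b", children: [...] }.

// Returns an array of nodes, because an unwrapped element is replaced by its children.
function simplify(node, activeTags = new Set()) {
  if (typeof node === "string") return [node];
  const innerTags = new Set(activeTags).add(node.tag);
  const children = node.children.flatMap((child) => simplify(child, innerTags));
  // Already inside the same tag (e.g. <b> within <b>): drop the redundant wrapper.
  if (activeTags.has(node.tag)) return children;
  return [{ tag: node.tag, children }];
}

function escapeText(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Serializes a list of nodes back to an HTML string.
function toHtml(nodes) {
  return nodes
    .map((node) =>
      typeof node === "string"
        ? escapeText(node)
        : `<${node.tag}>${toHtml(node.children)}</${node.tag}>`
    )
    .join("");
}

// Usage: the <b><b>OK</b></b> case.
const input = { tag: "b", children: [{ tag: "b", children: ["OK"] }] };
console.log(toHtml(simplify(input))); // <b>OK</b>
```

Likewise, <b>a <i>b <b>c</b></i></b> becomes <b>a <i>b c</i></b>. In a browser, the same recursion could be applied to the childNodes of an element whose innerHTML was set to the user's markup.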
Given that your markup rules don't appear to match the rules of HTML, you might want to consider using an alternative markup format that can be converted to HTML, like the one used by wikis. Conventions like surrounding bold text with stars avoid the nesting issue, as the opening and closing tags are identical.
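A minimal sketch of that idea in JavaScript, assuming a hypothetical syntax where a pair of stars marks bold text and nothing else is supported (the names escapeHtml and starsToHtml are illustrative, not taken from any wiki engine). The user's text is HTML-escaped first, so no raw tags get through, and each star then toggles bold on or off. Because one character both opens and closes, a <b> can never end up inside another <b>.

```javascript
// Escapes characters that would otherwise be interpreted as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical rule: text between a pair of "*" is bold.
function starsToHtml(source) {
  const parts = escapeHtml(source).split("*");
  // An odd number of "*" leaves the last one unpaired; keep it as literal text.
  let trailing = "";
  if (parts.length % 2 === 0) {
    trailing = "*" + parts.pop();
  }
  // Segments at odd indices sit between an opening and a closing star.
  const body = parts.map((part, i) => (i % 2 === 1 ? `<b>${part}</b>` : part)).join("");
  return body + trailing;
}

console.log(starsToHtml("lorem *ipsum* dolor")); // lorem <b>ipsum</b> dolor
```

The design choice here is to escape first and convert second, so the only HTML in the output is what the converter itself emits.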