May 23, 2006

Rescue terrible HTML with TagSoup XHTML

Anonymous Reader writes "XHTML is a friendly enough format for parsing and screen-scraping, but the Web is still mostly populated by the scary legacy of poorly structured HTML, much of it not even compliant with the more lenient SGML standard. In this tip, Uche Ogbuji demonstrates the use of TagSoup to turn just about any HTML into neat XHTML."



  • Web Development
