  Taming the Beast - quality web marketing and ecommerce development services

Word/Excel HTML code bloat

Posted by Michael Bloch in web development (Tuesday, August 28, 2007)

Ever copied and pasted a Word or Excel document into an HTML page and then viewed the source code? It gives a whole new meaning to the term code bloat :)

I needed to copy an Excel spreadsheet of around 100 rows and 10 columns into an HTML page the other day and was quite shocked at how much unnecessary code it produced. Every single table row and cell carried extra tags, attributes and values. Removing it all by hand would have taken ages, so I went hunting for something to automate the task.
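If you'd rather script the cleanup yourself, the basic idea is simple: walk the markup, keep only the structural table tags, and throw away every attribute and every Office-specific element (spans, font tags, `<o:p>`, embedded styles) while keeping the text. Below is a minimal sketch using Python's standard-library `html.parser`. It is not how Detagger works; the names `clean_table`, `TableCleaner`, `KEEP_TAGS` and `SKIP_CONTENT` are hypothetical, and it assumes a single, non-nested table.

```python
import re
from html import escape
from html.parser import HTMLParser

# Structural tags worth keeping (hypothetical choice); anything else
# (span, font, o:p, col, ...) is dropped, but the text inside it is kept.
KEEP_TAGS = {"table", "thead", "tbody", "tr", "th", "td"}
SKIP_CONTENT = {"style", "script"}  # drop these tags *and* their contents


class TableCleaner(HTMLParser):
    """Rebuild a simple (non-nested) table with bare tags and no attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.cell = None   # text buffer while inside <td>/<th>, else None
        self.skipping = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        # attrs (class, style, width, x:num, ...) are ignored entirely
        if tag in SKIP_CONTENT:
            self.skipping += 1
        elif tag in KEEP_TAGS:
            self.out.append(f"<{tag}>")
            if tag in ("td", "th"):
                self.cell = []
        elif tag == "br" and self.cell is not None:
            self.cell.append(" ")  # treat a line break inside a cell as a space

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skipping = max(0, self.skipping - 1)
        elif tag in KEEP_TAGS:
            if tag in ("td", "th") and self.cell is not None:
                # Collapse the line-wrapping whitespace Office leaves in cells.
                text = re.sub(r"\s+", " ", "".join(self.cell)).strip()
                self.out.append(escape(text, quote=False))
                self.cell = None
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skipping:
            return
        if self.cell is not None:
            self.cell.append(data)
        elif data.strip():
            self.out.append(escape(data.strip(), quote=False))

    # handle_comment is not overridden, so comments (including Office's
    # <!--[if gte mso 9]> conditional blocks) are silently discarded.


def clean_table(html_text: str) -> str:
    cleaner = TableCleaner()
    cleaner.feed(html_text)
    cleaner.close()
    return "".join(cleaner.out)
```

For example, `<td class="xl24" x:num>42</td>` comes out as a plain `<td>42</td>`. Nested tables and merged cells would need more careful handling than this sketch provides.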

I didn’t spend a great deal of time searching, but I came across this great little app from JafSoft called (appropriately) Detagger. It was very easy to use and cleaned the table up beautifully, reducing it to a bare-bones table, which is exactly what I wanted. Before I ran Detagger, the document weighed in at around 80KB; afterwards it was a much leaner 30KB.

Detagger can also convert HTML to plain text while leaving headings, lists and tables intact. You can also “flatten” the output text for import into a database, or convert tables into CSV format. It’s not freeware, but you can try it free of charge for 30 days. A single-user license costs $29.95 (US), which is very reasonable if you’re doing a lot of this sort of work.
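The table-to-CSV conversion can also be sketched in a few lines. Again, this is a hypothetical illustration using only Python's standard library (`html.parser` and `csv`), not Detagger's implementation; `table_to_csv` and `TableRows` are made-up names, and the sketch assumes simple tables without nested tables or `rowspan`/`colspan` merges.

```python
import csv
import io
import re
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []
        self.row = None   # list of cell strings while inside <tr>
        self.cell = None  # text buffer while inside <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            # "Flatten" the cell: one line of text with single spaces.
            self.row.append(re.sub(r"\s+", " ", "".join(self.cell)).strip())
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def table_to_csv(html_text: str) -> str:
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    buf = io.StringIO()
    # csv.writer quotes any cell that contains a comma, quote or newline.
    csv.writer(buf, lineterminator="\n").writerows(parser.rows)
    return buf.getvalue()
```

Because `csv.writer` takes care of quoting, a cell such as `Widget, large` comes out as `"Widget, large"` instead of being split across two columns.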

Know of any other good software applications for stripping MS Office bloat from HTML documents? Please add your recommendations below!

