T.G. Schramer Consulting (949) 249-1824 lastimo@cox.net
The following files are included in this package:
htb_docs.html - This file.
license.txt - License agreement.
readme.txt - General info.
linux htb - Linux kernel binary. (TBD)
sun htb - SunOS binary.
windows htb.exe - Win32 binary, Win95, Win98, WinNT, Win2000, WinXP.
htb.ico - Windows icon. Use with runhtb.bat or as desired.
runhtb.bat - Batch file for WinNT/2000/XP to allow drag and drop processing from Desktop.
HTB is free and may be freely distributed with adherence to the following license agreement:
License Agreement
By downloading this software, you indicate
that you agree with the terms of this agreement.
Important - Please read this agreement carefully.
Copyright:
This software program and any associated material are protected by copyright law. The HTB program is a proprietary product of T.G. Schramer Consulting. T.G. Schramer Consulting retains title to and ownership in the copyright of the HTB software program and the associated materials.
Redistribution and use of the HTB binary and accompanying documentation, with or without modification, are permitted provided that the following conditions are met:
Redistribution of documentation and/or HTB binary program must retain the above copyright notice, this list of conditions and the disclaimer below.
Redistribution must be free of charge whether stand-alone or part of a larger package such as a CD-ROM archive.
Disclaimer:
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Simply copy the correct binary file to your system's program directory
(
To run HTB, type "htb" from a command-line prompt. Running HTB without arguments displays a summary screen of the options. Specifying a single filename containing markup tags sends the re-formatted output to the screen. Specify a second file or redirect "standard output" to save the output to a new file. The original file is never altered. The single-character command-line options may be preceded with either "-" or "/" and may be combined into one argument. The order of the options is not important, but only the last value in a set of conflicting options will be effective. The default behavior may not provide the best cleanup, so try different command-line options to get the desired results. The option combinations -as, -ams, and -n have also given good results.
One other way to run HTB is with the new
-f option. This forces HTB to run as a "filter"
which reads from the "standard input" stream and sends results to the
"standard output" stream. This allows HTB to be embedded into other
programs and processes.
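The option rules above (a "-" or "/" prefix, several letters combined in one argument, last conflicting value wins) can be illustrated with a small Python sketch. This is not HTB's own parser. The function name and the conflict groups are assumptions made for illustration, since the documentation does not list which options conflict.

```python
# Hypothetical sketch, not HTB's implementation.
# ASSUMPTION: these groups are guesses at what "conflicting options" means.
CONFLICT_GROUPS = [set("lu"), set("abn"), set("cd"), set("0123456789")]


def parse_options(args):
    """Return the effective single-character options, last conflicting one wins."""
    chosen = []
    for arg in args:
        if not arg or arg[0] not in "-/":
            continue  # not an option argument, for example a file name
        for ch in arg[1:]:
            for group in CONFLICT_GROUPS:
                if ch in group:
                    # Drop any earlier option from the same conflict group.
                    chosen = [c for c in chosen if c not in group]
            if ch not in chosen:
                chosen.append(ch)
    return chosen
```

With these assumptions, `parse_options(["-l", "-m", "-5"])` and `parse_options(["-lm5"])` give the same result, which mirrors the "-l -m -5 = -lm5" equivalence the help screen mentions. A real parser would also need to tell a "/"-prefixed option from an absolute Unix path, which this sketch does not attempt.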
Modern browsers are amazingly forgiving of poorly written HTML. HTB will usually report the line number and text of offending tags and ignore them in the re-formatting process. Occasionally, with quoting inconsistencies, HTB will stop re-formatting at the point where the problem was found. This reporting mechanism makes HTB useful as a simple syntax validator, even if the generated output is not saved. HTB does not attempt to correct invalid files, but in most cases it still does a good job of re-formatting. If syntax correction is needed, try Tidy from W3C, which is a very good cleanup program. Tidy may even be used in combination with HTB using command-line piping and the HTB -f option.
All HTB error messages are sent to the "standard error" stream, which can be
captured to a file using "standard error" redirection ("2>" or "2>>").
Example 1:
- Beautify myfile.htm and save the output to newfile.htm
but save error messages in another file called error.txt
by redirecting the "standard error" stream to a file.
htb myfile.htm newfile.htm 2> error.txt
htb myfile.htm newfile.htm 2>> error.txt
tidy myfile.htm 2> error.txt | htb -f > newfile.htm 2>> error.txt
HTB 2.0 is a major enhancement over 1.0, which has been available for
several years. In addition to options -a, -e, -f, -j, -r, -t, -x, -y, -z,
many bug fixes and enhancements have been added. Among them are correct
handling of APPLET, OBJECT, SCRIPT & STYLE tags and much better
handling of errors, comments and nested TABLES with their rows and cells.
Many of the new options allow separation of HTML from other data in the
document, like text, comments, or non-HTML tags.
In the new hybrid world of Server Pages and XSL, HTB 2.0 was also
expanded to support XML compliant syntax including XHTML & XSL and
be forgiving of custom markup tags often added by HTML extensions and
third party Web applications. XSL beautification is now fully supported
with logical rendering behaviors assigned to every element in the XSL 1.0
specification. Hybrid XHTML/XML documents can still be beautified with
HTML case changes, since the XML tags, which are likely to be case
sensitive, are handled independently of the HTML tags, which are not.
This special XML tag handling is done automatically whenever an XML
compliant file is detected, or may be forced on using the -x option for
files that contain "well-formed" XML but do not strictly adhere to the XML
specification. XML auto detection extends to ASP and JSP files, although
these formats are not strictly XML compliant. The
-y option has been added to switch off special
XML handling and treat all tags the same whether HTML or not.
Before:
<body bgcolor="#FFFFFF" leftmargin="0" topmargin="0" botmargin="0" marginwidth="0" marginheight="0" link="#666666" vlink="#666666" alink="#000000"> <table width="800" border="0" cellpadding="0" cellspacing="0"> <tr> <td colspan="2" width="196" bgcolor="cccccc" valign="top"><img src="/images/homepage/rev/logo_06.gif" width="196" height="63"></td> <td bgcolor="cccccc" width="600" valign="top"> <table width="600" border="0" cellpadding="0" cellspacing="0" valign="top"> <tr> <td valign="top" height="17" bgcolor="#CCCCCC"><img src="/images/homepage/rev/comp8_07.gif" width="600" height="17"></td> </tr>
After:
<BODY ALINK="#000000" BGCOLOR="#FFFFFF" BOTMARGIN="0" LEFTMARGIN="0" LINK="#666666" MARGINHEIGHT="0" MARGINWIDTH="0" TOPMARGIN="0" VLINK="#666666"> <TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" WIDTH="800"> <TR> <TD BGCOLOR="cccccc" COLSPAN="2" VALIGN="top" WIDTH="196"><IMG HEIGHT="63" SRC="/images/homepage/rev/logo_06.gif" WIDTH="196"></TD> <TD BGCOLOR="cccccc" VALIGN="top" WIDTH="600"> <TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" VALIGN="top" WIDTH="600"> <TR> <TD BGCOLOR="#CCCCCC" HEIGHT="17" VALIGN="top"><IMG HEIGHT="17" SRC="/images/homepage/rev/comp8_07.gif" WIDTH="600"></TD> </TR>
The -a command-line option causes all tags containing more than one
attribute to be broken over multiple lines, each with a single
attribute. The attributes are aligned vertically with the first
attribute. A similar attribute break will occur by default, but only
on tags exceeding the column 80 limit, and each line may contain more
than one attribute.
Before:
<BODY BGCOLOR="#FFFFFF" MARGINWIDTH="0" MARGINHEIGHT="0" LINK="#666666" VLINK="#666666" ALINK="#000000"> <TABLE WIDTH="800" BORDER="0" CELLPADDING="0" CELLSPACING="0"> <TR> <TD COLSPAN="2" WIDTH="196" BGCOLOR="cccccc" VALIGN="top"><IMG SRC="/images/homepage/rev/logo_06.gif" WIDTH="196" HEIGHT="63"></TD> <TD BGCOLOR="cccccc" WIDTH="600" VALIGN="top"> <TABLE WIDTH="600" BORDER="0" CELLPADDING="0" CELLSPACING="0" VALIGN="top"> <TR> <TD VALIGN="top" HEIGHT="17" BGCOLOR="#CCCCCC"><IMG SRC="/images/homepage/rev/comp8_07.gif" WIDTH="600" HEIGHT="17"></TD> </TR>
After:
<BODY ALINK="#000000" BGCOLOR="#FFFFFF" LINK="#666666" MARGINHEIGHT="0" MARGINWIDTH="0" VLINK="#666666"> <TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" WIDTH="800"> <TR> <TD BGCOLOR="cccccc" COLSPAN="2" VALIGN="top" WIDTH="196"><IMG HEIGHT="63" SRC="/images/homepage/rev/logo_06.gif" WIDTH="196"></TD> <TD BGCOLOR="cccccc" VALIGN="top" WIDTH="600"> <TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" VALIGN="top" WIDTH="600"> <TR> <TD BGCOLOR="#CCCCCC" HEIGHT="17" VALIGN="top"><IMG HEIGHT="17" SRC="/images/homepage/rev/comp8_07.gif" WIDTH="600"></TD> </TR>
The -b command-line option causes all tag attributes to be broken
on succeeding lines. The attributes are aligned vertically with the last
character in the tag name.
Before:
<BODY BGCOLOR="#FFFFFF" MARGINWIDTH="0" MARGINHEIGHT="0" LINK="#666666" VLINK="#666666" ALINK="#000000"> <TABLE WIDTH="800" BORDER="0" CELLPADDING="0" CELLSPACING="0"> <TR> <TD COLSPAN="2" WIDTH="196" BGCOLOR="cccccc" VALIGN="top"><IMG SRC="/images/homepage/rev/logo_06.gif" WIDTH="196" HEIGHT="63"></TD> <TD BGCOLOR="cccccc" WIDTH="600" VALIGN="top"> <TABLE WIDTH="600" BORDER="0" CELLPADDING="0" CELLSPACING="0" VALIGN="top"> <TR> <TD VALIGN="top" HEIGHT="17" BGCOLOR="#CCCCCC"><IMG SRC="/images/homepage/rev/comp8_07.gif" WIDTH="600" HEIGHT="17"></TD> </TR>
After:
<BODY ALINK="#000000" BGCOLOR="#FFFFFF" BOTMARGIN="0" MARGINHEIGHT="0" MARGINWIDTH="0" LEFTMARGIN="0" LINK="#666666" TOPMARGIN="0" VLINK="#666666"> <TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" WIDTH="800"> <TR> <TD BGCOLOR="cccccc" COLSPAN="2" VALIGN="top" WIDTH="196"><IMG HEIGHT="63" SRC="/images/homepage/rev/logo_06.gif" WIDTH="196"></TD> <TD BGCOLOR="cccccc" VALIGN="top" WIDTH="600"> <TABLE BORDER="0" VALIGN="top" CELLPADDING="0" CELLSPACING="0" WIDTH="600"> <TR> <TD BGCOLOR="#CCCCCC" HEIGHT="17" VALIGN="top"><IMG HEIGHT="17" SRC="/images/homepage/rev/comp8_07.gif" WIDTH="600"></TD> </TR>
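Because the line breaks in the samples for -a and -b do not survive in flattened text, the two layouts are easier to see in a short Python sketch. This is only an interpretation of the descriptions of -a and -b, not HTB's code. Both function names are hypothetical, and the alphabetical attribute sorting is inferred from the "After" samples.

```python
# Hypothetical sketch of the two attribute-breaking layouts; not HTB's code.

def break_first_aligned(tag, attrs):
    """-a style: one attribute per line, aligned under the first attribute."""
    parts = [f'{name}="{value}"' for name, value in sorted(attrs)]
    if len(parts) < 2:
        # Tags with fewer than two attributes are left on one line.
        return "<" + " ".join([tag] + parts) + ">"
    pad = " " * (len(tag) + 2)  # width of "<", the tag name, and one space
    return "<" + tag + " " + ("\n" + pad).join(parts) + ">"


def break_every_attribute(tag, attrs):
    """-b style: every attribute on a following line, starting at the column
    of the last character of the tag name (an interpretation of the text)."""
    pad = " " * len(tag)  # "<" is column 0, so the last letter is at len(tag)
    lines = ["<" + tag]
    lines += [f'{pad}{name}="{value}"' for name, value in sorted(attrs)]
    return "\n".join(lines) + ">"


print(break_first_aligned("TABLE", [("WIDTH", "800"), ("BORDER", "0")]))
print(break_every_attribute("TABLE", [("WIDTH", "800"), ("BORDER", "0")]))
```

The first call prints `BORDER` on the tag line with `WIDTH` directly beneath it. The second puts both attributes on their own lines under the final "E" of `<TABLE`.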
The -c command-line option adds an extra carriage return character to
each output line of reformatted data. This allows Unix versions of HTB to
create DOS/Windows compatible text files directly.
The -d command-line option inhibits extra carriage return character
output even if present in the source data. This allows the Windows version
of HTB to create a Unix compatible text file directly. This is the default
behavior and correctly creates a natively compatible format whether Unix
or Windows.
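The difference between the -c and -d behaviors comes down to whether each line ends in a carriage return plus line feed or in a line feed alone. A minimal Python sketch of the two conversions follows. The function names are hypothetical and this is not how HTB itself is implemented.

```python
# Illustrative only: line-ending conversion comparable to -c and -d.

def to_lf(text):
    """Drop the carriage return from every CR+LF pair (Unix style, like -d)."""
    return text.replace("\r\n", "\n")


def to_crlf(text):
    """End every line with CR+LF (DOS/Windows style, like -c).

    Normalising to LF first avoids turning an existing CR+LF into CR+CR+LF.
    """
    return to_lf(text).replace("\n", "\r\n")
```

When writing the result to a file from Python, open the file with `newline=""` so the runtime does not translate the line endings a second time.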
The -e command-line option replaces the special markup characters "<",
">", and "&" with escape strings "&lt;", "&gt;", and
"&amp;" respectively. Also, the tag sequence "<HTML><BODY><PRE>"
is added to the beginning of the output data and the sequence
"</PRE></BODY></HTML>" is appended to the end of the data.
This creates an entirely new HTML document, which when viewed with a Web
Browser, will appear as source instead of normal rendering. This is useful
in creating markup tag documentation and is the mechanism used to create
the examples in this document. Use in combination with the
-k option to do the conversion without applying
other reformatting options.
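The conversion described for -e can be approximated in a few lines of Python using the standard `html.escape` function. This is a sketch of the idea rather than HTB's implementation, and the function name is hypothetical.

```python
import html


def as_viewable_source(markup):
    """Escape markup characters and wrap the result so a browser shows source.

    html.escape with quote=False replaces only "&", "<" and ">",
    the three characters named in the -e description.
    """
    escaped = html.escape(markup, quote=False)
    return "<HTML><BODY><PRE>" + escaped + "</PRE></BODY></HTML>"


print(as_viewable_source("<B>a & b</B>"))
```

The ampersand is escaped as well as the angle brackets, so text that already contains entity references stays literal when the generated page is viewed.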
The -f command-line option will cause HTB to read from the "standard
input" stream and write to "standard output". This makes HTB a filter
program and allows embedding HTB within other stream manipulation processes
and programs like command-line "piping". Other options may be combined
with the filter option, but all file names specified with HTB are ignored.
Example Usage:
- Display only lines containing the text string "hidden" in myfile.htm
(most likely <INPUT TYPE="hidden"...>)
findstr "hidden" myfile.htm | htb -af
grep -i hidden myfile.htm | htb -af
See the Errors & Verification
section for another example of the HTB -f option when used in combination
with the Tidy HTML
cleanup program.
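Embedding a filter program in another program, as the -f description suggests, usually means feeding it text on standard input and collecting both output streams. The Python sketch below does this with the standard `subprocess` module. The helper name `run_filter` is hypothetical, and the commented example assumes an `htb` binary is on the search path.

```python
import subprocess


def run_filter(argv, text):
    """Send text to a filter program's standard input.

    Returns a (stdout, stderr) pair so that messages written to
    "standard error" can be kept apart from the filtered output.
    """
    result = subprocess.run(
        argv, input=text, capture_output=True, text=True, check=False
    )
    return result.stdout, result.stderr


# Example call, ASSUMING an "htb" binary is installed and on PATH:
# output, messages = run_filter(["htb", "-af"], '<input type="hidden" name="x">')
```

Keeping the two streams separate matches the documented behavior that error messages go to "standard error" while the re-formatted data goes to "standard output".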
The -h command-line option (or incomplete/invalid command-line options) will display the following Help Screen:
htb - HTML/XML Beautifier 2.0, TG Schramer Consulting, lastimo@cox.net

"htb" is a program to beautify HTML/XML files and has the following format:

   "htb -(options) <input filename> <new filename>"

Options:
   a: Force break of all multi-attribute Tags with alignment on the 1st one
      (default for Tags going over 80 columns as whitespace permits).
   b: Force break of every Tag Attribute onto a new line with alignment on
      the last character in the Tag.
   c: Force extra Carriage Return character after each line
      (allows creation of DOS compatible file from Unix system).
   d: Never add extra Carriage Return character after each line (default).
   e: Escape Tag characters & create browser viewable source conversion
      (ie. "<" to "&lt;", ">" to "&gt;", etc.).
   f: Run as filter - read from standard input and write to standard output
      (any file names also specified are ignored).
   h: This screen.
   j: Join lines wherever possible and remove comments & extra whitespace.
      (overrides re-formatting options and compresses output).
   k: Keep current layout, just apply upper/lower case
      (overrides non-case related options).
   l: Make Tag names lower case.
   m: Make Tag Attribute case the opposite of the Tag name.
   n: Never break Tag Attributes onto separate lines.
   r: Remove Non-HTML tags. HTML 4.01 and common legacy Tags remain
      (overrides x option).
   s: Remove tabs from SCRIPTS and indent using blanks. Scripts could look
      worse, but the tabs are gone. By default scripts are not changed.
   t: Strip all but plain text content from input. No tags or comments remain.
   u: Make Tag names upper case (default).
   x: Treat unknown tags as "well-formed" XML. Case changes & attribute
      sorting only applied to known HTML tags (default if XML detected).
   y: Turn off XML detection (overrides x option, case changes go on all Tags).
   z: Remove stand-alone comments (not within SCRIPT, STYLE, etc).
 0-9: Use (number) of spaces for indenting (default = 3).

Options may be combined into one argument and any order (ie. -l -m -5 = -lm5).
If output file is not defined, re-formatted data is sent to "standard out".
Defaults: Tags/Attributes upper case, break Tags over 80 & indent by 3.

Examples:
 - Make Tags and Attributes lower case and use 4 for indenting:
   "htb -l4 index.html newindx.html"
 - Defaults + no Tag breaking, remove comments & treat non-HTML tags as XML.
   "htb -nxz index.html newindx.html"
The -j command-line option removes all unnecessary whitespace &
comments and joins the output lines together whenever possible. The result
is totally "unbeautified" output, but the size will be reduced by 10-40%
for quicker transfer over the network. Use this option whenever performance
is more important than readability.
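A rough Python sketch of the kind of compression -j performs is shown below. It is deliberately naive and is not HTB's algorithm. The function name is hypothetical, and the sketch ignores cases a real tool must respect, such as PRE blocks and script content where whitespace and comment markers matter.

```python
import re

# Matches an SGML-style comment, including ones that span several lines.
COMMENT = re.compile(r"<!--.*?-->", re.S)


def compress(markup):
    """Remove comments, then collapse every run of whitespace to one space."""
    without_comments = COMMENT.sub("", markup)
    return re.sub(r"\s+", " ", without_comments).strip()


sample = "<TR>\n   <!-- row -->\n   <TD>a  b</TD>\n</TR>\n"
packed = compress(sample)
print(len(sample), "->", len(packed))
```

Comparing `len(sample)` with `len(packed)` gives a quick feel for how much of a file is indentation and comments.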
When the current indenting and appearance of your tagged document is
acceptable, the -k command-line option may be used to change only the case
of the tag names and attributes with no other changes applied.
Example:
- Keep the current layout of an HTML document, but change the tag attribute
names to lower case (-m option, opposite of tag name case which by default is upper)...
htb -km myfile.html
Before:
<FORM ENCTYPE="multipart/form-data" NAME="coreform" METHOD="POST"> <INPUT TYPE="submit" VALUE="Submit Request"> <INPUT NAME="cgi" TYPE="button" VALUE="cgi2xml">cgi2xml <TABLE BORDER="5" CELLPADDING="5"> <TR> <TD> <FONT COLOR="purple"> <H4>Output formatting:</H4> </FONT>Debug: <INPUT NAME="debug"><BR> <BR> Filter: <INPUT NAME="filter"><BR> Output: <INPUT NAME="output"><BR> <BR> Pagestart: <INPUT SIZE="4" NAME="pagestart"><BR> Pagesize: <INPUT SIZE="4" NAME="pagesize"><BR> </TD> </TR> </TABLE> </FORM>
After:
<FORM enctype="multipart/form-data" name="coreform" method="POST"> <INPUT type="submit" value="Submit Request"> <INPUT name="cgi" type="button" value="cgi2xml">cgi2xml <TABLE border="5" cellpadding="5"> <TR> <TD> <FONT color="purple"> <H4>Output formatting:</H4> </FONT>Debug: <INPUT name="debug"><BR> <BR> Filter: <INPUT name="filter"><BR> Output: <INPUT name="output"><BR> <BR> Pagestart: <INPUT size="4" name="pagestart"><BR> Pagesize: <INPUT size="4" name="pagesize"><BR> </TD> </TR> </TABLE> </FORM>
The -l command-line option changes all HTML tag names and their
attributes to lower case. Combine with the -m (mixed case) option to keep the tag names lower case, but make the attribute names
upper case.
Before:
<FORM ENCTYPE="multipart/form-data" NAME="coreform" METHOD="POST"> <INPUT TYPE="submit" VALUE="Submit Request"> <INPUT NAME="cgi" TYPE="button" VALUE="cgi2xml">cgi2xml <TABLE BORDER="5" CELLPADDING="5"> <TR> <TD> <FONT COLOR="purple"> <H4>Output formatting:</H4> </FONT>Debug: <INPUT NAME="debug"><BR> <BR> Filter: <INPUT NAME="filter"><BR> Output: <INPUT NAME="output"><BR> <BR> Pagestart: <INPUT SIZE="4" NAME="pagestart"><BR> Pagesize: <INPUT SIZE="4" NAME="pagesize"><BR> </TD> </TR> </TABLE> </FORM>
After:
<form enctype="multipart/form-data" method="post" name="coreform"> <input type="submit" value="Submit Request"> <input name="cgi" type="button" value="cgi2xml">cgi2xml <table border="5" cellpadding="5"> <tr> <td> <font color="purple"> <h4>Output formatting:</h4> </font>Debug: <input name="debug"><br> <br> Filter: <input name="filter"><br> Output: <input name="output"><br> <br> Pagestart: <input name="pagestart" size="4"><br> Pagesize: <input name="pagesize" size="4"><br> </td> </tr> </table> </form>
The -m command-line option makes the tag attribute case the opposite
of the tag name. Since the HTB default is to make tag names upper case,
the addition of this option will make the tag attributes lower case. If
combined with the -l option (lower case) the tag
names will be lower case, and the tag attributes will be upper case. See
the -k option for an example.
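How the case options combine (upper case by default, lower case with -l, attribute case flipped by -m) can be sketched in Python with the standard `html.parser` module. This is an illustration under stated assumptions, not HTB's code. The class and function names are hypothetical, only start tags are rebuilt, and attributes are sorted by name as the non -k samples suggest.

```python
import html
from html.parser import HTMLParser


class StartTagRecaser(HTMLParser):
    """Collect each start tag, rebuilt with the requested letter case."""

    def __init__(self, upper_tags=True, mixed=False):
        super().__init__(convert_charrefs=False)
        self.upper_tags = upper_tags  # False corresponds to the -l option
        self.mixed = mixed            # True corresponds to the -m option
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # With -m the attribute case is the opposite of the tag name case.
        attr_upper = self.upper_tags != self.mixed
        name = tag.upper() if self.upper_tags else tag.lower()
        parts = [name]
        for key, value in sorted(attrs, key=lambda pair: pair[0]):
            key = key.upper() if attr_upper else key.lower()
            if value is None:
                parts.append(key)  # attribute written without a value
            else:
                parts.append(f'{key}="{html.escape(value)}"')
        self.tags.append("<" + " ".join(parts) + ">")


def recase_start_tags(markup, upper_tags=True, mixed=False):
    parser = StartTagRecaser(upper_tags, mixed)
    parser.feed(markup)
    parser.close()
    return parser.tags


print(recase_start_tags('<input size="4" name="pagesize">', mixed=True))
```

Attribute values keep their original case in this sketch; only tag and attribute names are changed.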
The -n command-line option cancels the default behavior of breaking tags
which exceed the 80 column limit and keeps tags intact within a single
line of output regardless of their length. This is often desirable,
especially on XSL files.
The -r command-line option strips any tag which is not part of the
HTML 4.01 specification (and a group of widely recognized, commonly used
legacy tags) from the output. It's a convenient way to separate HTML from
hybrid files like ASP, JSP, XSL or files containing custom tags. The
stripped tags are reported along with any errors to "standard error".
Example:
- Remove all non-HTML tags from an XSL/XHTML file...
htb -r myfile.xsl
Before:
<xsl:for-each select="ELEMENT/NODE1"> <xsl:variable select="position()-1" name="vpos" /> <TR VALIGN="top"> <TD ALIGN="center"><FONT SIZE="1" FACE="Helvetica"><xsl:value-of select="$vpos" /></FONT> </TD> <TD ALIGN="center"><FONT FACE="Helvetica"> <INPUT NAME="ELEM{$vpos}" TYPE="text" VALUE="Element {$vpos}" /></FONT> </TD> <TD ALIGN="center"><FONT FACE="Helvetica"> <INPUT NAME="NUMB{$vpos}" TYPE="text" VALUE="2" /></FONT> </TD> <TD ALIGN="center"><FONT FACE="Helvetica"> <xsl:variable select="count(//NODE1[@id > -1])" name="pcnt" /> <xsl:variable name="selsize"> <xsl:choose><xsl:when test="$pcnt < 5"> <xsl:value-of select="$pcnt" /> </xsl:when><xsl:otherwise> <xsl:value-of select="'5'" /> </xsl:otherwise></xsl:choose> </xsl:variable> <SELECT SIZE="{$selsize}" NAME="VALU{$vpos}"> <xsl:for-each select="//VALUE[@id > -1]"> <OPTION VALUE="{@id}"> <xsl:value-of select="NAME" /></OPTION> </xsl:for-each> </SELECT></FONT> </TD> </TR> </xsl:for-each>
After:
<TR VALIGN="top"> <TD ALIGN="center"><FONT FACE="Helvetica" SIZE="1"></FONT> </TD> <TD ALIGN="center"><FONT FACE="Helvetica"> <INPUT NAME="ELEM{$vpos}" TYPE="text" VALUE="Element {$vpos}" /></FONT> </TD> <TD ALIGN="center"><FONT FACE="Helvetica"> <INPUT NAME="NUMB{$vpos}" TYPE="text" VALUE="2" /></FONT> </TD> <TD ALIGN="center"><FONT FACE="Helvetica"> <SELECT NAME="VALU{$vpos}" SIZE="{$selsize}"> <OPTION VALUE="{@id}"></OPTION> </SELECT></FONT> </TD> </TR>
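The filtering idea behind -r can be sketched in Python as a lookup against a set of known tag names. This is a simplified illustration and not HTB's implementation. The function name is hypothetical, `KNOWN_HTML` is a tiny illustrative subset rather than the HTML 4.01 list, and the regular expression is fooled by a ">" inside an attribute value.

```python
import re

# ASSUMPTION: a small illustrative subset, not the real HTML 4.01 tag list.
KNOWN_HTML = {"tr", "td", "font", "input", "select", "option", "table", "a", "b", "br"}

# An opening or closing tag; group 1 is the tag name (colons allowed for prefixes).
TAG = re.compile(r"</?([A-Za-z][\w:.-]*)[^>]*>")


def strip_unknown_tags(markup, known=KNOWN_HTML):
    """Return (markup without unknown tags, list of the tags removed)."""
    removed = []

    def keep_or_drop(match):
        if match.group(1).lower() in known:
            return match.group(0)
        removed.append(match.group(0))
        return ""

    return TAG.sub(keep_or_drop, markup), removed


clean, dropped = strip_unknown_tags('<TD><xsl:value-of select="NAME" /></TD>')
print(clean)
print(dropped)
```

Returning the removed tags separately mirrors the documented behavior of reporting stripped tags on "standard error" instead of discarding them silently.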
HTB automatically removes any tab characters found in the source
document during the indenting process, but by default SCRIPTs are kept
intact. To completely remove all tabs, specify the -s option and tab
characters found within SCRIPT elements will be replaced with sets
of indented spaces. This could make the indented script statements look
slightly worse and may require minor editing, but the beautified output
is clear of any tab characters.
The -t command-line option strips all markup tags and comments and converts
the input to plain text. All ASCII and ISO8859-1 HTML escape strings are
converted back to the characters they represent. An attempt is made to
compress extra whitespace, but in general the text will require additional
re-formatting to be made presentable. Use this option to isolate the textual
content within tagged documents (not necessarily HTML) for use in other
documentation.
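A comparable text extraction can be sketched in Python with the standard `html.parser` module, which also converts escape strings back to characters. This is an illustration of the idea, not HTB's implementation. The class and function names are hypothetical, and unlike a dedicated tool the sketch would also keep the text content of SCRIPT and STYLE elements.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data; tags and comments are simply not recorded."""

    def __init__(self):
        # convert_charrefs=True turns references such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def to_plain_text(markup):
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    # Compress runs of whitespace, similar in spirit to the -t description.
    return " ".join("".join(parser.chunks).split())


print(to_plain_text("<P>Tom &amp; Jerry<!-- note -->\n  <B>&lt;live&gt;</B></P>"))
```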
The -u command-line option changes all HTML tag names and their
attributes to upper case. Since this is the default behavior of HTB, it
is not required. Use the -m (mixed case) option to
keep the tag names upper case, but make the attribute names lower case.
Before:
<form enctype="multipart/form-data" name="coreform" method="POST"> <input type="submit" value="Submit Request"> <input name="cgi" type="button" value="cgi2xml">cgi2xml <table border="5" cellpadding="5"> <tr> <td> <font color="purple"> <h4>Output formatting:</h4> </font>Debug: <input name="debug"><br> <br> Filter: <input name="filter"><br> Output: <input name="output"><br> <br> Pagestart: <input size="4" name="pagestart"><br> Pagesize: <input size="4" name="pagesize"><br> </td> </tr> </table> </form>
After:
<FORM ENCTYPE="multipart/form-data" METHOD="POST" NAME="coreform"> <INPUT TYPE="submit" VALUE="Submit Request"> <INPUT NAME="cgi" TYPE="button" VALUE="cgi2xml">cgi2xml <TABLE BORDER="5" CELLPADDING="5"> <TR> <TD> <FONT COLOR="purple"> <H4>Output formatting:</H4> </FONT>Debug: <INPUT NAME="debug"><BR> <BR> Filter: <INPUT NAME="filter"><BR> Output: <INPUT NAME="output"><BR> <BR> Pagestart: <INPUT NAME="pagestart" SIZE="4"><BR> Pagesize: <INPUT NAME="pagesize" SIZE="4"><BR> </TD> </TR> </TABLE> </FORM>
HTB automatically detects XML compliant files and is able to apply
reformatting to unknown tags since they meet the predictable behavior
of the XML specification. If the input document is not strictly XML
compliant, but does contain custom tagging which may be considered
"well-formed" XML, the -x option may be used to apply XML handling on
these otherwise ignored tags. If XML is detected, either automatically,
or with the -x option, the tag case is NOT changed for these non-HTML tags,
since they are often case-sensitive. Also, the attributes of unknown tags
will remain in original order instead of being sorted as with HTML
attributes. To turn off XML auto-detection and apply case changes and
attribute sorting to all tags known and unknown, use the
-y option.
Example:
- Make tag names and attributes lower case,
never break tags, and treat unknown tags in an
HTML file as well formed XML...
htb -lnx myfile.html
Before:
<TR><TD WIDTH=182 ALIGN=left BGCOLOR="#ffffff"> <NYT_HEADLINE> <A HREF="/onthisday/20020619.html"><FONT SIZE="3" FACE="times"><B>On June 19 ...<BR></B></FONT></A> </NYT_HEADLINE> <NYT_BYLINE> <FONT SIZE="-1"></FONT> </NYT_BYLINE> <NYT_SUMMARY> <FONT SIZE="-1"> <B>1964:</B> The Civil Rights Act of 1964 was approved. (<A HREF="/onthisday/big/0619.html">See this front page.</A>) <BR> <B>1903:</B> Lou Gehrig was born. <A HREF="/onthisday/bday/0619.html">(Read about his life.)</A> <BR> <B>1886:</B> Harper's Weekly featured a cartoon about the proposed annexation of Nova Scotia. <A HREF="/onthisday/harp/0619.html">(See the cartoon.)</A></FONT> </TD></TR>
After:
<tr> <td align="left" bgcolor="#ffffff" width="182"> <NYT_HEADLINE> <a href="/onthisday/20020619.html"><font face="times" size="3"><b>On June 19 ...<br></b></font></a> </NYT_HEADLINE> <NYT_BYLINE> <font size="-1"></font> </NYT_BYLINE> <NYT_SUMMARY> <font size="-1"> <b>1964:</b> The Civil Rights Act of 1964 was approved. (<a href="/onthisday/big/0619.html">See this front page.</a>) <br> <b>1903:</b> Lou Gehrig was born. <a href="/onthisday/bday/0619.html">(Read about his life.)</a> <br> <b>1886:</b> Harper's Weekly featured a cartoon about the proposed annexation of Nova Scotia. <a href="/onthisday/harp/0619.html">(See the cartoon.)</a></font> </td> </tr>
HTB automatically detects XML compliant files and treats the unknown
tags differently than HTML tags. XML tags are indented as whitespace
permits and case changes & attribute sorting are not applied. To turn
off this default behavior and apply case changes & sorting to all
tags known and unknown, specify the -y option.
Example:
- Never break tags, make all
tags lower case whether HTML or not, and do not change indenting for
unknown tags...
htb -lny myfile.html
Before:
<TR><TD WIDTH=182 ALIGN=left BGCOLOR="#ffffff"> <NYT_HEADLINE> <A HREF="/onthisday/20020619.html"><FONT SIZE="3" FACE="times"><B>On June 19 ...<BR></B></FONT></A> </NYT_HEADLINE> <NYT_BYLINE> <FONT SIZE="-1"></FONT> </NYT_BYLINE> <NYT_SUMMARY> <FONT SIZE="-1"> <B>1964:</B> The Civil Rights Act of 1964 was approved. (<A HREF="/onthisday/big/0619.html">See this front page.</A>) <BR> <B>1903:</B> Lou Gehrig was born. <A HREF="/onthisday/bday/0619.html">(Read about his life.)</A> <BR> <B>1886:</B> Harper's Weekly featured a cartoon about the proposed annexation of Nova Scotia. <A HREF="/onthisday/harp/0619.html">(See the cartoon.)</A></FONT> </TD></TR>
After:
<tr> <td align="left" bgcolor="#ffffff" width="182"> <nyt_headline> <a href="/onthisday/20020619.html"><font face="times" size="3"><b>On June 19 ...<br></b></font></a> </nyt_headline> <nyt_byline> <font size="-1"></font> </nyt_byline> <nyt_summary> <font size="-1"> <b>1964:</b> The Civil Rights Act of 1964 was approved. (<a href="/onthisday/big/0619.html">See this front page.</a>) <br> <b>1903:</b> Lou Gehrig was born. <a href="/onthisday/bday/0619.html">(Read about his life.)</a> <br> <b>1886:</b> Harper's Weekly featured a cartoon about the proposed annexation of Nova Scotia. <a href="/onthisday/harp/0619.html">(See the cartoon.)</a></font> </td> </tr>
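The difference between XML handling (automatic or forced with -x) and the -y behavior comes down to one decision per tag name: is the tag known HTML, and is XML handling active? A minimal Python sketch of that decision follows. The function name and the small `KNOWN_HTML` set are assumptions for illustration and are not taken from HTB.

```python
# ASSUMPTION: a tiny illustrative subset of HTML tag names.
KNOWN_HTML = {"tr", "td", "a", "font", "b", "br"}


def recase_tag_name(name, lower=False, xml_mode=True):
    """Return the tag name with the case a beautifier of this kind might emit.

    xml_mode=True models XML handling (auto-detected or -x): unknown tags keep
    their original case. xml_mode=False models -y: every tag is re-cased.
    """
    if xml_mode and name.lower() not in KNOWN_HTML:
        return name
    return name.lower() if lower else name.upper()


print(recase_tag_name("NYT_HEADLINE", lower=True))                   # unchanged
print(recase_tag_name("NYT_HEADLINE", lower=True, xml_mode=False))   # lower case
```

The two calls correspond to the two "After" samples: with XML handling the custom tag keeps its upper case, and with -y it is lowered along with the HTML tags.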
The -z command-line option removes all stand-alone comments from the
input data. This does not include JavaScript comments or comment blocks
within APPLET, OBJECT, SCRIPT, and STYLE tags used to hide text from
browsers. The revised output should render and function as the original.
The -z option is useful in reducing tagged file sizes when the comment
blocks are no longer needed, or in removing dead, commented-out sections
within documents which tend to collect over time. The stripped comments are
not lost, however. These are sent to the "standard error" stream and may
be collected in another file for reference or for use in documentation by
"standard error" redirection ("2>" or "2>>"). If "standard error"
is not redirected, the stripped comments will be seen scrolling by on the
screen. Use in combination with the -k option to
strip comments without otherwise changing the document layout.
Example Usage:
- Beautify myfile.htm and save the output to newfile.htm
but save the stripped comments in another file called comments.txt
by redirecting the "standard error" stream to a file.
htb -z myfile.htm newfile.htm 2> comments.txt
htb -z myfile.htm newfile.htm 2>> comments.txt
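The distinction -z draws between stand-alone comments and comment blocks inside SCRIPT, STYLE, APPLET, or OBJECT elements can be sketched in Python with two regular expressions. This is a simplified illustration, not HTB's implementation. The function name is hypothetical, and nested or malformed elements are not handled.

```python
import re

# Elements whose contents are left untouched, comments included.
PROTECTED = re.compile(r"<(script|style|applet|object)\b.*?</\1\s*>", re.I | re.S)
COMMENT = re.compile(r"<!--.*?-->", re.S)


def strip_comments(markup):
    """Return (markup without stand-alone comments, list of removed comments)."""
    removed = []

    def drop(match):
        removed.append(match.group(0))
        return ""

    pieces, position = [], 0
    for block in PROTECTED.finditer(markup):
        # Strip comments only from the text before each protected element.
        pieces.append(COMMENT.sub(drop, markup[position:block.start()]))
        pieces.append(block.group(0))
        position = block.end()
    pieces.append(COMMENT.sub(drop, markup[position:]))
    return "".join(pieces), removed


page = "<!-- old --><P>x</P><SCRIPT><!--\nvar a;\n//--></SCRIPT>"
cleaned, comments = strip_comments(page)
print(cleaned)
```

A caller that wants the documented behavior could write each entry of `comments` to `sys.stderr`, so the removed text can be captured through "standard error" redirection.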
A command-line option from 0 to 9 represents the number of spaces
used for increments of indenting. Specifying 0 will cause all indenting
to be removed and the tags will be shifted to the left. If not specified, the
default is to indent by 3.
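The effect of the indent width is easy to see with a small Python sketch that renders lines at given nesting depths. The function name and the (depth, text) input shape are hypothetical. This illustrates only the spacing rule, not how HTB works out nesting.

```python
def indent_lines(items, width=3):
    """Render (depth, text) pairs using `width` spaces per nesting level."""
    if not 0 <= width <= 9:
        raise ValueError("indent width must be a single digit from 0 to 9")
    return "\n".join(" " * (width * depth) + text for depth, text in items)


rows = [(0, "<TABLE>"), (1, "<TR>"), (2, "<TD>")]
print(indent_lines(rows))      # default of 3 spaces per level
print(indent_lines(rows, 0))   # 0 shifts every tag to the left margin
```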