A REXX script that strips the HTML tags from HTML files and saves the remaining text as an ASCII file.
The program is distributed as a ZIP package: unpack HTM2TXT.CMD
into a folder that is on your PATH. MAKEOBJ.CMD
creates the corresponding program object on the Desktop.
The download links for manual installation of the software are listed below:
HTML to Text v. 1.0 (9/9/1997, Otto Räder) | Readme/What's new |
HTM2TXT v 1.0, Sep.09,1997 by Otto Räder
Description:
HTM2TXT.CMD is a REXX script that strips the HTML tags from
.HTML files used on the World Wide Web and stores
the remaining text in an ASCII file.
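As an illustration of the idea only (this is a Python sketch, not the REXX code shipped in HTM2TXT.CMD), tag stripping can be expressed with the standard html.parser module. The names TagStripper and strip_tags are hypothetical and chosen for this example.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects the text between tags; the tags themselves are dropped."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text outside of tags.
        self.parts.append(data)


def strip_tags(html_text):
    """Return html_text with all tags (and HTML comments) removed."""
    parser = TagStripper()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(strip_tags("<h1>Title</h1><p>Some <b>bold</b> text.</p>"))
```

This sketch makes no attempt at layout: it does not wrap lines, format tables, or follow links, all of which the README describes as separate features.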
Group:
HTM2TXT belongs to group: ..pub/os2/apps/internet/www/util/
Freeware:
HTM2TXT may be distributed freely under the following conditions.
Copyright notices must NOT be removed, all files contained in the file
inventory below must be distributed together (you may not remove any
files), and you may not charge for the program.
If you find the program useful then send a post-card (picture of
the location where you live) to:
Otto Räder
Hauptstrasse 61B/13
A3001 Mauerbach
---------------
Austria
Prerequisites:
HTM2TXT requires OS/2 and REXX.
It was developed and tested under OS/2 Warp;
there is no intention to port it to other platforms.
Distribution:
The following files are contained in HTM2TXT1.ZIP:
  HTM2TXT.CMD   the REXX command file, 1997-09-09
  HTM2TXT.ICO   an icon file contributed by Gerard Pinkas, pinkas@en.com
  MAKEOBJ.CMD   a command to create a desktop program object
  README.TXT    documentation, this file
  FILE_ID.DIZ   ID file
Installation:
To install HTM2TXT, just UNZIP the HTM2TXT1.ZIP file and place the
command file in a directory listed in your CONFIG.SYS PATH= statement.
You may use MAKEOBJ.CMD to create a desktop object for HTM2TXT.CMD.
You should run MAKEOBJ.CMD from the directory where HTM2TXT.CMD and
HTM2TXT.ICO are installed.
Usage:
From an OS/2 command line, start HTM2TXT:
  htm2txt filename.htm
Make sure filename.htm is in the current directory.
The filename may contain the wildcard character '*'.
Alternatively, drag and drop an .HTML object onto the HTM2TXT object, if you
have created one using MAKEOBJ.CMD.
HTM2TXT creates an output file 'filename.txt' and
starts an editor to view this file.
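The mapping from input names to output names can be sketched as follows. This is a Python illustration, not the script's REXX code; the helper names output_name and expand_inputs are hypothetical, and the lowercase '.txt' extension follows the wording above (OS/2 file names are not case-sensitive).

```python
import glob
from pathlib import Path


def output_name(input_name):
    """Map 'page.htm' or 'page.html' to 'page.txt'."""
    return str(Path(input_name).with_suffix(".txt"))


def expand_inputs(pattern):
    """Expand a pattern such as '*.htm' against the current directory."""
    return sorted(glob.glob(pattern))


# List every input matched by the wildcard together with its output file.
for name in expand_inputs("*.htm"):
    print(name, "->", output_name(name))
```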
Note: HTM2TXT follows <a href="..."> tags and tries
to resolve the given link address. If the target can be
accessed, it is included in the .txt file.
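Finding the link addresses to follow can be sketched like this in Python. It only collects the href values; how HTM2TXT resolves and includes the targets is not described in this README, so that step is left out. LinkCollector and collect_links are hypothetical names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Records the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def collect_links(html_text):
    """Return the list of link addresses found in html_text."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


print(collect_links('<a href="page2.htm">next</a>'))
```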
Note: The following options were added at users' request.
They were made options so that the behaviour stays
consistent for current users.
The following statements may be changed to customize operation:
  line 11: linemax=72     Maximum line length in the output file.
                          Any longer text is split into output lines
                          no longer than 'linemax'.
  line 12: pixlbyt=6      When a cell is given as <td width="nnnPIX">,
                          the column width in tables is determined
                          by: chars = nnn/pixlbyt.
  line 13: editor='E'     The name of an ASCII editor used to display
                          the result file. It may be changed to the
                          installation's preferred editor.
                          editor='' causes no editor to be called.
  line 14: chain='Y'      Tells HTM2TXT to follow href chains.
                          Any other setting inhibits chaining.
  line 15: showu='N'      Tells HTM2TXT not to show href chain addresses
                          in the output text. If set to 'Y', chain
                          addresses are shown in the output text.
  line 16: nocmt='N'      Tells HTM2TXT not to suppress HTML comments.
                          Any other value suppresses HTML comments
                          in the output file.
  line 17: ofile='.TXT'   Tells HTM2TXT to derive the output file name
                          from the input file name: it becomes
                          ifiname.TXT.
                          Any other value may specify a valid
                          path\filename or a symbolic device
                          such as STDOUT.
  line 18: bold='N'       <bold>-tags are ignored. If set to 'Y',
                          text within <bold>-tags is shown in
                          uppercase, framed by the characters defined
                          in "boldon='<'" and "boldoff='>'" (lines 19 and 20).
  line 21: retab='N'      If bold='Y' and retab='Y', a new line
                          is created after each </b>-tag.
  line 22: retap='N'      If set to retap='Y', a new line is
                          created after each <p>-tag.
  line 77: consts=        A table of variables used to substitute
                          special characters. This table was
                          contributed by tremro@digicom.qc.ca
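To make three of these settings concrete, here is a small Python sketch (not the script's REXX code) of what linemax, pixlbyt, and the bold markers do. The function names are hypothetical. Rounding the column width down with integer division is an assumption, since the README only gives chars = nnn/pixlbyt, and the script's own line splitting may differ from Python's textwrap.

```python
import textwrap

LINEMAX = 72  # default from the settings table
PIXLBYT = 6   # default pixels per character


def wrap_text(text, linemax=LINEMAX):
    """Split text into lines no longer than linemax characters."""
    return textwrap.wrap(text, width=linemax)


def column_chars(width_pixels, pixlbyt=PIXLBYT):
    """Column width in characters: chars = nnn/pixlbyt (rounded down, assumed)."""
    return width_pixels // pixlbyt


def render_bold(text, boldon="<", boldoff=">"):
    """Uppercase text framed by the boldon/boldoff characters (bold='Y')."""
    return boldon + text.upper() + boldoff


print(column_chars(120))    # a <td width="120"> cell with pixlbyt=6
print(render_bold("note"))
```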
You may temporarily override these parameters by adding options
when starting htm2txt from an OS/2 command line:
  htm2txt filename.html l 80       to set linemax to 80 characters
  htm2txt filename.html p 8        to set pixlbyt to 8 pixels/char
  htm2txt filename.html e tedit    to set editor to tinyedit
  htm2txt filename.html o finame   to define an output file name
  htm2txt filename.html f n        to suppress chaining
  htm2txt filename.html u          to include url-references in .txt
  htm2txt filename.html c          to suppress comments
  htm2txt filename.html b          to show '*'BOLDED TEXT'*' in uppercase
  htm2txt filename.html b r        to create NewLine after </b>-tag
  htm2txt filename.html n          to create NewLine after <p>-tag
These options may appear in any order after the filename:
  htm2txt filename.html e tedit p 8 l 80 u f n o stdout c b r n
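One way to read such an option string is sketched below in Python. This is an interpretation of the examples above, not the script's actual parsing code: it assumes that l, p, e, o and f each consume the next word as their value, that u, c and n are plain switches, and that b may be followed by r. The function name parse_options is hypothetical; the defaults are taken from the settings table.

```python
def parse_options(tokens):
    """Turn the words after the filename into a settings dictionary."""
    settings = {
        "linemax": 72, "pixlbyt": 6, "editor": "E", "chain": "Y",
        "showu": "N", "nocmt": "N", "ofile": ".TXT",
        "bold": "N", "retab": "N", "retap": "N",
    }
    takes_value = {"l": "linemax", "p": "pixlbyt", "e": "editor",
                   "o": "ofile", "f": "chain"}
    switches = {"u": "showu", "c": "nocmt", "n": "retap"}
    i = 0
    while i < len(tokens):
        key = tokens[i].lower()
        if key in takes_value:
            if i + 1 >= len(tokens):
                raise ValueError("option %r needs a value" % key)
            name, value = takes_value[key], tokens[i + 1]
            if name in ("linemax", "pixlbyt"):
                value = int(value)
            elif name == "chain":
                value = value.upper()  # anything other than 'Y' stops chaining
            settings[name] = value
            i += 2
        elif key == "b":
            settings["bold"] = "Y"
            # An 'r' directly after 'b' also requests a new line after </b>.
            if i + 1 < len(tokens) and tokens[i + 1].lower() == "r":
                settings["retab"] = "Y"
                i += 1
            i += 1
        elif key in switches:
            settings[switches[key]] = "Y"
            i += 1
        else:
            raise ValueError("unknown option %r" % key)
    return settings


opts = parse_options("e tedit p 8 l 80 u f n o stdout c b r n".split())
print(opts)
```

Because value-taking options consume the following word, the 'n' after 'f' is read as the chain setting, while the final 'n' is read as the new-line-after-<p> switch.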
Warranty:
The program is distributed on an as-is basis.
It tries to extract as much text as possible;
however, I am sure there are some special forms
of tags which I missed.
Normally such tags are simply ignored.
There is no guarantee of particular results,
nor any guarantee against damage to existing files.
Note: In the current directory the program will
overwrite any file that has the filename of the
input file and a file extension of .TXT, e.g. filename.TXT !
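Since existing .TXT files are overwritten without warning, it can be worth checking for clashes before running the program. The following Python sketch is a separate helper, not part of HTM2TXT; the name would_overwrite is hypothetical, and it assumes the default naming rule where the output is the input name with a .txt extension.

```python
from pathlib import Path


def would_overwrite(input_names, directory="."):
    """Return the names of existing output files that would be replaced."""
    base = Path(directory)
    clashes = []
    for name in input_names:
        target = base / (Path(name).stem + ".txt")
        if target.exists():
            clashes.append(target.name)
    return clashes


print(would_overwrite(["index.htm", "news.htm"]))
```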
Comments:
Please send comments and recommendations to:
oraeder@ibm.net |
hobbes.nmsu.edu/download/pub/os2/util/convert/RexxHTML2TEXT_1-0.zip |
This work is licensed under a Creative Commons Attribution 4.0 International License.
Comments
Martin Iturbide
Fri, 04/08/2023 - 04:22
New Link: https://hobbes.nmsu