3. Input and Output Methods
Managing input for EDM systems
Input is likely to come from many sources and in many
formats. Different methods may be required for each source, but in most
systems the bulk of the input will arrive on paper, film or in digital
format. Documents on paper or film are said to be in analogue (human
readable) format and must be converted to digital (computer readable) format
before they can be entered into an Electronic Document Management System.
Paper input
Paper often requires preparation prior to input which may
involve the removal of staples, unbinding and sorting into document types or
sizes. Some paper scanners with automatic feeders will accept mixed
input, but most will be unable to detect the start and end of a batch of
documents without some form of mark or header sheet. Occasional double sided
documents can present problems but, if the majority of input is double
sided, duplex scanners are available which automatically scan both sides in
one pass. The majority of scanners are limited to A4 or A3 input but models
are available for larger input and some can handle documents of A0 size or
above. All scanners can handle black and white input, many can also accept
greyscale and an increasing number can handle colour. Special scanners have
also been developed for vouchers, cheques and other small format documents.
High volumes of individual paper pages can be input via
fast rotary scanners, but delicate or bound material must be scanned on
flat-bed units which may incorporate a book cradle to keep the adjoining
pages of bound books in the same plane. This is labour intensive and can be
a slow process compared with rotary scanning.
Document scanning generates a raster bit-map of the
document. The process is similar to fax in that a sensor passes over a small
section of the document and detects the presence or absence of a mark at
closely spaced points; the document or the scanning head is then advanced and
another line is scanned until the task is completed. For most applications a
raster of 200 points per inch (dpi) is sufficient. Compression techniques reduce
the amount of data to be stored but an A4 size 200 dpi greyscale image can
still require 1.2 MB. Higher resolution scans and colour scans generate much
higher volumes. An exact facsimile of any scanned and stored page can be
re-created for display, transmission or printout and it can usually be
re-sized or annotated but not edited or otherwise altered.
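The storage figures above can be checked with a little arithmetic. The following is a minimal sketch only: the function names and the bit depths (1 bit per point for black and white, 8 for greyscale, 24 for colour) are assumptions for illustration, and the result is the raw size before any compression is applied.

```python
MM_PER_INCH = 25.4


def raster_dimensions(width_mm, height_mm, dpi):
    """Return (columns, rows) of sample points for a page scanned at `dpi`."""
    cols = round(width_mm / MM_PER_INCH * dpi)
    rows = round(height_mm / MM_PER_INCH * dpi)
    return cols, rows


def uncompressed_bytes(width_mm, height_mm, dpi, bits_per_point):
    """Raw bit-map size in bytes, before any compression is applied."""
    cols, rows = raster_dimensions(width_mm, height_mm, dpi)
    total_bits = cols * rows * bits_per_point
    return (total_bits + 7) // 8  # round up to a whole byte


if __name__ == "__main__":
    # A4 is 210 x 297 mm; the bit depths below are illustrative assumptions.
    for label, depth in [("black and white", 1), ("greyscale", 8), ("colour", 24)]:
        size = uncompressed_bytes(210, 297, 200, depth)
        print(f"A4 at 200 dpi, {label}: {size / 1_000_000:.2f} MB uncompressed")
```

On these assumptions an uncompressed A4 greyscale page at 200 dpi is a little under 4 MB, so the stored size depends heavily on the compression method and on the content of the page.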
Recognition Software can interpret scanned text as
alphanumeric characters and output it as coded data. This is ideal for input
to a search engine as it permits searches by content, but it does not permit
a faithful reconstruction of the original. Many document management systems
therefore hold each document in both formats, as coded text and as a raster
image, allowing users to conduct full
text searches to locate a document and then view a true facsimile, generated
from the raster data.
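The dual-format approach can be sketched as a small data structure. This is an illustrative sketch only, not the design of any particular product: the class name DualFormatStore and its methods are hypothetical, and the recognised text is assumed to have been produced already by recognition software.

```python
import re
from collections import defaultdict


class DualFormatStore:
    """Hypothetical store holding a raster image and recognised text per document."""

    def __init__(self):
        self._images = {}               # doc_id -> raster bytes (the facsimile)
        self._index = defaultdict(set)  # word -> ids of documents containing it

    def add(self, doc_id, raster_bytes, recognised_text):
        """Keep the raster unchanged and index every word of the coded text."""
        self._images[doc_id] = raster_bytes
        for word in re.findall(r"[a-z0-9]+", recognised_text.lower()):
            self._index[word].add(doc_id)

    def search(self, query):
        """Return ids of documents whose text contains every word in the query."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return set()
        hits = [self._index.get(word, set()) for word in words]
        return set.intersection(*hits)

    def facsimile(self, doc_id):
        """Return the stored raster data for display or printout."""
        return self._images[doc_id]
```

A search returns document identifiers from the text index; the image shown to the user always comes from the untouched raster data.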
When extensive search facilities are not required and the
document can be retrieved via a few fixed fields, only the raster image need
be stored, but relevant retrieval data will still have to be added to an
index. This can sometimes be done during the quality control process
subsequent to scanning. The scanned image is presented to an operator on a
screen which also contains a form on which blank fields must be completed.
This operation can be slow but some fields, such as the date, can be
completed automatically while others can be filled by scanning a bar code or
by using character recognition on a specific portion of each page, which
greatly speeds the indexing process.
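The indexing form described above might be modelled as follows. This is a sketch under stated assumptions: the field names (account_no, doc_type) and the prefill function are hypothetical, and the bar code value and zone text are assumed to have been read by other software.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class IndexRecord:
    """One index entry for a scanned image; field names are illustrative only."""
    image_id: str
    scan_date: date = field(default_factory=date.today)  # completed automatically
    account_no: Optional[str] = None  # e.g. read from a bar code
    doc_type: Optional[str] = None    # e.g. read by recognition on a fixed zone

    def blank_fields(self):
        """Names of the fields the operator still has to complete by hand."""
        return [name for name in ("account_no", "doc_type")
                if getattr(self, name) is None]


def prefill(image_id, barcode_value=None, zone_text=None):
    """Build a record, filling whatever can be captured without keying."""
    record = IndexRecord(image_id=image_id)
    if barcode_value:
        record.account_no = barcode_value.strip()
    if zone_text:
        record.doc_type = zone_text.strip().upper()
    return record
```

Only the fields returned by blank_fields() need to be presented to the operator, which is where the saving in indexing time arises.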
These methods allow images to be stored and retrieved and
there is usually provision for annotation, but it is not normally possible to
edit or revise any image. This can be a significant advantage if an image might be
required as evidence, but some applications, such as catalogue production
where illustrations may be repeated in several places, require more
advanced methods involving the separate treatment of text, graphics and
images to permit re-use of document content. This can be achieved by using
advanced Content Management Systems.
Document scanning services
Facilities typically offered by document scanning bureaux include scanning
and digitising of paper and film, COLD (Computer Output to Laser Disk), data
capture via OCR/ICR (Optical Character Recognition/Intelligent Character
Recognition) and bar code reading, with output to a wide variety of magnetic
and optical media or on-line. Some bureaux also offer file format conversion,
keyboarding and image warehousing, whereby they hold very large volumes of
digital records in a secure archive to which clients have on-line or web
access. Most bureaux offer both electronic and micrographic services but some
specialise in a narrow field, such as data capture from forms, large format
scanning or scanning of books, manuscripts and fragile originals. Many bureaux
stock fast-moving consumables such as magnetic and optical media, toner and
print paper and they may also have local sales agencies for hardware such as
PCs, scanners, printers and monitors.
There is a sound case for film being employed as a back-up
to any electronic archive. This reflects growing concern that rapid advances
in computing technology may necessitate the frequent conversion of archives
held in outdated software formats or on magnetic media for which reading
equipment is no longer supported.
It is normal to sort input into similar
document types prior to scanning. Some scanners with automatic feeders
will accept mixed input sizes and thicknesses, but most will be unable to detect
the start and end of a batch of documents without some form of mark or
header sheet. Occasional double sided documents can present problems but, if
the majority of input is double sided, duplex scanners are available which
automatically scan both sides in one pass.
Scanner/cameras create a
record on roll film in addition to digitised output. This provides a
security backup and permits destruction of the original paper as the film is
ideal for archival storage and fully acceptable as evidence if correctly
produced. These units are essentially a combination of a document scanner
and a microfilm camera. Models are available for typical office documents
and for large formats such as maps, plans and drawings, and cameras are offered
for roll and fiche. The process produces a film image for
archival storage as a true record of each document and a digital file for
input to an electronic document management system.
Units can often be set for filming only, scanning only or
both simultaneously. Units for large formats are usually fed manually so
they are relatively slow, but scanner/cameras designed for office documents
may operate at 400 or more images per minute.
Scanners for small originals
Scanners for small documents
accept cheques and vouchers, which are usually all of the same size and
unstapled. Special scanners, often operating at high speeds,
have been developed especially for small input. All scanners can handle
black and white input, many can also accept greyscale and an increasing
number can handle colour. Options may include the ability to capture MICR or
OCR data during the scanning process.
Scanners for typical office documents
Most office documents are either A4 or A3 size but there
are, of course, many exceptions. This type of input often requires
preparation prior to scanning which may involve the removal of staples,
unbinding and sorting into document types or sizes.
High volumes of individual paper pages can be input via
fast rotary scanners, but delicate or bound material must be scanned on
flat-bed units which may incorporate a book cradle for handling bound
originals. Some rotary scanners are offered with optional flatbed
accessories which enable small volumes of delicate or oversize originals to
be scanned manually and fed into the digital file.
There are so many A4 scanners available in the UK that we
have arbitrarily split them into two groups: those capable of scanning up
to A4 at under 30 images/minute and those capable of 30 images/minute or
more. This will assist most searches, but those seeking a mid range scanner
are advised to look in both groups. There are fewer A3 scanners on the
market so they are all listed in one grouping.
Scanners for large formats, books and delicate originals
Large format document scanning is a slower process than the techniques used
for office documents and usually involves more manual effort to feed and
remove documents from the scanner. The majority of large format scanners
listed in this section are limited to A1 or A2 input but models
are available for larger input and some can handle documents of A0 size or
above. All scanners can handle black and white input, many can also accept
greyscale and an increasing number can handle colour.
Delicate or bound material must be scanned on flat-bed units which may incorporate a
book cradle to keep the adjoining pages of bound books flat and in the same
plane. This is labour intensive and can be a slow process compared with
rotary scanning but special book scanners are available which automatically
turn pages and hold them flat during scanning. Some machines designed
for public use in locations such as libraries incorporate methods of
accepting payment.
Input from microfilm
Some companies already have large collections of
information on microforms which they would like to access via an electronic
document management system. Careful consideration is essential before
deciding to scan the entire content because, if references are likely to be
relatively infrequent, it may be preferable to buy a relatively inexpensive
reader-scanner and scan specific images when they are needed rather than pay
for the conversion and indexing of the entire film archive.
Images on microforms are input on special scanners which
are often dedicated to one film format. The data sheets for film scanners on
this site are split into four groups: those for aperture cards, models
designed for microfiche and jackets, models for roll formats and those for
multiple formats.
Aperture Card Scanners
Digital scanners for aperture cards can usually optically enlarge film
images on to an integral screen for reference or verification, but some are
specifically designed to scan the film images for input to a CAD system,
for printout via a laser-printer or plotter, or for storage on disk. Some
models can accept more than one film format and they will be found under
Film scanners - universal on this site. Factors such as the choice of
lenses, film carrier options and the image retrieval methods available are
similar to optical readers and reader-printers, but a much wider range of
printout is available via a laser-printer. Options typically include the
ability to mask any part of the image, enhance the quality of images,
correct skewed images, and electronically vary enlargement or printout.
Production scanners for aperture cards are intended for
the rapid digitising of high volumes of data from film rather than reference
to or enlargement from selected images. A screen may be provided for
verification purposes, but if a facility for full size display is required
it is usually provided via a CRT monitor in order to show the quality of the
digitised data rather than an optical enlargement of the microfilm image.
Aperture card scanners can be hopper-fed to operate automatically and most
aperture card scanners can interpret Hollerith code, text, or marks on the
card to compile indexing data during the scanning operation.
Scanners for fiche and jackets
Inexpensive fiche scanners are
available to digitise selected frames from fiche and jackets but this is a
relatively slow process. Production scanners for fiche are intended for the
rapid digitising of high volumes of data from fiche. If a facility for full
size display is required
it is usually provided via a CRT monitor rather than an optical enlargement
of the microfilm image. It is important to ensure that a high speed fiche
scanner can cater for the particular grid layout of the input. Jackets are
less suitable for automated scanning because the image layout is not
precise, but some film scanners can compensate automatically for
irregularities in format and may even accept mixed image sizes such as 35mm
and 16mm.
Applications include file conversion of microfiche and jacket
archives to electronic document management systems. Production scanners
for high volumes are fast and relatively expensive. It is necessary
to have a long-term requirement for the scanning of high volumes of input to
justify an in-house unit, but many microfilm bureaux now offer fiche
scanning services at competitive prices.
Scanners for roll film formats
For occasional use and low volume applications digital
reader-scanners are available which optically enlarge film images on to an
integral screen for reference or verification but also incorporate a
facility to scan the images for input to an electronic document management
system, for printout via a laser-printer or for storage on disk.
Digital reader-printers have their own section under micrographic equipment.
Most scanners for roll film are intended for
the rapid digitising of high volumes of data from film rather than reference
to or enlargement from specific images. Roll formats can be scanned
automatically; the scanner advances the film from frame to frame by
detecting image blips or inter-frame gaps. A screen may be provided for
verification purposes, but if a facility for full size display is required
it is usually provided via a CRT monitor.
Applications for production roll film scanners
Uses include file conversion of microfilm archives to electronic document
management systems and the
automation of document scanning when assorted input is involved. If
documents of different sizes and quality are to be input to an electronic
document management system it may not be possible to batch scan and input
may be very slow.
An alternative input method, often advocated for maps, drawings and other
large originals, is to
microfilm the paper documents to create an archival film record, then scan
the film rather than the original paper. If roll film is employed the
scanning operation can then be performed automatically at high speed. This
avoids the need for large format document scanners which tend to be slow and
expensive and also overcomes any problems with bound or delicate input. It
also results in a permanent film record of the content at the time of
filming. Most UK bureaux offering large format microfilming services are
included in the commercial section of this site.
Production film scanners are fast and relatively expensive. It is necessary
to have a long-term requirement for the scanning of fairly high volumes of
input to justify an in-house unit, but many microfilm bureaux now offer
film scanning services at prices which make even low-volume input economic.
Universal film scanners
Low volume digital reader-scanners can usually optically
enlarge film images on to an integral screen for reference or verification
but they may also incorporate a facility to scan the images for input to an
electronic document management system, for printout via a laser-printer or
for storage on disk. Versions are available to accept all microforms but this may
involve changing film carriers and motorised roll film carriers can be
heavy. Factors such as the choice of lenses, film carrier options and
the image retrieval methods available are similar to optical readers and
reader-printers, but a much wider range of printout is available via a
laser-printer. Options typically include the ability to mask any part of the
image, correction of skewed images, superimposition of repetitive data on
prints and electronically variable enlargement.
High volume universal scanners for roll film, fiche and aperture cards are intended
for the rapid digitising of high volumes of data from film rather than
reference to or enlargement from specific images. As with slower units, film
carriers usually have to be interchanged between roll and flat formats.
Digital input
This can come from internal sources, such as data
processing, word processing, drawing office systems, etc. If the software on
which it is generated is interfaced with the document management system it
can be stored as received. Digital input from external sources may come in a
wide variety of formats, some of which may have to be converted. Input from
the web can also be accepted by most systems and some use the web for
storage. In most cases this data is already indexed and the index can be
incorporated into the EDM System.
Input via OCR and forms processing
High volumes of input may arrive on paper forms as
enquiries, orders, or the result of a survey. In most cases the forms will
be originated by the organisation that subsequently processes the returns.
Forms Processing software can be employed to scan each form, using text or
mark recognition to examine the content of boxes or specific locations and
convert it to input data for processing. Provision is always incorporated
for the display of doubtful items which an operator can examine and correct
but, if the form is carefully prepared, accuracy can be very high especially
when mark recognition is sufficient to interpret the response. Data
extracted from the forms is passed on for processing while the scanned forms
are held in the document management system.
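The handling of doubtful items can be illustrated with a simple mark recognition rule. This is a sketch only: the function names are hypothetical, the thresholds of 10% and 40% are arbitrary values chosen for illustration, and the fill ratio (the fraction of dark points inside each box) is assumed to have been measured from the scanned image beforehand.

```python
def classify_mark(fill_ratio, empty_below=0.10, marked_above=0.40):
    """Classify one tick box from the fraction of dark points inside it.

    The two thresholds are illustrative assumptions, not recommended values.
    """
    if fill_ratio <= empty_below:
        return "empty"
    if fill_ratio >= marked_above:
        return "marked"
    return "doubtful"


def process_form(box_fill_ratios):
    """Split a form's boxes into accepted data and items for operator review."""
    data, review = {}, []
    for name, ratio in box_fill_ratios.items():
        state = classify_mark(ratio)
        if state == "doubtful":
            review.append(name)  # displayed to an operator for correction
        else:
            data[name] = (state == "marked")
    return data, review
```

A well designed form keeps most responses clearly above or below the thresholds, so few boxes fall into the doubtful band that requires operator attention.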
Optical Character Recognition (OCR) is often employed to
create a machine readable version of the content of documents containing
text. This results in searchable files that can be used to create indexes or
made available to researchers. UK bureaux offering forms processing, OCR and
other recognition
facilities are included in the Services section of this site. It is also
possible to conduct OCR simultaneously with document scanning on suitably
equipped scanners.
Input from E-mail activity
The volume of incoming and outgoing e-mail is increasing
rapidly within most organisations and presents special problems. Some input
will be unsolicited junk which can safely be destroyed but most of the
remainder should be retained. Any item which confirms or alters an order or
forms the basis of a contract must be carefully preserved. Copies of all
estimates, offers or quotations generated internally and sent via e-mail
must also be retained. Some PCs automatically destroy the content of an
in-box after a set period; this is a common cause of data loss which can be
avoided if e-mails are passed to a document management system for indexing
and retention.
E-mail input must be examined for possible virus infection
and attachments must be treated with caution unless they come from a known
and trusted source - junk mail can be discarded at this stage. Many document
management systems offer a method of e-mail integration, sometimes as an
optional extra, and this can be used to record, index and file e-mails for
long-term retention. Guidelines, agreed at senior management level, are
needed to assist staff on what should be archived and what can be destroyed.
Staff should also be aware that their e-mails are not confidential and, like
the telephone, the system is only for limited personal use.
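Recording and indexing an e-mail for retention might look like the sketch below, which uses the e-mail parsing modules in the Python standard library. The function name index_entry and the choice of index fields are assumptions for illustration; a real document management system would define its own fields and retention rules.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def index_entry(raw_message):
    """Extract a few retrieval fields from the raw text of one e-mail."""
    msg = message_from_string(raw_message)
    sent = msg.get("Date")
    return {
        # parseaddr splits "Name <address>" into (name, address)
        "sender": parseaddr(msg.get("From", ""))[1],
        "subject": msg.get("Subject", ""),
        "sent": parsedate_to_datetime(sent).isoformat() if sent else None,
        # any part carrying a filename is treated as an attachment
        "has_attachments": any(part.get_filename() for part in msg.walk()),
    }
```

The has_attachments flag could be used to route a message for the cautious treatment of attachments described above before it is filed.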
Input generated by customers completing an e-form on the
company website needs to be interpreted, processed and stored. Special
software has been developed to automatically acknowledge these e-mails,
enter their content into a workflow system for action and pass the original
on to a document management system for indexing and storage.
EDM system output
Output from an Electronic Document Management System will
either be on request in response to user searches, or issued automatically
as part of a pre-determined programme. To handle user enquiries the content
must be indexed to match user needs and the available methods are described
under Indexing and retrieval.
The result of any search will usually be output as a screen image but can be
printed if required.
The most obvious examples of automatic distribution are
Workflow and Process Management applications. After receipt, new documents
will be routed to those appointed to handle them. Workflow users will be
alerted to new input, view it on screen and pass it to the next user in the
chain as soon as they have completed their allotted tasks. Members of
workgroups will also automatically receive appropriate input and may elect
to circulate selected items to others for comment; this can be at least
partly automated and handled electronically without printout. An increasing
amount of distribution to external offices, agents and customers can be
handled via intranet, extranet or e-mail and this too eliminates printout.
One of the advantages and major cost savings of EDM can be a reduction in
printout, postage and document copying expenditure.
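The passing of a document along a chain of workflow users can be sketched as a simple queue. This is a minimal illustration only: the class WorkflowItem and the example route are hypothetical, and real workflow products add alerts, deadlines and branching.

```python
from collections import deque


class WorkflowItem:
    """A document moving along a fixed chain of users (hypothetical model)."""

    def __init__(self, doc_id, route):
        self.doc_id = doc_id
        self._pending = deque(route)  # users still to handle the document
        self.history = []             # users who have completed their task

    @property
    def current_user(self):
        """The user who should be alerted next, or None when finished."""
        return self._pending[0] if self._pending else None

    def complete_step(self):
        """Record the current user's task as done and pass the item on."""
        if not self._pending:
            raise ValueError("workflow already finished")
        self.history.append(self._pending.popleft())
        return self.current_user
```

For example, an item created with the route ["post room", "claims clerk", "supervisor"] is first presented to the post room and reaches the supervisor after two completed steps, without any printout being made.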
Nevertheless, printout facilities will still be required
for most applications. Prints may be produced locally on desktop printers or
centralised using high volume versatile printers able to produce a wide
range of output sizes in black and white or colour.
Maintenance manuals, catalogues, software and similar
publications can be distributed to agents and customers who are not
connected to the network by writing them to CD (Compact Disc). These discs
are usually self-loading and incorporate indexes, so they are completely
self-contained. Alternatively, the information can be published on an
intranet or the web and users then only require a browser for access.
COM recording and plotting to film
When documents need to be available for instant access it is logical to hold
them in digital format
within electronic document management systems, but the period of peak
activity is usually of relatively short duration. Thereafter the need for
retrieval remains a possibility, but in many applications the majority of
the documents will never be referenced again.
Despite advances in the capacity of digital storage, it is often impractical
to hold archival information indefinitely on instant access devices. Another
problem is that as systems evolve it is time-consuming and expensive to
ensure that all digital archives held on outdated media or in abandoned
formats are converted to the new format or operating system. Repetitive
conversion also carries an attendant risk of data loss. The need to cater
for large quantities of archival material can act as a brake on system
development.
Converting archival digital data to microfilm creates a secure storage
solution which is fully standardised and which will permit access in 100 or
more years' time, no matter how much the digital system may have changed in
that period.
COM (Computer Output Microfilm, also known as IOM - Image
Output Microfilm) offers a fast and economical method of converting digital
files to proven and archival microformats. The master films should be
preserved in a remote archive, but exact copies can be produced
inexpensively for distribution or use when access from digital storage is
terminated.
If a microfilmed document, thought to be archival, needs to be re-input to
an electronic document management system, it can easily be scanned and
digitised. Reader-scanners suitable for such applications can also produce
prints via a laser-printer if hard copy is preferable. They are listed on
this site under Film Scanners.
COM recorders can now emulate
sophisticated laser-printers to allow information on film to be recorded
exactly as if the information had been printed to paper and then
microfilmed. Raster data derived from document scanning can also be directly
input to several models. Speeds are impressive and some machines can convert
a print stream to images on roll film or microfiche at up to 400 pages per
minute. Common reduction ratios are 24X, 42X, 48X and 72X.
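The effect of the reduction ratio and recording speed can be worked through with simple arithmetic. This sketch is illustrative only: the function names are hypothetical, and the A4 page size and the batch of 100,000 pages are example values, not figures from any recorder's specification.

```python
def film_image_size_mm(width_mm, height_mm, reduction):
    """Size of the image on film for a page recorded at a given reduction."""
    return width_mm / reduction, height_mm / reduction


def minutes_to_record(pages, pages_per_minute):
    """Recording time for a batch at a constant rated speed."""
    return pages / pages_per_minute


if __name__ == "__main__":
    # An A4 page (210 x 297 mm) at each of the common reduction ratios.
    for ratio in (24, 42, 48, 72):
        w, h = film_image_size_mm(210, 297, ratio)
        print(f"{ratio}X: {w:.2f} x {h:.2f} mm on film")
    # Example batch size; 400 pages per minute is the top speed quoted above.
    print(f"{minutes_to_record(100_000, 400):.0f} minutes for 100,000 pages")
```

At 24X an A4 page occupies roughly 8.8 x 12.4 mm on film, and higher ratios shrink the image in proportion, so more pages fit on a given length of film or area of fiche.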
Systems involving the distribution of copy films to outstations, and many
internal office applications, tend to employ microfiche but 16mm roll is
widely used, especially for archiving applications. COM output on 16mm roll
is usually in the form of blipped images so that the film can be used in
conjunction with automated image retrieval systems. When microfiche are
employed, the entire process of recording, film development, duplication and
collation of variable numbers of copies from each master fiche can be
conducted as an automated in-line operation.
COM recorders can operate as on-line or off-line devices. This permits data
held on a wide variety of storage media to be converted by a bureau facility
and returned as fully indexed microfilm. One relatively low-priced COM
recorder has been designed for in-house applications requiring substantial
volumes of digital data to be archived on 16mm roll film. Images are
accepted in TIFF, JPEG and other raster formats and written to film at up to
240 letter-sized images per minute. The unit appears as a drive to the
system. It handles image resolutions up to 600 dpi and scales images
automatically from 20:1 to 60:1 reduction to match the application
specification.
COM/CAD plotting
35mm roll and aperture card COM output
options are also available and these typically relate to the plotting of CAD
output direct to microfilm. The 35mm frames may be set sequentially along a
roll or recorded directly on to single 35mm frames mounted in aperture
cards. When aperture cards are employed, the unit normally contains its own
film processor in order to deliver fully processed and titled aperture cards
at speeds of up to 35 cards per hour. Hollerith punching of the cards is
also possible during the production process. Resolution can be equal to 400
dpi on the original size drawing.