DATA INPUT TECHNIQUES

Since the input of attribute data is usually quite simple, the discussion of data input techniques will be limited to spatial data only. There is no single method of entering the spatial data into a GIS. Rather, there are several, mutually compatible methods that can be used singly or in combination.

The choice of data input method is governed largely by the application, the available budget, and the type and the complexity of data being input.

There are at least four basic procedures for inputting spatial data into a GIS. These are:

Manual digitizing;
Automatic scanning;
Entry of coordinates using coordinate geometry; and the
Conversion of existing digital data.


Digitizing

While considerable work has been done with newer technologies, the overwhelming majority of GIS spatial data entry is done by manual digitizing. A digitizer is an electronic device consisting of a table upon which the map or drawing is placed. The user traces the spatial features with a hand-held magnetic pen, often called a mouse or cursor. As the features are traced, the coordinates of selected points, e.g. vertices, are sent to the computer and stored. All recorded points are registered against positional control points, usually the map corners, that are keyed in by the user at the beginning of the digitizing session. The coordinates are recorded in a user-defined coordinate system or map projection. Latitude and longitude and UTM are most often used. The ability to adjust or transform data during digitizing from one projection to another is a desirable function of the GIS software. Numerous functional techniques exist to aid the operator in the digitizing process.
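The registration of digitizer points against control points can be sketched in Python as a least-squares affine fit between table coordinates and map coordinates. This is a minimal illustration only, not the method of any particular GIS package. The function names (solve3, fit_affine, to_map), the table units (centimetres) and the sample corner coordinates are all assumptions made for the example.

```python
def solve3(a, b):
    """Solve a 3x3 linear system a * x = b by Gauss-Jordan elimination."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("control points are collinear or duplicated")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def fit_affine(table_pts, map_pts):
    """Fit map_x = a*u + b*v + c and map_y = d*u + e*v + f (least squares)."""
    ata = [[0.0] * 3 for _ in range(3)]  # normal matrix, rows are [u, v, 1]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (u, v), (x, y) in zip(table_pts, map_pts):
        row = (u, v, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atx[i] += row[i] * x
            aty[i] += row[i] * y
    return solve3(ata, atx), solve3(ata, aty)


def to_map(coeffs, u, v):
    """Convert one digitizer table point (u, v) to map coordinates."""
    (a, b, c), (d, e, f) = coeffs
    return a * u + b * v + c, d * u + e * v + f


# Hypothetical session: four map corners keyed in by the operator.
table_corners = [(0.0, 0.0), (50.0, 0.0), (50.0, 40.0), (0.0, 40.0)]  # cm
map_corners = [(500000.0, 5400000.0), (505000.0, 5400000.0),
               (505000.0, 5404000.0), (500000.0, 5404000.0)]  # UTM metres
coeffs = fit_affine(table_corners, map_corners)
centre = to_map(coeffs, 25.0, 20.0)  # a point traced at the sheet centre
```

With four or more control points the fit is over-determined, so the residuals at the control points give a rough check on how well the sheet was registered.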

Digitizing can be done in a point mode, where single points are recorded one at a time, or in a stream mode, where a point is collected at regular intervals of time or distance, measured by an X and Y movement, e.g. every 3 metres. Digitizing can also be done blindly or with a graphics terminal. Blind digitizing means that the graphic result is not immediately viewable to the person digitizing. Most systems display the digitized linework as it is being digitized on an accompanying graphics terminal.
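Stream mode by distance can be illustrated with a short Python sketch. It assumes the cursor position arrives as a sequence of (x, y) pairs in map units and keeps a point only when the cursor has moved at least the chosen interval from the last recorded point. The function name stream_by_distance and the sample track are hypothetical.

```python
import math


def stream_by_distance(cursor_track, interval):
    """Keep a cursor position only after it has moved `interval` map units."""
    if not cursor_track:
        return []
    recorded = [cursor_track[0]]  # the first position is always recorded
    for x, y in cursor_track[1:]:
        last_x, last_y = recorded[-1]
        if math.hypot(x - last_x, y - last_y) >= interval:
            recorded.append((x, y))
    return recorded


# Hypothetical cursor track sampled every metre along a straight line.
track = [(float(i), 0.0) for i in range(7)]
points = stream_by_distance(track, 3.0)  # record a point every 3 metres
```

Point mode corresponds to the operator deciding, by a button press, which positions are recorded, so no filtering of this kind is involved.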

Most GIS packages use a spaghetti mode of digitizing. This allows the user to simply digitize lines by indicating a start point and an end point. Data can be captured in point or stream mode. However, some systems do allow the user to capture the data in an arc/node topological data structure. The arc/node data structure requires that the person digitizing identify the nodes.

Data capture in an arc/node approach helps to build a topological data structure immediately. This lessens the amount of post processing required to clean and build the topological definitions. However, digitizing with an arc/node approach most often does not negate the requirement for editing and cleaning of the digitized linework before a complete topological structure can be obtained.

The building of topology is primarily a post-digitizing process that is commonly executed in batch mode after data has been cleaned. To date, only a few commercial vector GIS software offerings have successfully exhibited the capability to build topology interactively while the user digitizes.
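One small part of such a batch process, turning spaghetti lines into arcs that share nodes, can be sketched in Python. The sketch assumes each digitized line is a list of (x, y) vertices and that endpoints lying within a snapping tolerance of one another represent the same node. It does not split lines at intersections or remove overshoots, which a real cleaning step would also need to do. The function name build_arc_node and the tolerance value are assumptions for illustration.

```python
import math


def build_arc_node(lines, tolerance):
    """Snap line endpoints into shared nodes and list arcs by node number."""
    nodes = []  # node number is the index into this list
    arcs = []

    def node_id(point):
        # Reuse an existing node if the endpoint falls within the tolerance.
        for number, (nx, ny) in enumerate(nodes):
            if math.hypot(point[0] - nx, point[1] - ny) <= tolerance:
                return number
        nodes.append(point)
        return len(nodes) - 1

    for vertices in lines:
        arcs.append({
            "from_node": node_id(vertices[0]),
            "to_node": node_id(vertices[-1]),
            "vertices": list(vertices),
        })
    return nodes, arcs


# Hypothetical spaghetti linework: three lines that nearly close a triangle.
spaghetti = [
    [(0.0, 0.0), (5.0, 0.0)],
    [(5.02, 0.01), (5.0, 5.0)],
    [(5.0, 5.0), (0.0, 0.01)],
]
nodes, arcs = build_arc_node(spaghetti, tolerance=0.05)
```

Once arcs carry from-node and to-node numbers, connectivity questions (which arcs meet at a node) become simple lookups rather than geometric searches.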

Manual digitizing has many advantages. These include:

Low capital cost, e.g. digitizing tables are cheap;
Low cost of labour;
Flexibility and adaptability to different data types and sources;
Easily taught in a short amount of time, i.e. an easily mastered skill;
Generally the quality of data is high;
Digitizing devices are very reliable and most often offer a greater precision than the data warrants; and
Ability to easily register and update existing data.


For raster-based GIS software, data is still commonly digitized in a vector format and converted to a raster structure after the building of a clean topological structure. The procedure usually differs minimally from digitizing with vector-based software, other than that some raster systems allow the user to define the resolution size of the grid-cell. Conversion to the raster structure may occur on-the-fly or afterwards as a separate conversion process.
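The effect of the grid-cell resolution on vector to raster conversion can be shown with a minimal Python sketch. It assumes a digitized line is a list of (x, y) vertices and marks every grid cell that a densely sampled copy of the line falls in. Sampling is a simplification: a cell that the line only clips at a corner may be missed, and production software would use an exact line-traversal method. The function name rasterize_line and the grid origin are hypothetical.

```python
import math


def rasterize_line(vertices, cell_size, origin=(0.0, 0.0)):
    """Return the set of (row, col) grid cells touched by a sampled line."""
    cells = set()
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        # Sample at half-cell spacing (or finer) along each segment.
        steps = max(1, math.ceil(2 * length / cell_size))
        for i in range(steps + 1):
            t = i / steps
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            col = math.floor((x - origin[0]) / cell_size)
            row = math.floor((y - origin[1]) / cell_size)
            cells.add((row, col))
    return cells


# The same hypothetical line converted at two user-defined resolutions.
line = [(0.5, 0.5), (3.5, 0.5)]
fine = rasterize_line(line, cell_size=1.0)
coarse = rasterize_line(line, cell_size=2.0)
```

A larger cell size yields fewer cells for the same line, which is the trade-off the user makes when defining the grid resolution.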

Automatic Scanning

A variety of scanning devices exist for the automatic capture of spatial data. While several different technical approaches exist in scanning technology, all have the advantage of being able to capture spatial features from a map rapidly. However, scanning has not yet proven to be a viable alternative for most GIS implementations. Scanners are generally expensive to acquire and operate. As well, most scanning devices have limitations with respect to the capture of selected features, e.g. text and symbol recognition. Experience has shown that most scanned data requires a substantial amount of manual editing to create a clean data layer. Beyond these basic constraints, some other practical limitations of scanners should be identified. These include:

hard copy maps often cannot be moved to where a scanning device is available, e.g. most companies or agencies cannot afford their own scanning device and therefore must send their maps to a private firm for scanning;
hard copy data may not be in a form that is viable for effective scanning, e.g. maps are of poor quality, or are in poor condition;
geographic features may be too few on a single map to make it practical, cost-justifiable, to scan;
often on busy maps a scanner may be unable to distinguish the features to be captured from the surrounding graphic information, e.g. dense contours with labels;
with raster scanning it is difficult to read unique labels (text) for a geographic feature effectively; and
scanning is much more expensive than manual digitizing, considering all the cost/performance issues.


Consensus within the GIS community indicates that scanners work best when the information on a map is kept very clean, very simple, and uncluttered with graphic symbology.

The sheer cost of scanning usually eliminates the possibility of using scanning methods for data capture in most GIS implementations. Large data capture shops and government agencies are those most likely to be using scanning technology.

Currently, the general consensus is that the quality of data captured from scanning devices is not high enough to justify the cost of using scanning technology. However, major breakthroughs are being made in the field, both in scanning techniques and in capabilities to automatically clean and prepare scanned data for topological encoding. These include a variety of line following and text recognition techniques. Users should be aware that this technology has great potential in the years to come, particularly for larger GIS installations.

Coordinate Geometry

A third technique for the input of spatial data involves the calculation and entry of coordinates using coordinate geometry (COGO) procedures. This involves entering, from survey data, the explicit measurement of features from some known monument. This input technique is obviously very costly and labour intensive. In fact, it is rarely used for natural resource applications in GIS. This method is useful for creating very precise cartographic definitions of property, and accordingly is more appropriate for land records management at the cadastral or municipal scale.
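The basic COGO calculation can be sketched in Python as a survey traverse. Starting from a monument with known coordinates, each course (an azimuth measured clockwise from north, and a distance) yields the next point. The function name traverse, the monument coordinates and the courses below are assumptions for illustration. Real survey data would also need adjustments such as closure correction and grid scale factors, which are omitted here.

```python
import math


def traverse(monument, courses):
    """Compute coordinates along a traverse from a known monument.

    Each course is (azimuth in degrees clockwise from north, distance).
    """
    x, y = monument
    points = [(x, y)]
    for azimuth_deg, distance in courses:
        azimuth = math.radians(azimuth_deg)
        x += distance * math.sin(azimuth)  # easting change
        y += distance * math.cos(azimuth)  # northing change
        points.append((x, y))
    return points


# Hypothetical rectangular lot, 100 m by 50 m, described from one monument.
lot = traverse((1000.0, 2000.0),
               [(90.0, 100.0), (0.0, 50.0), (270.0, 100.0), (180.0, 50.0)])
# Distance between the last point and the monument: the misclosure.
misclosure = math.hypot(lot[-1][0] - lot[0][0], lot[-1][1] - lot[0][1])
```

Because every corner must be computed from explicit measurements in this way, the labour cost grows with each feature entered, which is why the method is reserved for applications that need its precision.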

Conversion of Existing Digital Data

A fourth technique that is becoming increasingly popular for data input is the conversion of existing digital data. A variety of spatial data, including digital maps, are openly available from a wide range of government and private sources. The most common digital data to be used in a GIS is data from CAD systems. A number of data conversion programs exist, mostly from GIS software vendors, to transform data from CAD formats to a raster or topological GIS data format. Several ad hoc standards for data exchange have been established in the market place. These are supplemented by a number of government distribution formats that have been developed. Given the wide variety of data formats that exist, most GIS vendors have developed and provide data exchange/conversion software to go from their format to those considered common in the market place.

Most GIS software vendors also provide an ASCII data exchange format specific to their product, and a programming subroutine library that will allow users to write their own data conversion routines to fulfil their own specific needs. As digital data becomes more readily available this capability becomes a necessity for any GIS. Data conversion from existing digital data is not a problem for most technical persons in the GIS field. However, for smaller GIS installations that have limited access to a GIS analyst this can be a major stumbling block in getting a GIS operational. Government agencies are usually a good source for technical information on data conversion requirements.

Some of the data formats common to the GIS marketplace are listed below. Please note that most formats are only utilized for graphic data. Attribute data is usually handled as ASCII text files. Vendor names are supplied where appropriate.

IGDS - Interactive Graphics Design Software (Intergraph / Microstation)

This binary format is a standard in the turnkey CAD market and has become a de facto standard in Canada's mapping industry. It is a proprietary format; however, most GIS software vendors provide translators for its DGN design files.

DLG - Digital Line Graph (US Geological Survey)

This ASCII format is used by the USGS as a distribution standard and consequently is well utilized in the United States. It is not used very much in Canada even though most software vendors provide two way conversion to DLG.

DXF - Drawing Exchange Format (AutoCAD)

This ASCII format is used primarily to convert to/from the AutoCAD drawing format and is a standard in the engineering discipline. Most GIS software vendors provide a DXF translator.
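Because DXF is plain ASCII, a simple conversion routine is within reach of most users. An ASCII DXF file is a sequence of two-line pairs: a numeric group code followed by its value. The Python sketch below reads only LINE entities, using group codes 10/20 for the start point and 11/21 for the end point. It is a minimal illustration under those assumptions, not a complete translator; it ignores layers, polylines, blocks and all other entity types. The function name read_dxf_lines and the sample data are hypothetical.

```python
def read_dxf_lines(text):
    """Extract LINE entities from ASCII DXF text as ((x1, y1), (x2, y2))."""
    raw = [line.strip() for line in text.strip().splitlines()]
    # Pair each group code with the value on the following line.
    pairs = [(int(raw[i]), raw[i + 1]) for i in range(0, len(raw) - 1, 2)]
    wanted = (10, 20, 11, 21)  # start x, start y, end x, end y
    segments = []
    current = None

    def flush(entity):
        if entity is not None and all(code in entity for code in wanted):
            segments.append(((entity[10], entity[20]),
                             (entity[11], entity[21])))

    for code, value in pairs:
        if code == 0:  # group code 0 starts a new entity or marker
            flush(current)
            current = {} if value == "LINE" else None
        elif current is not None and code in wanted:
            current[code] = float(value)
    flush(current)  # in case the text ends without a closing marker
    return segments


# Hypothetical fragment: an ENTITIES section holding two road segments.
sample = "\n".join([
    "0", "SECTION", "2", "ENTITIES",
    "0", "LINE", "8", "ROADS",
    "10", "0.0", "20", "0.0", "11", "100.0", "21", "50.0",
    "0", "LINE", "8", "ROADS",
    "10", "100.0", "20", "50.0", "11", "150.0", "21", "50.0",
    "0", "ENDSEC", "0", "EOF",
])
roads = read_dxf_lines(sample)
```

The output is spaghetti linework in the sense used earlier: the segments carry no topology, so cleaning and topology building are still required after conversion.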

GENERATE - ARC/INFO Graphic Exchange Format

A generic ASCII format for spatial data used by the ARC/INFO software to accommodate generic spatial data.

EXPORT - ARC/INFO Export Format

An exchange format that includes both graphic and attribute data. This format is intended for transferring ARC/INFO data from one hardware platform, or site, to another. It is also often used for archiving ARC/INFO data. This is not a published data format; however, some GIS and desktop mapping vendors provide translators. EXPORT files can be uncompressed, partially compressed, or fully compressed.



A wide variety of other vendor specific data formats exist within the mapping and GIS industry. In particular, most GIS software vendors have their own proprietary formats. However, almost all provide data conversion to/from the above formats. As well, most GIS software vendors will develop data conversion programs in response to specific requests by customers. Potential purchasers of commercial GIS packages should determine their data conversion needs and clearly identify them to the software vendor prior to purchase.