Using Python for Data Project Setup and Data Cleanup

August 1, 2019 — Ben Reilly

While Python is commonly used in GIS to develop tools and automate processes, it can also be incredibly useful in another aspect of data handling: raw data processing. Data in the form of spreadsheets or other text files can be cleaned up, sorted, modified, and combined into a single source to streamline data projects. Source processing for data projects can take weeks or months if performed manually; using Python has clear cost savings in both time and hours spent.

For brief background, I am a relative beginner in Python scripting. However, within the course of a month, I was able to learn enough to write a functioning tool that can sort through dozens of spreadsheets, find each line of data that matches a specified format, and combine it all into a single spreadsheet that can be imported into a database or, with coordinate information, directly into a feature class or shapefile in GIS. This tool takes only minutes to run, as opposed to hours and hours of manual effort.

The tool searches through a specified folder or directory and looks for any file that is a spreadsheet. When it finds a spreadsheet, it opens the file and looks at the names of each sheet. Specific sheet names can be skipped; for example, if a sheet title involves the word “Miscellaneous” or “All” it can be skipped as it contains unrelated or duplicate information.

When a worksheet is found that may contain project information, the tool reads every column heading and looks for any headings that are in a specified list; this list contains headings to record the data from, as well as headings to change. For example, the tool can change “Xfmrs” to “Transformers” to keep headings uniform across spreadsheets.

Read more

GIS  Python

We Wrote the Book

The Indispensible Guide to ArcGIS Online

Download It for Free

Ben Reilly

Senior GIS Specialist

What do you think?

Leave a comment, and share your thoughts

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>


This site uses Akismet to reduce spam. Learn how your comment data is processed.