How many times have you been given data sets from several different sources and tasked with compiling them into a single GIS product? Yeah, probably too many to count. I recall receiving utility field asset data in the following formats for the same project: KML, Trimble survey file, Excel, shapefile, and everyone’s favorite – handwritten field notes with X/Ys from a Garmin GPS!
Utility migrations are unlikely to include field notes, but the concept remains the same. Often, utility migrations consist of numerous “silos” databases that reflect various independently managed regions, rather than just one centralized database. Migrations aren’t just for switching to a new utility model; they’re also a way to bring together utility databases with vastly different data models and schemas.
What domains are there, and how should codes that differ between schemas but represent the same asset be handled? “UNK” is represented completely different between database schemas; what should be truly “UNK” and what should be NULL? One database uses a coded and another uses a ranged domain, what now? Should we merge schemas or pick one and force others to comply?
FME (Feature Manipulation Engine) Workbench is one tool that I’ve found particularly beneficial in providing a starting point for providing answers to these questions. What is FME? I always like to describe it as Esri’s Model Builder on steroids. FME is an ETL tool that uses a drag-and-drop interface to allow users to create complex, repeatable workflows without writing code. Spatial data can be combined with non-spatial data, filtered, analyzed, and then converted to just about any format you can imagine. I’ll go through this in more detail in the example below. For detailed info – #how-it-works.
Let’s focus on using our FME workbench to pull domain values/counts along with other useful statistics. The goal is to try to solve the following items related to domains:
- Retrieve a list of all domains from each geodatabase
- Extract the domain code, and the domain description
- Associate the domain with the field and feature class (FC)
- Generate attribute domain counts per domain/FC
For, let’s focus on the sample schema below:
Note that there are two databases both have two different FCs – all with different domains.
Below is the sample data for each FC:
Note that each FC has different Domain names, codes, and record counts. All have a coded domain with string format, except for Fitting.
Below is an overview of how the workflow is structured. For a detailed look of the workbench click here FME_SnapShot.pdf
We begin by bringing in the source data, then add transformers to begin transforming the data. When data is read in FME, there is an option of caching it at each step of the process. View the cached results by clicking any of the icons . The ability to see results immediately after a query provides immediate confirmation that you have either set the query correctly or need to re-evaluate the parameters. Whenever I design a workflow in FME, I always try to keep repeatability in mind. You can easily plug in other datasets with little to no modification to your workbench if parameters are set and kept dynamic.
Within FME, there are a plethora of different methods, transformers, and workflows I could have used to accomplish this. Likely there are other FME solutions that could complete this process in a more efficient manner. However, I find it easier to create the solution and not get hung up with it being perfect. Then, if needed, refine the solution for better performance. FME continues to evolve with each new release.
Below is the output result we get from the workbench:
We now have a list of all domains that are unique to each Database/FC/attribute. A large portion of the workbench is dedicated to logically breaking down data. We can build the frequency for each domain value by exploding the attributes. As you can see, for example, Cast Iron is represented in a variety of codes. When source data domains use integers to represent strings, the ability to combine the code and description is critical.
Does this output provide an answer to all questions when merging domains/schemas? Not by a long shot. However, it can help provide a starting point to answering common domain handling questions. By using this spreadsheet, we get a comprehensive list of all Materials from both databases. If we were to decide to unify all pipeline material domains into a single domain, you would now easily be able to build a complete list without any guessing.