A Data Pipeline is Born
Towards the beginning of the year, I wanted to play with the new selection-based display in the ArcGIS Dashboard beta (which is now out of beta!). I had been hoping for that selection-based display for quite some time and wanted to get my hands on it to see it in action. The selection-based display functionality worked exactly how I had hoped; kudos to the team that built it; I love it. But that's not what this blog post is about! I'm going to tell you about the mechanism I built to load the data into that dashboard.
First, there are some things you should know:
- I love GIS
- Python is my go-to programming language for anything involving data
- Pandas is the reason for #2
- The ArcGIS Python API’s GeoAccessor makes combining #3 and #1 simple
- I like new stuff

What I ended up doing was taking some tabular data with point coordinates that the National Weather Service publishes for free. I read that into a pandas dataframe, used the GeoAccessor to create proper point geometry objects in the dataframe, and used the to_featurelayer() function to load the dataframe to ArcGIS Online. I built the dashboard on the layer and could have left well enough alone, but I wanted to see the data on my dashboard change, and I didn't want to just delete all the records and reload them each time. I wanted existing records updated when they had changes, new records added when needed, and old records deleted when they weren't in the source data anymore. I set out to make that happen.
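If you're curious what that initial load looks like in code, here's a minimal sketch. The URL, credentials, column names, and layer title are placeholders I've invented for illustration, not the actual NWS source:

```python
import pandas as pd
from arcgis.gis import GIS
from arcgis.features import GeoAccessor

gis = GIS("https://www.arcgis.com", "username", "password")

# Read the free tabular source (placeholder URL and column names).
df = pd.read_csv("https://example.com/nws_observations.csv")

# Turn the coordinate columns into proper point geometries.
sdf = GeoAccessor.from_xy(df, x_column="lon", y_column="lat", sr=4326)

# Publish the spatially enabled dataframe as a hosted feature layer.
item = sdf.spatial.to_featurelayer("NWS Observations", gis=gis)
```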
The Beginning
I took the code I had written to make the source dataframe that I loaded to AGOL and put it in a function. When the source data changed, I could call that function and, with one call, have a fully operational dataframe. Then I sucked the layer from AGOL into a spatially enabled dataframe, once again with help from our friend the GeoAccessor and its from_layer() function. Once I had the two dataframes, I finally got to use something I had read about in the pandas release notes from July 2020: the .compare() function. When it was new, it seemed useful, but I didn't have a use case for it at the time. I did now! The compare function made finding the differences between the two dataframes simple. With the differences laid out, I was able to update the data in the layer on AGOL using the feature layer's .edit_features() function. I used the same function to add records that were in the source dataframe but not yet in my layer on AGOL, and likewise to delete records that were in the layer but no longer in the source dataframe. All the CRUD was handled.
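Here's roughly how the compare-and-sync step fits together. This is a hedged sketch, not the actual module: build_source_dataframe() stands in for the function described above, the layer URL is a placeholder, I'm assuming both dataframes share a stable key column (station_id here) so .compare() can align them, and geometry comparison is simplified:

```python
import pandas as pd
from arcgis.features import FeatureLayer, GeoAccessor

layer = FeatureLayer("https://services.arcgis.com/<org>/arcgis/rest/services/NWS/FeatureServer/0")

# Source: how the data should look; target: what's in AGOL right now.
source = build_source_dataframe().set_index("station_id")
target = GeoAccessor.from_layer(layer).set_index("station_id")

# compare() requires identically labeled frames, so align on shared keys/columns.
shared_ids = source.index.intersection(target.index)
shared_cols = source.columns.intersection(target.columns)
diff = source.loc[shared_ids, shared_cols].compare(target.loc[shared_ids, shared_cols])

# Updates: rows with any difference, carrying over the target's OBJECTID so
# edit_features() knows which records to modify.
updates = source.loc[diff.index].copy()
updates["OBJECTID"] = target.loc[diff.index, "OBJECTID"]

# Adds: keys only in the source; deletes: keys only in the target.
adds = source.loc[source.index.difference(target.index)]
deletes = target.loc[target.index.difference(source.index)]

layer.edit_features(
    adds=adds.reset_index().spatial.to_featureset(),
    updates=updates.reset_index().spatial.to_featureset(),
    deletes=",".join(str(oid) for oid in deletes["OBJECTID"]),
)
```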

This Works Well… We Should Reuse It!
Once I got the kinks worked out, my next task was obvious. I needed to parameterize this code to make it configurable so we could use it elsewhere. I had to retool a bit of the code and add a bunch of logging, error handling, and other niceties to make it more extensible, but I ended up with a Python module that imports another Python module acting as a configuration file. Using the configuration, it builds a source dataframe representing how the data should look in the GIS, then updates a target feature layer on AGOL or Portal for ArcGIS to match.
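To give a feel for the pattern (the actual option names in our module differ; everything below is illustrative), the configuration is just plain Python that the sync module imports at run time:

```python
# pole_config.py -- a hypothetical configuration module
SOURCE_QUERY = "SELECT pole_id, lon, lat, height_ft FROM asset.poles"
KEY_FIELD = "pole_id"
TARGET_LAYER_URL = "https://services.arcgis.com/<org>/arcgis/rest/services/Poles/FeatureServer/0"
SPATIAL_REFERENCE = 4326
```

```python
# The sync module loads whichever configuration is requested at run time.
import importlib
import sys

config = importlib.import_module(sys.argv[1])  # e.g. "pole_config"
sync_layer(config)  # hypothetical entry point doing the compare-and-edit work
```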
In Action
Shortly after I demoed the code to some of my coworkers, I got the opportunity to join a project where this exact code could be used. The client wanted to see their data from an asset management system on a map but wasn't ready to start using a GIS to manage the data. I built configurations that take the coordinates of poles and towers stored in the asset management system and build points from them so they show up on the map. They also have good structure sequence data, so I was able to derive lines from the coordinates of the structures so those would show on the map too. Additionally, they have information on the width of the right of way for each span, which I used to make right-of-way polygons between pairs of structures. There are also related non-spatial tables that we're synchronizing. And finally, we're even getting line data and combining it with data from an outage management system so we can display outages in near real-time on an ArcGIS dashboard (how's that for coming full circle?).
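As an example of the line derivation, here's a sketch of how points ordered by a sequence number can become polylines. The column names and spatial reference are assumptions, and the real configurations do more (field mapping, right-of-way widths, and so on):

```python
from arcgis.geometry import Polyline

def lines_from_structures(df, wkid=4326):
    """Build one polyline per circuit by ordering its structures by sequence."""
    features = []
    for circuit, grp in df.sort_values("sequence").groupby("circuit_id"):
        # Each structure contributes one vertex, in sequence order.
        path = grp[["lon", "lat"]].values.tolist()
        features.append({
            "attributes": {"circuit_id": circuit},
            "geometry": Polyline(
                {"paths": [path], "spatialReference": {"wkid": wkid}}
            ),
        })
    return features
```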
How Long Does It Take?
When I describe this as a batch process that creates a dataframe from the source, compares it to another with all the records from the GIS, and checks every vertex on every feature and every value in every column, a lot of people hesitate and assume it will take a long time to complete. That's simply not the case. On a recent run with 142,000 records in a point feature class with 16 fields (plus geometry), on a server with standard processing power, it took 11 minutes to create both dataframes, run the comparison, and update the 133 rows that had discrepancies. Depending on the number of vertices, lines and polygons can take a bit longer to create the geometry and compare, but it is still incredibly fast in my opinion (150K polygons with 4 vertices each in less than 20 minutes). And if there is a speed concern, it can also work with subsets of data so it can run faster; however, we aren't currently using that functionality in our project.
What’s Next?
While this module has proven itself for batch integrations into a GIS, I'd really like to see it used more for constructing a data pipeline that combines our utility customers' asset and work management data with their GIS, enabling them to easily visualize what's happening in the field; more than just a point on the map that says there's a work order there. Since we already have the data in a pandas dataframe, aggregating it and performing statistical calculations is a snap. If we write that information to a table in the GIS, we can easily show it on a dashboard and enable decision-makers to look at their issues spatially, with supporting quantitative analysis.
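For instance (a purely illustrative sketch; the dataframe and field names are made up), summarizing open work orders per feeder is a few lines of pandas, and the result can be published as a hosted table for a dashboard to read:

```python
# Aggregate work orders per feeder for dashboard indicators.
summary = work_orders.groupby("feeder_id").agg(
    open_orders=("status", lambda s: (s == "OPEN").sum()),
    avg_age_days=("age_days", "mean"),
).reset_index()

# Publish the non-spatial summary to AGOL as a hosted table.
item = gis.content.import_data(summary)
```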
What do you think?