Note: Avoid my blabbermouth by skipping ahead to the "Summary: Advanced Python Development" section at the bottom of the page
In my last article, I described how to automate an ArcGIS for Desktop installation using basic Python and a few freely available Python libraries. In that article's introduction, I noted that several prerequisites should be in place before attempting to write that script: an installation and working knowledge of pip, virtualenv, and an IDE. For someone who has only ever written a few geoprocessing scripts, this is quite a lot of overhead just to try something out!
On the other hand, once these prerequisites are fulfilled, they’ll make your life a WHOLE LOT easier when creating more complex Python scripts. If your whole program consists of a single file and a handful of modules, this is no big deal. But if it expands to multiple files and dozens of modules (and might be expected to work reliably for the next decade), having additional tools to keep track of everything is invaluable.
In this article, I will lay out the steps for installing and using these technologies — thereby speeding up your development in the future and making your scripts as robust as they can be. Your end goal after reading this article is to be able to set up the correct environment for advanced Python development.
I’ll start off with the Python language itself, then move on to pip and virtualenv – two packages fulfilling essential purposes in Python development. I’ll finish with an introduction to the PyCharm IDE, and how to integrate it into the rest of these technologies.
Not surprisingly, the foundation of setting up a development environment for Python is the Python language itself! However, a quick glance at the download page brings one question to the forefront: what version should we be using?
At the time of this writing, the latest stable releases are Python 2.7.13 and Python 3.6.0. The differences between Python 2 and Python 3 are rather arcane for the average developer, and the reasons for the split are even more so. But the important thing to realize is that Python 3 is not backwards compatible. This means that if you try to run Python 2 code under Python 3, it will probably break.
Similarly, if you try to run Python 3 code under Python 2, it will probably break. (I did consider using “might” instead of “probably” here, but since we’re writing code, Murphy’s Law is in effect: programs tend to break even when everything is written in the same language.) Using different versions of Python together requires either doing something hacky (like running one version as a separate process and piping data between them) or writing code that is compatible with both versions.
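To make the incompatibility concrete, here is a small sketch of the two differences people usually hit first, print and division, along with the __future__ imports that let a single file behave the same way under Python 2.7 and Python 3. The function names are made up for illustration; this shows the "compatible with both versions" route, not a complete porting guide.

```python
# Runs unchanged under Python 2.7 and Python 3.x.
# These __future__ imports switch Python 2 to Python 3 behavior;
# under Python 3 they are accepted and have no effect.
from __future__ import division, print_function


def average(values):
    # Hypothetical example function.
    # With "division" imported, "/" is true division in both versions.
    # Plain Python 2 would return 2 here (integer division) instead of 2.5.
    return sum(values) / len(values)


def floor_average(values):
    # "//" is floor division in both versions, with or without the import.
    return sum(values) // len(values)


if __name__ == "__main__":
    # print() as a function; a Python 2 style "print x" statement
    # is a SyntaxError under Python 3.
    print(average([1, 2, 3, 4]))        # 2.5
    print(floor_average([1, 2, 3, 4]))  # 2
```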
Fortunately, if you have ArcGIS for Desktop 10.1 or later, then you probably already have some release of Python 2.7 installed on your machine. If you want to get your environment set up as quickly as possible, you can simply use that.
On the other hand, Python 3 is the future, and not just in an “it has a higher number” sense. ESRI is backing the transition to Python 3 by making it the language of ArcGIS Pro’s ArcPy, and thus of the Utility Network.
Whichever version you decide on, installation consists of downloading the installer from the Python downloads site, running it, and clicking through the installation screens. If you already have a Python version installed (like 2.7 from your ArcGIS installation), installing another version alongside it should not cause conflicts, since each version installs into its own folder with its own packages. The one thing to watch is which version runs when you simply type python at the command prompt: that is whichever installation comes first on your system PATH.
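With more than one Python installed, it is easy to lose track of which one a script is actually running under. A minimal sketch using only the standard library's sys module; the helper names are hypothetical, and the 2.7 minimum is just an example threshold.

```python
import sys


def describe_interpreter():
    """Return the version and location of the Python running this script."""
    # Hypothetical helper, built only on sys.version_info and sys.executable.
    major, minor, micro = sys.version_info[:3]
    return {
        "version": "%d.%d.%d" % (major, minor, micro),
        "executable": sys.executable,  # full path to python.exe (or python)
    }


def require_python(minimum=(2, 7)):
    """Stop early with a clear message if the interpreter is too old."""
    if tuple(sys.version_info[:2]) < tuple(minimum):
        raise SystemExit(
            "This script needs Python %d.%d or later, found %s"
            % (minimum[0], minimum[1], describe_interpreter()["version"])
        )


if __name__ == "__main__":
    require_python((2, 7))
    info = describe_interpreter()
    print("Running Python %s from %s" % (info["version"], info["executable"]))
```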
The great thing about programming is that you should only ever need to write anything once. After that, you can just reuse the code an infinite number of times.
The even greater thing about programming is that thanks to the Internet, you often don’t even need to write it the first time! Someone else already wrote it, and they did a better job at it than you would have anyway, so you should just use that.
In advanced Python development, this code is distributed in packages that can be added into your Python installation. But finding the code, making sure it is the correct version, downloading it, unzipping it, and relocating it to the proper directory on your computer is error-prone and tedious. That’s why there’s pip.
See, pip makes it fast and easy to get new packages up and running in Python. With pip, installing or uninstalling a package can be done with a single line at the command prompt.
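That single line is usually pip install followed by the package name. Once several Pythons are installed side by side, a useful habit is to run pip as python -m pip, which guarantees the package goes into the interpreter you name rather than whichever pip happens to be first on the system PATH. A sketch that builds (and, only if asked, runs) that command from inside Python; pip_command and run_pip are hypothetical helpers, while "python -m pip install/uninstall" is standard pip usage.

```python
import subprocess
import sys


def pip_command(action, package, interpreter=sys.executable):
    """Build a 'python -m pip ...' command for a specific interpreter."""
    # Hypothetical helper: it only assembles the argument list.
    if action not in ("install", "uninstall"):
        raise ValueError("action must be 'install' or 'uninstall'")
    command = [interpreter, "-m", "pip", action, package]
    if action == "uninstall":
        command.append("-y")  # skip pip's confirmation prompt
    return command


def run_pip(action, package, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    command = pip_command(action, package)
    print(" ".join(command))
    if not dry_run:
        # check_call raises CalledProcessError if pip exits with an error.
        subprocess.check_call(command)
    return command


if __name__ == "__main__":
    # Same as typing "python -m pip install matplotlib" at the prompt.
    run_pip("install", "matplotlib")
```

Keeping dry_run=True as the default means nothing is installed until you explicitly ask for it.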
The not-so-great thing about programming is that everything is always getting better. Admittedly, this isn’t the worst problem to have, but it does bring up some issues. Usually dependency issues.
Say you need a piece of code that depends on a package. You download the package, write the code, do some testing, and find everything to be hunky-dory. Your code runs happily every time you use it, and unicorns and butterflies abound. Then, several years later, you decide you need a new piece of code that depends on the same package.
In the intervening years there have been a number of updates to the package, adding new functionality that you’d like to utilize. Thinking nothing of it, you download the new package only to see the unicorns and butterflies flee in terror as your old faithful code starts vomiting everywhere. That’s why there’s virtualenv. virtualenv makes it easy to keep the packages for one project separate from the packages for another.
To understand how virtualenv works, you need to understand how Python works.
Whenever you start up Python (e.g., running python at the command prompt, starting the IDLE interpreter, or running a Python script), one of the first things it does is build its search path, which I’ll call the PATH. (It is related to, but not the same as, your operating system’s PATH variable; inside Python it is available as sys.path.) The PATH is where Python is going to look for, well, everything. Any packages you want to use, and any of the standard files that define basic Python functionality (really, what makes Python, Python), must be found in the PATH, or Python will be unable to use them. As for the PATH itself, it is really just a list of folders to look in, e.g., C:\Python27\Lib and C:\Python27\Lib\site-packages.
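You can inspect this list directly, since sys.path is an ordinary Python list of folder names. The sketch below prints it, then shows the "must be found or Python can't use it" rule in action by writing a throwaway module into a temporary folder and importing it only after that folder is added to the list. The module name hello_path_demo and the helper names are made up for the example.

```python
import importlib
import os
import sys
import tempfile


def show_search_path():
    """Print every folder Python will search, in order."""
    for folder in sys.path:
        print(folder or "(current directory)")


def import_from_folder(folder, module_name):
    """Import module_name after making sure folder is on the search path."""
    # Hypothetical helper for illustration.
    if folder not in sys.path:
        sys.path.insert(0, folder)  # searched before everything else
    importlib.invalidate_caches()  # make sure newly written files are seen
    return importlib.import_module(module_name)


if __name__ == "__main__":
    show_search_path()
    folder = tempfile.mkdtemp()
    with open(os.path.join(folder, "hello_path_demo.py"), "w") as f:
        f.write("GREETING = 'found me'\n")
    try:
        import hello_path_demo  # noqa: F401
    except ImportError:
        print("Not importable yet: the folder is not on sys.path")
    module = import_from_folder(folder, "hello_path_demo")
    print(module.GREETING)
```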
Working with virtualenv comes in two steps. In the first step you make a new virtual environment for Python to run in, and in the second you activate that environment to actually run the Python.
When you make a new environment, virtualenv designates a certain folder as “the environment.” Then it sets up a few inner folders, and copies in the necessary files to make a standalone Python instance. You’ll be able to install all the packages you want for your project into this directory, where they won’t be interfered with by the packages of other environments.
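virtualenv itself is a separate package, but Python 3.3 and later ship a close cousin in the standard library, the venv module, which builds the same kind of self-contained environment folder. As a sketch of the "make a new environment" step (using venv rather than virtualenv, with a hypothetical create_environment helper and folder name):

```python
import os
import tempfile
import venv


def create_environment(env_dir):
    """Create a virtual environment in env_dir and return its settings."""
    # Hypothetical helper built on the standard library's venv module.
    # with_pip=False keeps the sketch quick and offline; use True to have
    # pip bootstrapped into the new environment as well.
    # symlinks matches the "python -m venv" default: on except on Windows.
    builder = venv.EnvBuilder(with_pip=False, symlinks=(os.name != "nt"))
    builder.create(env_dir)
    # pyvenv.cfg marks the folder as an environment and records which
    # base installation it came from ("home"), as key = value lines.
    settings = {}
    with open(os.path.join(env_dir, "pyvenv.cfg")) as cfg:
        for line in cfg:
            key, sep, value = line.partition("=")
            if sep:
                settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    env_dir = os.path.join(tempfile.mkdtemp(), "my_project_env")
    settings = create_environment(env_dir)
    print("Created", env_dir)
    print("Contents:", sorted(os.listdir(env_dir)))
    print("Based on the Python in", settings.get("home"))
```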
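When you activate the environment, virtualenv’s activation script puts the environment’s folder at the front of your system PATH, so typing python or pip runs the environment’s own copies. Those copies build their PATH from the environment’s directory, so the packages you install are placed there and looked up there. Since Python (and pip) look for packages only where the PATH directs them, your environment is now isolated from the packages installed for any other Python on your machine.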
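From inside a running script, one way to check whether the interpreter belongs to an environment (activation is just the convenient way of picking that interpreter) is to compare sys.prefix, where this interpreter's packages live, with sys.base_prefix, the installation it was created from. A sketch with a hypothetical helper name: venv and recent virtualenv releases make these two differ, while older virtualenv releases instead set a sys.real_prefix attribute, so the helper checks for that too.

```python
import sys


def in_virtual_environment():
    """Return True if this interpreter is running inside a virtual environment."""
    # Hypothetical helper. Older virtualenv releases record the original
    # installation's prefix here.
    if hasattr(sys, "real_prefix"):
        return True
    # venv (and newer virtualenv) leave sys.base_prefix pointing at the
    # original installation while sys.prefix points at the environment.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtual_environment())
    print("Packages are looked up under:", sys.prefix)
```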
If you’re new to Python and advanced Python development, chances are you are familiar with IDLE.
IDLE is Python’s built-in Integrated Development Environment, or IDE.
An IDE is a bit like a word processor (like Microsoft Word), but for computer languages rather than human ones. A word processor might correct your spelling, have tools for document formatting, and let you send your document to a printer; an IDE can flag your syntax errors, has tools for debugging, and lets you run your code.
In this sense, IDLE is a bit like WordPad: minimally featured and lightweight, good for quick one-offs, but not what you’d want for making an especially fancy document. That’s why there’s PyCharm. PyCharm combines just about all the things you’ll need for Python development into one place, so you don’t have to run around collecting them all and making them work together. Lo and behold, PyCharm comes bundled with pip and virtualenv, so you don’t even need to install them yourself! You are one step closer to advanced Python development.
(For completeness, I should also mention that there are quite a few other IDEs out there specifically for Python, many more that can be pressed into service as Python IDEs given the right extensions, and a plethora of text editors with Python-specific extensions that can be used as alternatives to PyCharm. I just think PyCharm is nice, and happen to be most familiar with it.)
PyCharm installation is pretty straightforward. Simply download the installer, run it, and click through the installation dialogs.
Normally, Python scripts are rather free-form. If a file is in the PATH, you can use it in any Python script you want to execute. While this provides a lot of flexibility when you need it, most of the time a lack of structure can be rather burdensome. It’s no fun to open up a script you wrote several years ago, see a list of import statements, and think “where the heck did I put those files?”
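Even before reaching for an IDE, Python can answer "where did I put those files?" on its own: importlib.util.find_spec looks a module up on the PATH without importing it and reports the file it would load. A minimal sketch for Python 3.4 or later; the locate helper name is hypothetical, and arcpy is included only as an example of a module that may or may not be visible to the interpreter you run it with.

```python
import importlib.util


def locate(module_name):
    """Report where an import of module_name would come from, without importing it."""
    # Hypothetical helper around importlib.util.find_spec.
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None  # not found anywhere on sys.path
    if spec.has_location:
        return spec.origin  # the file on disk (for a package, its __init__)
    return "(no file of its own, e.g. a built-in module)"


if __name__ == "__main__":
    # "arcpy" prints None unless this Python can see ArcGIS's arcpy package.
    for name in ("json", "sys", "arcpy"):
        print(name, "->", locate(name))
```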
PyCharm resolves this issue by adding the concept of “projects.” This isn’t a particularly new or innovative idea – it’s used in just about every programming language – which is excellent evidence that it is a good idea.
When a new project is created, PyCharm generates a folder named “.idea” in the location you specify. The .idea folder contains whatever metadata PyCharm needs to store for the project, such as which files and directories are included, so it can display them for your convenience.
Each project that you create lets you define the interpreter it will use. Since virtualenv creates a new python.exe file for each environment, and the python.exe you choose determines the interpreter (and therefore the packages) you’re using, pointing PyCharm at the python.exe file in your virtual environment will isolate the PyCharm project from the rest of your system. PyCharm even has some built-in functionality to help you do this!
To start, open up PyCharm and create a new project. (You can also perform the following steps by clicking on the cog next to the “interpreter” box on this screen, but we’ll ignore that for now.)
Then, open the Settings window (Ctrl+Alt+S). Type “interpreter” in the search bar in the upper-left corner of the window, which should automatically navigate you to the Project Interpreter screen.
Click on the gear icon at the end of the line that says “Project Interpreter,” and a context menu will appear.
If you’ve already created a virtual environment on your machine that you would like to use, you can select “Add Local” and locate the associated python.exe file in the window that pops up. Otherwise, you can select “Create VirtualEnv” and follow the instructions in the pop-up window to create a new virtual environment for your project.
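If you are not sure which file to pick in the "Add Local" dialog, a small script can find it for you. This sketch assumes the usual environment layouts (Scripts\python.exe on Windows, bin/python on macOS and Linux); the find_interpreter helper, the script name, and the example path in the comment are hypothetical.

```python
import os
import sys

# Where the interpreter usually lives inside an environment folder.
INTERPRETER_LOCATIONS = (
    os.path.join("Scripts", "python.exe"),  # Windows
    os.path.join("bin", "python"),          # macOS and Linux
)


def find_interpreter(env_dir):
    """Return the full path to an environment's interpreter, for "Add Local"."""
    # Hypothetical helper: checks the usual layouts and returns the first hit.
    for relative_path in INTERPRETER_LOCATIONS:
        candidate = os.path.join(env_dir, relative_path)
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    raise FileNotFoundError(
        "no interpreter under %r; is it a virtual environment folder?" % env_dir
    )


if __name__ == "__main__":
    # Pass the environment folder on the command line, e.g.
    #   python find_interpreter.py C:\envs\my_project_env
    if len(sys.argv) > 1:
        print(find_interpreter(sys.argv[1]))
    else:
        print("usage: find_interpreter.py ENV_DIR")
```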
Now anything you do with your project will be associated with the environment you selected. Any time you choose to debug your project, PyCharm will activate the environment and use the environment’s interpreter. And whenever you install packages for the project in PyCharm, they’ll be installed only in the environment. Speaking of which …
You might have noticed, when you were setting up your virtualenv, that the same window had a table listing the packages installed in that environment.
Installing packages with pip from the command prompt is already easy, but PyCharm adds a GUI to make it even easier.
To add a new package, simply click the “plus” symbol next to the table, and the “Available Packages” window will appear. Perhaps we want to make some visualizations of our data using Python, in which case we’ll probably want the matplotlib package. So we can type “matplotlib” in the search bar, and the available package will appear in the list below.
Then click the “Install Package” button and wait for the package to install. The package will install in your specified virtual environment, and will be available to use in any file you create for your project.
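To confirm from code that the package really landed in the project's environment, and which version you got (which matters for the dependency problems described earlier), the standard library's importlib.metadata module (Python 3.8 and later) can report installed versions. A sketch with a hypothetical helper name; matplotlib is simply the example package from above, and the helper returns None rather than guessing when a package is missing.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None."""
    # Hypothetical helper around importlib.metadata.version.
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("matplotlib")
    if version is None:
        print("matplotlib is not installed for", sys.executable)
    else:
        print("matplotlib", version, "is installed for", sys.executable)
```

Running it with the environment's interpreter (as PyCharm does for your project) checks that environment only, not any other Python on the machine.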
Unfortunately, the current version of PyCharm as of this writing (2016.3.2) comes with an outdated version of pip. Luckily, the fix is quick and easy – updating pip! In the Project Interpreter screen, highlight the row in the packages table that says “pip.” The other two columns should inform you that you are not using the latest version.
Click the up arrow next to the table, and PyCharm will begin upgrading pip. Restart PyCharm, and you should be able to install packages to your heart’s content.
And that’s about it. Happy developing!
Setting up your advanced Python development environment should include:
You can use the version of Python that comes with ArcGIS for Desktop (likely 2.7) or install a later version (latest right now is 3.6). You should be able to install them side-by-side without issue.
PyCharm comes packaged with pip and virtualenv, so that’s the only other thing you need to install.