My previous Let’s Make a Map tutorial describes how to make a basic map with D3 and TopoJSON; now it’s time to cover thematic mapping in the form of a proportional symbol map. The simplest symbol is a circle, or bubble, whose area is proportional to the associated data. In this tutorial, we’ll make a bubble map of population by U.S. county.
Source: American Community Survey, 2012 5-Year Estimate
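This tutorial necessarily covers a lot of ground. The main tasks for any visualization are acquiring data, transforming it into a usable form, displaying it, and refining the design so that it communicates effectively.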
There are many different ways to perform these tasks, but this tutorial will focus on my preferred workflow. After acquiring cartographic boundaries and population estimates from the U.S. Census Bureau, we’ll transform this data to TopoJSON and display it using D3. Lastly, I’ll briefly comment on effective design for visual communication.
At a minimum, you’ll need Node and a basic web server for making maps. I covered this previously, so I won’t repeat myself here.
Although not essential, I also recommend Git to keep a history of your changes, allowing you to revert mistakes (such as accidentally deleting hours of work). Create a new folder for this project, go to that folder in the terminal, and run the following command:
git init
I use NPM to define local dependencies. The benefit of this approach is that you can have multiple versions of software packages installed simultaneously, and you don’t have to worry about things breaking when you upgrade because each project is isolated. A minimal package definition is:
{
  "name": "anonymous",
  "private": true,
  "version": "0.0.1",
  "dependencies": {
    "topojson": "1"
  }
}
Save this to a file called package.json and run:
npm install
You should now see a node_modules folder containing the installed topojson package.
If you’re using Git, you should also create a local .gitignore file so that you don’t accidentally check in generated files to the repository. It should look something like this:
.DS_Store
build
node_modules
The build directory is where we’ll store our generated files. Because those files are generated, they don’t need to be saved in the Git repository; they can be rebuilt at any time.
The U.S. Census Bureau publishes simplified cartographic boundaries as shapefiles for thematic mapping. The Census Bureau also publishes TIGER/Line shapefiles that are higher resolution and more up-to-date; however, for the small-scale map we are making, that extra resolution is not needed. County boundaries also don’t change very frequently, so it’s usually acceptable to use the decennial census rather than the most recent release.
We’ll be using the lowest-resolution shapefile, at “20m” or 1:20,000,000 scale. Rather than download the file and check it into our Git repo, we’ll use Make to document where this file is located and download it. Create a Makefile with the following contents:
build/gz_2010_us_050_00_20m.zip:
	mkdir -p $(dir $@)
	curl -o $@ http://www2.census.gov/geo/tiger/GENZ2010/$(notdir $@)
Next, run:
make build/gz_2010_us_050_00_20m.zip
This will download the zipfile from the Census Bureau and save it in the build directory.
The zipfile by itself isn’t very useful. We need to unzip its contents and convert the contained shapefile into TopoJSON for web delivery. We could do this by hand, but we’ll again use Make so that our process is documented and repeatable. Add the following to the Makefile:
build/gz_2010_us_050_00_20m.shp: build/gz_2010_us_050_00_20m.zip
	unzip -od $(dir $@) $<
	touch $@
This rule unzips the previously downloaded file, giving us shapefiles. Don’t run it yet, though; we can combine it with another rule that converts the shapefiles to TopoJSON:
build/counties.json: build/gz_2010_us_050_00_20m.shp
	node_modules/.bin/topojson \
		-o $@ \
		--projection='width = 960, height = 600, d3.geo.albersUsa() \
			.scale(1280) \
			.translate([width / 2, height / 2])' \
		--simplify=.5 \
		-- counties=$<
Now run this new command:
make build/counties.json
In fact, this is not just converting the shapefile to TopoJSON, but also quantizing, projecting to the Albers USA projection and simplifying. Together, these changes save quite a bit of space! The resulting file is 496KB, while the original shapefile was 1.7MB.
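TopoJSON’s quantization snaps each coordinate onto an integer grid, and the arcs are then delta-encoded, so most stored values become small integers. The following is a minimal sketch of those two ideas in plain JavaScript, not the topojson package’s implementation. The helper names and the tiny 5-by-5 grid are illustrative assumptions.

```javascript
// Hypothetical helpers illustrating TopoJSON-style quantization;
// this is not the topojson package's own code.

// Snap [x, y] points onto a q-by-q integer grid covering their bounding box.
function quantize(points, q) {
  const xs = points.map(p => p[0]);
  const ys = points.map(p => p[1]);
  const x0 = Math.min(...xs), x1 = Math.max(...xs);
  const y0 = Math.min(...ys), y1 = Math.max(...ys);
  const kx = x1 > x0 ? (q - 1) / (x1 - x0) : 1;
  const ky = y1 > y0 ? (q - 1) / (y1 - y0) : 1;
  return points.map(([x, y]) => [
    Math.round((x - x0) * kx),
    Math.round((y - y0) * ky)
  ]);
}

// Keep the first point as-is and store every later point as an offset from
// the previous one; neighboring points on a boundary yield small numbers.
function deltaEncode(points) {
  return points.map((p, i) =>
    i === 0 ? p.slice() : [p[0] - points[i - 1][0], p[1] - points[i - 1][1]]
  );
}

const line = [[0, 0], [0.5, 0.25], [1, 1]];
const quantized = quantize(line, 5); // grid coordinates range from 0 to 4
const encoded = deltaEncode(quantized);
```

Coarser grids save more space but move points further from their true positions. Because our TopoJSON is already projected to pixels, the grid only needs to be fine enough that the rounding is invisible at screen resolution.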
Enough terminal. Time to get something on the screen. Create an index.html file:
<!DOCTYPE html>
<meta charset="utf-8">
<style>

path {
  fill: none;
  stroke: #000;
  stroke-linejoin: round;
  stroke-linecap: round;
}

</style>
<body>
<script src="//d3js.org/d3.v3.min.js" charset="utf-8"></script>
<script src="//d3js.org/topojson.v1.min.js"></script>
<script>

var width = 960,
    height = 600;

var path = d3.geo.path()
    .projection(null);

var svg = d3.select("body").append("svg")
    .attr("width", width)
    .attr("height", height);

d3.json("build/counties.json", function(error, us) {
  if (error) return console.error(error);

  svg.append("path")
      .datum(topojson.mesh(us))
      .attr("d", path);
});

</script>
Launch your local web server, and then visit your page. It should look something like this:
Two things to note at this stage. First, the d3.geo.path instance has a null projection; that’s because our TopoJSON is already projected, so we can display it as-is. This greatly improves rendering performance. Second, we’re just displaying the county boundaries so far (using topojson.mesh). We still have a bit of work to do before we can draw population bubbles.
The next task is to fetch the data we want to visualize: population estimates by county. Sometimes you may find that data conveniently baked into your shapefile, but here we’ll need to return to the U.S. Census Bureau and gather the requisite table from the American Community Survey (ACS) using the American FactFinder.
Here are the approximately twenty steps required to download a CSV:
If you would prefer this as a two-minute instructional video:
An eminently more usable alternative to FactFinder is censusreporter.org, a Knight News Challenge-funded project with a convenient autocomplete interface and a robust API. Here is a direct link to download the latest ACS total population estimate by county. Note, however, that the column headers in this CSV differ slightly from FactFinder’s: you must edit either the file or the Makefile rules accordingly.
If you want to experience the FactFinder vicariously, you may instead download my copy. However, I recommend using data from primary sources whenever possible, as this helps ensure the data’s accuracy.
The downloaded ACS_12_5YR_B01003_with_ann.csv is slightly unusual in that it contains two header lines. Normally, a CSV file contains at most one header line defining the names of the columns; this is the format that d3.csv (and TopoJSON) expects. Open the downloaded CSV in your text editor and delete the first of the two header lines. The first few lines should look like this:
Id,Id2,Geography,Estimate; Total,Margin of Error; Total
0500000US01001,01001,"Autauga County, Alabama",54590,*****
0500000US01003,01003,"Baldwin County, Alabama",183226,*****
0500000US01005,01005,"Barbour County, Alabama",27469,*****
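If you’d rather not edit the file by hand, for instance to keep the step repeatable, a small Node script can drop the first header line. The sketch below assumes the file uses `\n` or `\r\n` line endings and that its first line is the extra header. The `dropFirstHeader` name is hypothetical, and the first line of the sample input is a placeholder, not FactFinder’s actual header.

```javascript
// Hypothetical helper: remove the first line of a two-header CSV so that
// only the second header row (the column names) remains.
function dropFirstHeader(text) {
  const newline = text.indexOf("\n");
  // With no newline, there is nothing after the first line to keep.
  return newline === -1 ? "" : text.slice(newline + 1);
}

const raw = [
  "first header line (placeholder; contents omitted)",
  "Id,Id2,Geography,Estimate; Total,Margin of Error; Total",
  '0500000US01001,01001,"Autauga County, Alabama",54590,*****'
].join("\n");

const cleaned = dropFirstHeader(raw);

// Applying it to the real file with Node's built-in fs module
// (this overwrites the file in place, so run it only once):
// const fs = require("fs");
// const file = "ACS_12_5YR_B01003_with_ann.csv";
// fs.writeFileSync(file, dropFirstHeader(fs.readFileSync(file, "utf8")));
```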
Now we can use TopoJSON’s --external-properties feature to join the shapefile of counties with the CSV of population estimates, making additional properties available in the output TopoJSON. This flag works similarly to a join in a relational database: using the ID property as a primary key, we assign each row in the CSV file to the corresponding feature in the shapefile.
One frequent complication is that the external properties do not use the same ID property name as the shapefile. Here the CSV file uses the name Id2, while the shapefile uses STATE and COUNTY. (We could use the longer Id and GEO_ID properties, but we’d prefer to use the shorter identifier here, without the redundant leading 0500000US.)
To address these inconsistencies, the --id-property argument accepts a comma-separated list of JavaScript expressions to specify how the ID property should be computed. For the shapefile, we’ll use the expression STATE+COUNTY to concatenate those two properties, while for the CSV, we’ll use Id2.
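Conceptually, the join behaves like the following plain-JavaScript sketch. It is not TopoJSON’s implementation: `joinById` is a hypothetical name, and the two tiny feature objects are made-up stand-ins for the shapefile. Each side computes its key with its own expression (`STATE + COUNTY` versus `Id2`), and each matching CSV row is copied onto its feature.

```javascript
// Hypothetical illustration of joining CSV rows onto features by a computed ID.
function joinById(features, rows, featureId, rowId) {
  // Index the CSV rows by key, like a primary-key lookup table.
  const byId = new Map(rows.map(row => [rowId(row), row]));
  return features.map(feature => {
    const id = featureId(feature.properties);
    const row = byId.get(id);
    // Object.assign skips an undefined source, so features without a
    // matching row keep only their shapefile properties.
    return { id, properties: Object.assign({}, feature.properties, row) };
  });
}

const features = [
  { properties: { STATE: "01", COUNTY: "001" } },
  { properties: { STATE: "01", COUNTY: "003" } }
];
const rows = [
  { Id2: "01001", Geography: "Autauga County, Alabama", "Estimate; Total": "54590" },
  { Id2: "01003", Geography: "Baldwin County, Alabama", "Estimate; Total": "183226" }
];

const joined = joinById(features, rows, p => p.STATE + p.COUNTY, r => r.Id2);
```

Keeping both keys as strings matters here. If either side converted its ID to a number, the leading zero in 01001 would be lost and the join would silently fail for every Alabama county.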
We can also use JavaScript expressions to define the properties we want to include in the generated TopoJSON. Here we’ll map the Geography column from the CSV to the name property, and the Estimate; Total column to the population property. The latter requires bracket notation because the column name isn’t a valid JavaScript identifier, plus a unary + to convert the value from a string to a number.
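In plain JavaScript, the two `--properties` expressions amount to something like the following sketch. For simplicity it reads directly from a CSV row rather than from `d.properties` as the Makefile does, and the `mapProperties` name is hypothetical. Writing `row.Estimate; Total` would not work: the semicolon would end the statement.

```javascript
// Hypothetical sketch of the two property mappings used in the Makefile.
function mapProperties(row) {
  return {
    name: row.Geography,                 // name=Geography
    population: +row["Estimate; Total"]  // bracket notation, then unary + to a number
  };
}

const props = mapProperties({
  Geography: "Autauga County, Alabama",
  "Estimate; Total": "54590"
});
// props.population is the number 54590, not the string "54590".
```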
Modifying our Makefile slightly:
build/counties.json: build/gz_2010_us_050_00_20m.shp ACS_12_5YR_B01003_with_ann.csv
	node_modules/.bin/topojson \
		-o $@ \
		--id-property='STATE+COUNTY,Id2' \
		--external-properties=ACS_12_5YR_B01003_with_ann.csv \
		--properties='name=Geography' \
		--properties='population=+d.properties["Estimate; Total"]' \
		--projection='width = 960, height = 600, d3.geo.albersUsa() \
			.scale(1280) \
			.translate([width / 2, height / 2])' \
		--simplify=.5 \
		-- counties=$<
One subtle detail you may not have noticed in the final bubble map is that it displays state boundaries rather than county boundaries. This reduces visual noise; each county has a corresponding bubble, while the state boundary lines provide additional geographic context.
We can compute the state boundaries without downloading another shapefile because TopoJSON is a topological format. The following rule merges (or “dissolves”) counties within the same state, producing a new states layer in the output TopoJSON file:
build/states.json: build/counties.json
	node_modules/.bin/topojson-merge \
		-o $@ \
		--in-object=counties \
		--out-object=states \
		--key='d.id.substring(0, 2)' \
		-- $<
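The `--key` expression determines which counties are dissolved together. County IDs here are five-digit codes (the `STATE+COUNTY` concatenation from earlier), so their first two characters identify the state. The hypothetical sketch below shows only that grouping step, not the geometric merge that topojson-merge performs.

```javascript
// Hypothetical sketch: group county IDs by the same key expression
// passed to topojson-merge, d.id.substring(0, 2).
function groupByState(counties) {
  const groups = new Map();
  for (const d of counties) {
    const key = d.id.substring(0, 2); // two-digit state code
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(d.id);
  }
  return groups;
}

const states = groupByState([
  { id: "01001" }, { id: "01003" }, { id: "01005" }, // Alabama counties
  { id: "06037" }                                    // Los Angeles County, California
]);
```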
The resulting state mesh:
A similar rule can compute the national boundary by merging states:
us.json: build/states.json
	node_modules/.bin/topojson-merge \
		-o $@ \
		--in-object=states \
		--out-object=nation \
		-- $<
To run these new rules:
make us.json
The topojson.merge function is part of the client API, so we could do this step in the client rather than baking it into the TopoJSON file. However, it’s slightly faster to precompute the merged areas, and sometimes it’s nice to have fewer moving parts.
Don’t forget to load the new file in index.html, replacing the old counties-only file:
d3.json("us.json", function(error, us) {
  if (error) return console.error(error);

  // Append to svg here.
});
First, let’s finish the base map that will appear underneath the bubbles.
The relevant code for the base map is:
svg.append("path")
    .datum(topojson.feature(us, us.objects.nation))
    .attr("class", "land")
    .attr("d", path);

svg.append("path")
    .datum(topojson.mesh(us, us.objects.states, function(a, b) { return a !== b; }))
    .attr("class", "border border--state")
    .attr("d", path);
The land is drawn as a single feature, with the state borders drawn as white lines on top. The filter function passed to topojson.mesh specifies that only internal state borders should be drawn; the coastlines are not stroked so as to retain detail around small islands and inlets.
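Why does `a !== b` select only interior borders? topojson.mesh calls the filter once per arc, passing the two geometries that share that arc. For an arc used by only one geometry, such as a coastline, both arguments are the same object. The sketch below mimics that rule on a made-up list of arcs, each tagged with the states on either side. It illustrates the predicate and is not topojson’s code.

```javascript
// Made-up arcs, each listing the state on either side. An exterior arc is
// used by a single state, so both sides name the same state.
const arcs = [
  { name: "Alabama-Georgia border", a: "AL", b: "GA" },
  { name: "Alabama Gulf coast", a: "AL", b: "AL" },
  { name: "Georgia Atlantic coast", a: "GA", b: "GA" }
];

// Same predicate as in the tutorial: keep arcs shared by two different states.
const interior = arcs.filter(arc => arc.a !== arc.b);
// Only the Alabama-Georgia border survives; both coastlines are dropped.
```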
We’ll also need these new styles, replacing the old ones:
.land {
  fill: #ddd;
}

.border {
  fill: none;
  stroke: #fff;
  stroke-linejoin: round;
  stroke-linecap: round;
}
Now to place bubbles at each county centroid:
svg.append("g")
    .attr("class", "bubble")
  .selectAll("circle")
    .data(topojson.feature(us, us.objects.counties).features)
  .enter().append("circle")
    .attr("transform", function(d) { return "translate(" + path.centroid(d) + ")"; })
    .attr("r", 1.5);
To size the bubbles, create a d3.scale.sqrt so that the area of the circle is proportional to the associated population; the radius of the circle is proportional to the square root of the population. (Alternatively, you could use d3.svg.symbol for other proportional symbols.) We could compute the domain of the scale from the data, but since we know the approximate distribution of the data beforehand, we can simply hard-code reasonable values:
var radius = d3.scale.sqrt()
    .domain([0, 1e6])
    .range([0, 15]);
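The following sketch mirrors what this d3.scale.sqrt configuration computes for non-negative inputs: it takes square roots of the domain and the input, then interpolates linearly into the range. Like D3 scales by default, it does not clamp, so counties above one million people get radii larger than 15. `makeSqrtScale` is a hypothetical stand-in, not part of D3.

```javascript
// Hypothetical stand-in for d3.scale.sqrt().domain([d0, d1]).range([r0, r1]),
// valid for non-negative inputs. Like D3's default, it does not clamp.
function makeSqrtScale([d0, d1], [r0, r1]) {
  const s0 = Math.sqrt(d0), s1 = Math.sqrt(d1);
  return x => r0 + (Math.sqrt(x) - s0) / (s1 - s0) * (r1 - r0);
}

const radius = makeSqrtScale([0, 1e6], [0, 15]);

// Area is proportional to population: quadrupling the population
// doubles the radius and therefore quadruples the area (pi * r * r).
const small = radius(250000); // 7.5
const large = radius(1e6);    // 15
```

A linear radius scale would instead make a county with four times the population look sixteen times as large, which is why the square root is used.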
This version of the map suffers greatly from occlusion: larger circles, such as Cook County in Illinois and Los Angeles County in California, obscure smaller bubbles underneath. Occlusion can be mitigated by making the bubbles smaller, but this makes it harder to see less-populated counties and emphasizes dense urban areas.
Another way to reduce occlusion is to sort bubbles by descending size, so that smaller bubbles are drawn on top of larger bubbles. The bubbles still overlap, but the smaller bubbles are now visible.
svg.append("g")
    .attr("class", "bubble")
  .selectAll("circle")
    .data(topojson.feature(us, us.objects.counties).features
      .sort(function(a, b) { return b.properties.population - a.properties.population; }))
  .enter().append("circle")
    .attr("transform", function(d) { return "translate(" + path.centroid(d) + ")"; })
    .attr("r", function(d) { return radius(d.properties.population); });
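The comparator `b - a` orders features from largest to smallest population. SVG paints later elements on top, so the smallest bubbles are drawn last. Here is a minimal check on made-up feature objects that carry the three populations from the CSV excerpt earlier:

```javascript
// Made-up features carrying the populations from the CSV excerpt.
const features = [
  { properties: { name: "Autauga", population: 54590 } },
  { properties: { name: "Baldwin", population: 183226 } },
  { properties: { name: "Barbour", population: 27469 } }
];

// Same comparator as in the tutorial: descending by population.
// Array.prototype.sort reorders the array in place and returns it.
features.sort((a, b) => b.properties.population - a.properties.population);

// Document (and therefore painting) order: largest first, smallest last.
const order = features.map(d => d.properties.name);
```

If any population is missing, the comparator returns NaN and the resulting order becomes unreliable. That is one more reason to check the data for missing values.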
A bit of transparency and a thin white stroke also help:
.bubble {
  fill-opacity: .5;
  stroke: #fff;
  stroke-width: .5px;
}
Boom! A bubble map. But now that our map is legible, it’s a good time to consider its validity: often our source data is not as clean and regular as we expect, and data-cleanliness issues may not be apparent in the visualization. It’s critical to spot-check data and verify that it’s correct. You should run sanity checks on the data, such as checking whether any counties are duplicated or missing data.
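A couple of these checks are easy to script. The sketch below assumes the county geometries in us.json each carry an `id` and a `properties.population`. It reports duplicated IDs and counties whose population is missing or not a finite number. `checkCounties` is a hypothetical helper, and the three-geometry input is made up.

```javascript
// Hypothetical sanity check over TopoJSON county geometries
// (us.objects.counties.geometries), each with an id and properties.
function checkCounties(geometries) {
  const seen = new Set();
  const duplicates = [];
  const missing = [];
  for (const g of geometries) {
    if (seen.has(g.id)) duplicates.push(g.id);
    seen.add(g.id);
    const population = g.properties && g.properties.population;
    // A county whose join failed may end up with a null, NaN, or absent
    // population (JSON serializes NaN as null).
    if (typeof population !== "number" || !Number.isFinite(population)) {
      missing.push(g.id);
    }
  }
  return { duplicates, missing };
}

// Made-up input: one duplicated ID and one county without data.
const report = checkCounties([
  { id: "01001", properties: { population: 54590 } },
  { id: "01001", properties: { population: 54590 } },
  { id: "01003", properties: { population: null } }
]);
```

To run it against the real file in Node, you could pass it `JSON.parse(require("fs").readFileSync("us.json", "utf8")).objects.counties.geometries`.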
For example, an earlier version of this tutorial used county boundaries from a different source, and that shapefile specified separate features for each of a county’s discontiguous areas. (Honolulu County in Hawaii consists not only of Oahu but also of the tiny Ford and Sand islands.) To avoid duplicate bubbles that would mislead readers, you would need to group features by county. The shapefile from the U.S. Census Bureau is already grouped, so we can skip this step.
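If your source does split counties into several features, one way to group them is to combine every polygon that shares an ID into a single MultiPolygon. Each county then yields exactly one feature and one bubble. This is a sketch on GeoJSON-style objects with made-up square coordinates; `groupFeaturesById` is a hypothetical helper.

```javascript
// Hypothetical: merge GeoJSON features that share an id into one
// MultiPolygon feature per id.
function groupFeaturesById(features) {
  const byId = new Map();
  for (const f of features) {
    // Normalize Polygon and MultiPolygon coordinates to a list of polygons.
    const polygons = f.geometry.type === "Polygon"
      ? [f.geometry.coordinates]
      : f.geometry.coordinates;
    if (byId.has(f.id)) {
      byId.get(f.id).geometry.coordinates.push(...polygons);
    } else {
      byId.set(f.id, {
        type: "Feature",
        id: f.id,
        properties: f.properties,
        // Copy the list so pushing later pieces never mutates the input.
        geometry: { type: "MultiPolygon", coordinates: polygons.slice() }
      });
    }
  }
  return Array.from(byId.values());
}

// Made-up unit squares standing in for two pieces of the same county.
const square = (x, y) => [[[x, y], [x + 1, y], [x + 1, y + 1], [x, y + 1], [x, y]]];
const grouped = groupFeaturesById([
  { type: "Feature", id: "15003", properties: {}, geometry: { type: "Polygon", coordinates: square(0, 0) } },
  { type: "Feature", id: "15003", properties: {}, geometry: { type: "Polygon", coordinates: square(5, 5) } }
]);
```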
To make this map communicate rather than simply look pretty, we need a few administrative touches. Adding a basic tooltip using SVG’s title element is a reasonable improvement, but we really need a legend to make the meaning of the area encoding apparent. Here is a basic legend that displays three circles and their associated population sizes:
var legend = svg.append("g")
    .attr("class", "legend")
    .attr("transform", "translate(" + (width - 50) + "," + (height - 20) + ")")
  .selectAll("g")
    .data([1e6, 3e6, 6e6])
  .enter().append("g");

legend.append("circle")
    .attr("cy", function(d) { return -radius(d); })
    .attr("r", radius);

legend.append("text")
    .attr("y", function(d) { return -2 * radius(d); })
    .attr("dy", "1.3em")
    .text(d3.format(".1s"));
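The legend code places each circle at `cy = -radius(d)`, so every circle’s bottom edge sits on the same baseline (y = 0) and the circles nest. Each label is anchored at its circle’s top (y = -2r) and nudged down by `dy`. The following plain-JavaScript sketch computes that layout. It re-creates the square-root radius mapping inline as an assumption rather than calling D3.

```javascript
// Re-created square-root radius mapping: domain [0, 1e6] -> range [0, 15].
const radius = d => 15 * Math.sqrt(d / 1e6);

// For each legend value: radius, circle center, circle bottom, label anchor.
const layout = [1e6, 3e6, 6e6].map(d => {
  const r = radius(d);
  const cy = -r;        // center sits one radius above the baseline
  return {
    value: d,
    r,
    cy,
    bottom: cy + r,     // bottom edge: always 0, the shared baseline
    labelY: -2 * r      // top of the circle, where the label is anchored
  };
});
```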
And the corresponding styles:
.legend circle {
  fill: none;
  stroke: #ccc;
}

.legend text {
  fill: #777;
  font: 10px sans-serif;
  text-anchor: middle;
}
An alternative to the explicit legend is to annotate a few circles, such as Los Angeles, Miami-Dade, and Cook counties, with their exact values. These values can then serve as reference points for the other values, without needing additional visual elements.
Lastly, a wide variety of interactive improvements could be made, such as a custom tooltip that displays additional information and the county outline, or panning and zooming to let the viewer dive in for more detail. You might also consider a Voronoi overlay to make counties with small populations easier to hover. This tutorial merely provides a basic starting point for an interactive proportional symbol map.
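A Voronoi overlay helps because each Voronoi cell contains exactly the points that are closer to its site than to any other site. Hovering a cell is therefore equivalent to picking the nearest bubble centroid, even when the pointer isn’t on a tiny circle. The sketch below does that lookup by brute force, using made-up centroids and a hypothetical `nearestCentroid` helper. A real overlay would precompute the cells, for example with d3.geom.voronoi in D3 v3.

```javascript
// Hypothetical brute-force equivalent of a Voronoi hit test:
// return the index of the centroid nearest to the pointer.
function nearestCentroid(centroids, [px, py]) {
  let best = -1;
  let bestDistance = Infinity;
  centroids.forEach(([x, y], i) => {
    const distance = (x - px) ** 2 + (y - py) ** 2; // squared distance suffices
    if (distance < bestDistance) {
      bestDistance = distance;
      best = i;
    }
  });
  return best;
}

// Made-up projected centroids, in pixels.
const centroids = [[100, 100], [300, 120], [310, 400]];
const hovered = nearestCentroid(centroids, [290, 150]);
```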