Let’s Make a Bar Chart, II

The previous part of this tutorial covered making a basic bar chart in HTML; in this part, we’ll extend the example bar chart using Scalable Vector Graphics (SVG) and make it more realistic by loading an external data file in tab-separated values (TSV) format.

#Introducing SVG

Whereas HTML is largely limited to rectangular shapes, SVG supports powerful drawing primitives like Bézier curves, gradients, clipping and masks. We won’t need all of SVG’s extensive feature set for a lowly bar chart, but learning SVG is a worthwhile addition to your visual lexicon when it comes to designing visualizations.

Like anything, this richness necessarily comes at a cost. The large SVG specification may be intimidating, but remember that you don’t need to master every feature to get started. Browsing examples is an enjoyable way to pick up new techniques.

And despite obvious differences, SVG and HTML share many similarities. You can write SVG markup and embed it directly in a web page (provided you use <!DOCTYPE html>). You can inspect SVG elements in your browser’s developer tools. And SVG elements can be styled with CSS, albeit using different property names like fill instead of background-color. However, unlike HTML, SVG elements must be positioned relative to the top-left corner of the container; SVG does not support flow layout or even text wrapping.

#Coding a Chart, Manually

Before we construct the new chart using JavaScript, let’s revisit the static specification in SVG.

<!DOCTYPE html>
<style>

.chart rect {
  fill: steelblue;
}

.chart text {
  fill: white;
  font: 10px sans-serif;
  text-anchor: end;
}

</style>
<svg class="chart" width="420" height="120">
  <g transform="translate(0,0)">
    <rect width="40" height="19"></rect>
    <text x="37" y="9.5" dy=".35em">4</text>
  </g>
  <g transform="translate(0,20)">
    <rect width="80" height="19"></rect>
    <text x="77" y="9.5" dy=".35em">8</text>
  </g>
  <g transform="translate(0,40)">
    <rect width="150" height="19"></rect>
    <text x="147" y="9.5" dy=".35em">15</text>
  </g>
  <g transform="translate(0,60)">
    <rect width="160" height="19"></rect>
    <text x="157" y="9.5" dy=".35em">16</text>
  </g>
  <g transform="translate(0,80)">
    <rect width="230" height="19"></rect>
    <text x="227" y="9.5" dy=".35em">23</text>
  </g>
  <g transform="translate(0,100)">
    <rect width="420" height="19"></rect>
    <text x="417" y="9.5" dy=".35em">42</text>
  </g>
</svg>

As before, a stylesheet applies colors and other aesthetic properties to the SVG elements. But unlike the div elements that were implicitly positioned using flow layout, the SVG elements must be absolutely positioned with hard-coded translations relative to the origin.

A common point of confusion in SVG is distinguishing between properties that must be specified as attributes and properties that can be set as styles. The full list of styling properties is documented in the specification, but a quick rule of thumb is that geometry (such as the width of a rect element) must be specified as attributes, while aesthetics (such as a fill color) can be specified as styles. While you can use attributes for anything, I recommend you prefer styles for aesthetics; this ensures any inline styles play nicely with cascading stylesheets.

SVG requires text to be placed explicitly in text elements. Since text elements do not support padding or margins, the text position must be offset by three pixels from the end of the bar, while the dy offset is used to center the text vertically.

[Figure: the rendered bar chart, six horizontal bars labeled 4, 8, 15, 16, 23 and 42.]

Despite its very different specification, the resulting chart is identical to the previous one.
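The numbers in the hand-written markup follow a simple pattern: each unit of data is 10 pixels wide, each row is 20 pixels tall, and each label sits three pixels inside the end of its bar. As a rough sketch of that arithmetic, the same markup can be generated with plain string concatenation. Note that barGroup and scale are hypothetical names chosen for illustration; they are not part of D3 or of the chart above.

```javascript
var data = [4, 8, 15, 16, 23, 42];

var barHeight = 20, // vertical distance between the tops of adjacent bars
    scale = 10;     // pixels per data unit, matching the static chart

// Hypothetical helper: builds the markup for one bar and its label.
function barGroup(d, i) {
  var w = d * scale;
  return '<g transform="translate(0,' + i * barHeight + ')">' +
      '<rect width="' + w + '" height="' + (barHeight - 1) + '"></rect>' +
      '<text x="' + (w - 3) + '" y="' + (barHeight - 1) / 2 + '" dy=".35em">' + d + '</text>' +
      '</g>';
}

// One group per data point, in order.
var markup = data.map(barGroup).join("\n");
console.log(markup);
```

Generating the offsets this way makes the hard-coded values less mysterious, and it hints at why computing them from data is preferable to typing them by hand.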

#Coding a Chart, Automatically

Next let’s construct the chart using D3. By now, parts of this code should look familiar:

<!DOCTYPE html>
<meta charset="utf-8">
<style>

.chart rect {
  fill: steelblue;
}

.chart text {
  fill: white;
  font: 10px sans-serif;
  text-anchor: end;
}

</style>
<svg class="chart"></svg>
<script src="http://d3js.org/d3.v3.min.js"></script>
<script>

var data = [4, 8, 15, 16, 23, 42];

var width = 420,
    barHeight = 20;

var x = d3.scale.linear()
    .domain([0, d3.max(data)])
    .range([0, width]);

var chart = d3.select(".chart")
    .attr("width", width)
    .attr("height", barHeight * data.length);

var bar = chart.selectAll("g")
    .data(data)
  .enter().append("g")
    .attr("transform", function(d, i) { return "translate(0," + i * barHeight + ")"; });

bar.append("rect")
    .attr("width", x)
    .attr("height", barHeight - 1);

bar.append("text")
    .attr("x", function(d) { return x(d) - 3; })
    .attr("y", barHeight / 2)
    .attr("dy", ".35em")
    .text(function(d) { return d; });

</script>

We set the svg element’s size in JavaScript so that we can compute the height based on the size of the dataset (data.length). This way, the size is based on the height of each bar rather than the overall height of the chart, and we ensure adequate room for labels.

Each bar consists of a g element which in turn contains a rect and a text. We use a data join (an enter selection) to create a g element for each data point. We then translate the g element vertically, creating a local origin for positioning the bar and its associated label.

Since there is exactly one rect and one text element per g element, we can append these elements directly to the g, without needing additional data joins. Data joins are only needed when creating a variable number of children based on data; here we are appending just one child per parent. The appended rects and texts inherit data from their parent g element, and thus we can use data to compute the bar width and label position.

#Loading Data

Let’s make this chart more realistic by extracting the dataset into a separate file. An external data file separates the chart implementation from its data, making it easier to reuse the implementation on multiple datasets or even live data that changes over time.

Tab-separated values (TSV) is a convenient tabular data format. This format can be exported from Microsoft Excel and other spreadsheet programs, or authored by hand in a text editor. Each line represents a table row, where each row consists of multiple columns separated by tabs. The first line is the header row and specifies the column names. Whereas before our dataset was a simple array of numbers, now we’ll add a descriptive name column. Our data file now looks like this:

name	value
Locke	4
Reyes	8
Ford	15
Jarrah	16
Shephard	23
Kwon	42

To use this data in a web browser, we need to download the file from a web server and then parse it, which converts the text of the file into usable JavaScript objects. Fortunately, these two tasks can be performed by a single function, d3.tsv.
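To make the parsing step concrete, here is a deliberately simplified sketch in plain JavaScript. It is not how d3.tsv is implemented: parseTsv is a hypothetical helper, and it assumes no escaped tabs or newlines within values and no Windows-style line endings.

```javascript
// Hypothetical, simplified TSV parser for illustration only.
function parseTsv(text) {
  var lines = text.split("\n").filter(function(line) { return line.length > 0; });
  var columns = lines[0].split("\t"); // the header row names the columns
  return lines.slice(1).map(function(line) {
    var cells = line.split("\t"), row = {};
    columns.forEach(function(name, i) { row[name] = cells[i]; });
    return row;
  });
}

var text = "name\tvalue\nLocke\t4\nReyes\t8\n";
var rows = parseTsv(text);

console.log(rows[0].name);         // "Locke"
console.log(typeof rows[0].value); // "string": parsing alone does not produce numbers
```

Each row becomes an object keyed by column name, and every value is still a string at this point.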

Loading data introduces a new complexity: downloads are asynchronous. When you call d3.tsv, it returns immediately while the file downloads in the background. At some point in the future when the download finishes, your callback function is invoked with the new data, or an error if the download failed. In effect your code is evaluated out of order:

// 1. Code here runs first, before the download starts.

d3.tsv("data.tsv", function(error, data) {
  // 3. Code here runs last, after the download finishes.
});

// 2. Code here runs second, while the file is downloading.

Thus we need to separate the chart implementation into two phases. First, we initialize as much as we can when the page loads and before the data is available. It’s good to set the chart size when the page loads, so that the page does not reflow after the data downloads. Second, we complete the remainder of the chart inside the callback function.
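The out-of-order evaluation described above can be observed without a network request. In this sketch, fakeLoad is a hypothetical stand-in for d3.tsv that uses setTimeout to defer its callback; the real function waits on a download instead, but the ordering is the same.

```javascript
var order = [];

// Hypothetical stand-in for d3.tsv: returns immediately, invokes the callback later.
function fakeLoad(url, callback) {
  setTimeout(function() {
    callback(null, [{name: "Locke", value: "4"}]);
  }, 0);
}

order.push("1: before the request");

fakeLoad("data.tsv", function(error, data) {
  order.push("3: inside the callback");
  console.log(order.join("\n")); // all three steps, in numbered order
});

order.push("2: after the request, before the data arrives");
```

By the time the last line runs, the callback has not yet been invoked, which is why any code that needs the data must live inside the callback.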

Restructuring the code:

<!DOCTYPE html>
<meta charset="utf-8">
<style>

.chart rect {
  fill: steelblue;
}

.chart text {
  fill: white;
  font: 10px sans-serif;
  text-anchor: end;
}

</style>
<svg class="chart"></svg>
<script src="http://d3js.org/d3.v3.min.js"></script>
<script>

var width = 420,
    barHeight = 20;

var x = d3.scale.linear()
    .range([0, width]);

var chart = d3.select(".chart")
    .attr("width", width);

d3.tsv("data.tsv", type, function(error, data) {
  if (error) throw error; // stop here if the download failed
  x.domain([0, d3.max(data, function(d) { return d.value; })]);

  chart.attr("height", barHeight * data.length);

  var bar = chart.selectAll("g")
      .data(data)
    .enter().append("g")
      .attr("transform", function(d, i) { return "translate(0," + i * barHeight + ")"; });

  bar.append("rect")
      .attr("width", function(d) { return x(d.value); })
      .attr("height", barHeight - 1);

  bar.append("text")
      .attr("x", function(d) { return x(d.value) - 3; })
      .attr("y", barHeight / 2)
      .attr("dy", ".35em")
      .text(function(d) { return d.value; });
});

function type(d) {
  d.value = +d.value; // coerce to number
  return d;
}

</script>

So, what changed? Although we declared the x-scale in the same place as before, we can’t define the domain until the data is loaded, because the domain depends on the maximum value. Thus, the domain is set inside the callback function. Likewise, although the width of the chart can be set statically, the height of the chart depends on the number of bars and thus must be set in the callback function.

Now that our dataset contains both names and values, we must refer to the value as d.value rather than d; each data point is an object rather than a number. The equivalent representation in JavaScript would look like this:

var data = [
  {name: "Locke",    value:  4},
  {name: "Reyes",    value:  8},
  {name: "Ford",     value: 15},
  {name: "Jarrah",   value: 16},
  {name: "Shephard", value: 23},
  {name: "Kwon",     value: 42}
];

Any place in the old chart implementation we referred to d must now refer to d.value. In particular, whereas before we could pass the scale x to compute the width of the bar, we must now specify a function that passes the data value to the scale: function(d) { return x(d.value); }. Likewise, when computing the maximum value from our dataset, we must pass an accessor function to d3.max that tells it how to evaluate each data point.
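To see why the accessor matters, consider this plain-JavaScript sketch. The helpers maxBy and linearScale are hypothetical stand-ins written for this example, not D3’s own code; they only mimic the behavior we rely on from d3.max and a linear scale whose domain and range both start at zero.

```javascript
var data = [
  {name: "Locke",    value:  4},
  {name: "Reyes",    value:  8},
  {name: "Ford",     value: 15},
  {name: "Jarrah",   value: 16},
  {name: "Shephard", value: 23},
  {name: "Kwon",     value: 42}
];

// Hypothetical stand-in for d3.max with an accessor.
function maxBy(array, accessor) {
  var best;
  array.forEach(function(d) {
    var v = accessor(d);
    if (best === undefined || v > best) best = v;
  });
  return best;
}

// Hypothetical stand-in for a linear scale mapping [0, domainMax] to [0, rangeMax].
function linearScale(domainMax, rangeMax) {
  return function(v) { return v * rangeMax / domainMax; };
}

var x = linearScale(maxBy(data, function(d) { return d.value; }), 420);

var widths = data.map(function(d) { return x(d.value); });
console.log(widths);     // [40, 80, 150, 160, 230, 420]
console.log(x(data[0])); // NaN: the scale expects a number, not a row object
```

Passing the whole object to the scale yields NaN, so the bar would have no usable width; the accessor extracts the number the scale needs.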

Here’s one more gotcha with external data: types! The name column contains strings while the value column contains numbers. Unfortunately, d3.tsv isn’t smart enough to detect and convert types automatically. Instead, we specify a type function that is passed as the second argument to d3.tsv. This type conversion function receives the data object representing each row and can modify it or convert it to a more suitable representation:

function type(d) {
  d.value = +d.value; // coerce to number
  return d;
}

Type conversion isn’t strictly required, but it’s an awfully good idea. By default, all columns in TSV and CSV files are strings. If you forget to convert strings to numbers, then JavaScript may not do what you expect, say returning "12" for "1" + "2" rather than 3. Similarly, if you compare strings rather than numbers, the lexicographic behavior of d3.max may surprise you: as strings, "8" ranks above "42"!
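These pitfalls are easy to reproduce in plain JavaScript. In the sketch below, largest is a hypothetical comparison-based maximum written for illustration; it is not d3.max itself, though it compares values in the same natural order.

```javascript
// The value column as it arrives from the file: strings, not numbers.
var raw = ["4", "8", "15", "16", "23", "42"];

// Hypothetical helper: returns the greatest element using the > operator.
function largest(values) {
  return values.reduce(function(a, b) { return b > a ? b : a; });
}

var concatenated = "1" + "2";  // "12": + concatenates strings
var stringMax = largest(raw);  // "8": strings compare character by character

// Coerce each string to a number first, as the type function does.
var numberMax = largest(raw.map(function(s) { return +s; })); // 42

console.log(concatenated, stringMax, numberMax);
```

With strings, the chart’s domain would end at "8" rather than 42, so most bars would overflow the chart; coercing once, at load time, avoids the problem everywhere downstream.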

#Next: Part 3

The code for part 2 of the tutorial is available at bl.ocks.org/7341714.

While our bar chart may not be any more impressive than the bare-bones chart we created previously, this tutorial introduced SVG and external data, two essential topics for any real-world visualization. And we’re now much better positioned to complete this chart. The next tutorial in this series covers axes and chart styling.