Graph manipulation

During the Second World War, Nazi propaganda spread the claim that British Prime Minister Winston Churchill had supposedly said: “I don’t trust any statistics I haven’t falsified myself.” The claim was meant to show that he lied to his own people and was not a credible partner for discussion. Although it is a propaganda line that he never uttered, there is certainly a grain of truth in it.

Anyone who has ever worked with statistics or data processing knows that in these fields we almost always manipulate, in both the good and the bad sense of the word. We always want to demonstrate one thing and are less interested in another; we always pick one aspect and leave the others aside. Then there are intentional manipulations, whose goal is to confuse whoever is reading the statistics or, in our case, the graph. We’ll now look at some of the techniques used to do this, so that you know what to watch out for and what to avoid.

A very nice overview of the most common manipulations, including examples, is available in an older article on iDnes.

The truncated Y axis is probably the most common manipulative graph technique used in politics. How does it work? The axis does not start at zero but at a number slightly below the lowest value in the graph. As a result, the differences between individual values in a bar graph can easily be blown up out of all proportion. This gives readers the feeling that a quantity is dropping or rising dramatically, which need not be true. A truncated Y axis can be legitimate, but the graph must always clearly indicate it.

All’s calm vs. steep growth? A truncated axis can distinctly change your impression of a graph. You can find more examples of graph distortion in this nice overview: Office.lasakovi.com - Zkreslující (chybné) grafy v Excel.
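The effect of a truncated axis can be put into numbers. The following is a minimal sketch, not a definitive method: the function name `bar_height_ratio` and the values 50 and 52 are made up for illustration. It compares how tall two bars are drawn relative to each other when the axis starts at zero and when it starts just below the smaller value.

```python
def bar_height_ratio(low, high, axis_start=0.0):
    """Ratio of the drawn bar heights when the Y axis starts at axis_start.

    Hypothetical helper for illustration; low and high are the two plotted values.
    """
    if axis_start >= low:
        raise ValueError("axis_start must be below the smallest plotted value")
    # A bar is drawn from the start of the axis up to its value.
    return (high - axis_start) / (low - axis_start)


# Made-up values that really differ by 4 %.
honest = bar_height_ratio(50, 52)             # axis starts at 0
truncated = bar_height_ratio(50, 52, 49)      # axis starts at 49

print(f"axis from 0:  taller bar is {honest:.2f}x the shorter one")
print(f"axis from 49: taller bar is {truncated:.2f}x the shorter one")
```

With the axis starting at 49, a real difference of 4 % is drawn as one bar three times as tall as the other.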

A logarithmic scale on an axis is handy when you need to fit values of very different magnitudes into a single graph; it is used constantly in astrophysics, for example. The problem arises when you don’t tell your reader about the scale: on a logarithmic axis, exponential growth appears as a straight line, so explosive growth can easily pass for steady, linear growth.
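To see what a logarithmic axis does to the shape of a curve, it helps to compute where the points land on it. This is a small sketch with two made-up series (one doubling at every step, one growing by a fixed amount); `log_axis_steps` is a hypothetical helper name, not part of any library.

```python
import math


def log_axis_steps(values):
    """Distances between consecutive points when drawn on a base-10 log axis."""
    positions = [math.log10(v) for v in values]
    return [b - a for a, b in zip(positions, positions[1:])]


# Made-up series, both starting at 100.
exponential = [100 * 2 ** t for t in range(5)]   # doubles at every step
linear = [100 + 100 * t for t in range(5)]       # adds 100 at every step

exp_steps = log_axis_steps(exponential)
lin_steps = log_axis_steps(linear)

print("exponential series, steps on log axis:", [round(s, 3) for s in exp_steps])
print("linear series, steps on log axis:     ", [round(s, 3) for s in lin_steps])
```

The doubling series advances by the same distance at every step, so it is drawn as a straight line, while the linear series advances by ever smaller distances and appears to flatten out.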

A related situation arises when we plot data whose units are themselves logarithmic, for example decibels. Raising the volume by 20 dB corresponds to a hundredfold change in intensity! The question is then how to work with such graphs. With the most commonly used logarithm (the common, base-10 logarithm), the distance between 1 and 10 on the axis is the same as between 10 and 100 or between 100 and 1000.
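The decibel example can be checked with a short calculation. The sketch below assumes that “intensity” means a power-like quantity, for which a difference in decibels is 10 times the base-10 logarithm of the ratio; this is consistent with the hundredfold change for 20 dB mentioned above. The function names are chosen for illustration only.

```python
import math


def db_to_intensity_ratio(delta_db):
    """Intensity ratio corresponding to a level difference in decibels."""
    return 10 ** (delta_db / 10)


def intensity_ratio_to_db(ratio):
    """Level difference in decibels corresponding to an intensity ratio."""
    return 10 * math.log10(ratio)


for delta in (3, 10, 20, 30):
    print(f"+{delta} dB -> intensity x {db_to_intensity_ratio(delta):.1f}")
```

Equal steps in decibels therefore mean equal multiplication factors, not equal additions, which is exactly what a reader of such a graph needs to be told.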

Besides a truncated Y axis, you can also encounter another interesting manipulation: the omission of some values on the X axis. This may be intentional (for example, if we don’t like some of the values) or simply happen because the values are not available to us. In such a case, however, the graph should show an empty space where the data are missing rather than an axis that does not keep to scale.
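One way to keep the X axis honest is to put every position on the axis into the data before plotting and mark the missing ones explicitly. The following sketch assumes yearly data stored in a dictionary; the helper name `with_gaps` and the numbers are invented for illustration.

```python
def with_gaps(series, start, end):
    """Return (x, value) pairs for every x from start to end inclusive.

    Positions missing from `series` get None, so a plotting tool can leave
    an empty space instead of squeezing the remaining points together.
    """
    return [(x, series.get(x)) for x in range(start, end + 1)]


# Made-up measurements; 2020 and 2021 are not available.
measurements = {2018: 3.0, 2019: 3.4, 2022: 4.1}

for year, value in with_gaps(measurements, 2018, 2022):
    print(year, "no data" if value is None else value)
```

Many plotting tools leave a gap when they meet an empty value, but this depends on the tool and should be checked.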

For a graph to be a graph, its axes must be labelled and it must be clear what is what. This calls for a good legend and a description of the axes. If your physics teachers’ insistence on this seems unwarranted and useless, remember that it is one of the basic characteristics of a graph. Without a legend, you have no idea what you’re looking at. Sometimes we even encounter graphs in which different scales are used for two lines, so that the graph can be made to say anything you like. The description of an axis should state what it depicts and in what units. We write the units in square brackets, like [m] or [kg].

3D graphs: don’t use them in any situation. The three-dimensional effect is visually pleasing, but in practice it distorts proportions and can make the graph thoroughly misleading. Although it might look good in PowerPoint, it usually has zero informational value.

Something that happens more often by mistake than on purpose is choosing the wrong type of graph. For example, a point or bar graph is fine for showing the development of oil prices over time, but a pie graph would be completely useless for this. Swapping the values on the X and Y axes also belongs to this category. For something like budget development over a number of years, this is an unlikely error, but it can happen fairly easily when measuring something like a volt-ampere (current-voltage) characteristic.

Example of a correct graph with missing data. Source: RedGate.

There are many other questions to ask as well. Is the graph’s margin of error displayed? Every statistical measurement and every piece of data processing is burdened with a margin of error because, for example, you can’t ask everyone a question, only a sample of the population. With voting preferences, this means you may know only that between 3 and 7 percent of voters will vote for a certain party, and you write 5% into the graph. A typical case: if we say that a party with preferences of 4.5% is below the threshold for entering parliament, we harm the party electorally and politically. But a result of 4.5% with a margin of error that is typically around 1 to 3 percentage points says nothing of the sort. For public opinion polls about political parties, trends matter, not the exact values of voter preferences.
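The size of such a margin of error can be estimated with the usual normal approximation for a proportion. This is only a sketch under stated assumptions: a simple random sample, a 95% confidence level (z = 1.96), a hypothetical sample of 1,000 respondents and a hypothetical 5% threshold for entering parliament. None of these figures comes from the text above, and real polls use more elaborate methods.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """Approximate margin of error for a share measured on a simple random sample."""
    return z * math.sqrt(share * (1 - share) / sample_size)


share = 0.045        # measured preference of 4.5 %, as in the text
sample_size = 1000   # hypothetical number of respondents
threshold = 0.05     # hypothetical threshold for entering parliament

moe = margin_of_error(share, sample_size)
low, high = share - moe, share + moe
threshold_within_interval = low <= threshold <= high

print(f"margin of error: +/- {moe * 100:.1f} percentage points")
print(f"interval: {low * 100:.1f} % to {high * 100:.1f} %")
print("threshold lies inside the interval:", threshold_within_interval)
```

Under these assumptions the interval runs from roughly 3 % to 6 %, so the poll alone cannot tell whether the party is above or below the threshold.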

Another problem may of course be the interpretation of the data, for instance forgetting that correlation is not causation (ice cream consumption grows together with the number of drowning victims; people, however, are not drowning because of ice cream, but because it is summer and they go swimming more often).

Finally, a graph always has a source and some data that it depicts. Perhaps the easiest way to manipulate a graph is simply to make the data up. The source and the data are therefore also something we should want to hear and see whenever graphs appear in the media.