A tourist's city cyclability index

Aug 5, 2020 10 min read

Project

Whether you’ve managed to go on holiday during lockdown or are just looking for a microadventure nearby, cycling is an excellent way to take advantage of the quieter roads and explore a city. Plus, it’s social-distancing friendly. Lockdown or not, I love seeing different cities by bike, but even when the roads are quiet I find myself concerned about road safety, weather, bike availability and whether or not it’s safe to park my bike. Having found this excellent cycling data from Coya, and inspired by their own index, I’ve decided to come up with my own city “cyclability” ranking system.

Using road safety, weather, bicycle rental facilities and road quality data, I ranked cities by means of a weighted average score with a strong emphasis on road safety. Specifically, the road safety component accounts for half of the score and is made up of two components - accident ratio and fatalities ratio.

You can look up a city on the map below, which shows the cities’ “cyclability” ranks:

And here’s the list, ranked from the most “cyclable” city to the least, if you want to see a city’s specific characteristics, such as the weather score or the number of bicycle shops:

rank	city	country	weather	road_quality	accidents_per_100k	fatalities_per_100k
1	Munster	Germany	58.74	53.05	445.58	0.53
2	Utrecht	Netherlands	63.83	60.78	279.88	1.94
3	Malmo	Sweden	58.98	52.40	433.11	0.37
4	Copenhagen	Denmark	61.19	61.60	218.44	1.04
5	Hangzhou	China	54.20	35.06	476.73	1.71
6	Antwerp	Belgium	62.67	53.37	1165.81	1.61
7	Amsterdam	Netherlands	63.42	61.71	1019.18	1.77
8	Melbourne	Australia	78.86	43.19	174.60	0.32
9	Strasbourg	France	61.26	58.82	267.57	0.63
10	Bern	Switzerland	53.27	65.46	482.36	0.69
11	Hamburg	Germany	58.26	54.90	263.80	0.34
12	Bremen	Germany	58.86	55.96	334.85	1.38
13	Leipzig	Germany	58.67	51.05	323.29	0.71
14	Montreal	Canada	49.60	44.12	986.47	0.42
15	Berlin	Germany	57.69	50.79	407.56	0.58
16	Innsbruck	Austria	55.14	56.18	1329.96	0.78
17	Bordeaux	France	67.14	59.40	305.14	0.72
18	Frankfurt	Germany	64.10	51.89	232.84	0.86
19	Dusseldorf	Germany	70.73	50.87	279.60	0.69
20	Dresden	Germany	57.38	54.27	403.51	0.39
21	Cologne	Germany	57.88	50.04	372.23	0.40
22	Nuremberg	Germany	60.73	52.95	310.97	0.83
23	Bristol	United Kingdom	69.85	56.17	1054.36	0.58
24	Beijing	China	58.74	32.11	384.95	1.14
25	Sydney	Australia	75.56	39.80	345.84	0.28
26	Tel Aviv	Israel	84.54	37.88	1067.21	0.48
27	Bonn	Germany	62.72	51.72	461.83	1.32
28	Munich	Germany	50.52	53.95	371.62	0.60
29	Helsinki	Finland	47.16	51.96	222.36	1.05
30	Tokyo	Japan	60.83	53.85	1220.14	1.44
31	Wellington	New Zealand	72.48	55.82	1188.31	0.52
32	Nice	France	83.56	55.04	389.41	0.92
33	Shanghai	China	59.01	32.75	309.92	1.13
34	Hannover	Germany	58.83	51.51	773.00	2.99
35	Auckland	New Zealand	74.64	38.83	453.72	0.25
36	Nantes	France	67.76	64.62	325.85	0.77
37	Oslo	Norway	47.45	38.58	226.40	0.53
38	Ljubljana	Slovenia	46.05	34.75	840.20	1.43
39	Dortmund	Germany	63.96	48.71	147.85	0.39
40	Vancouver	Canada	65.09	51.09	1455.97	0.44
41	Geneva	Switzerland	56.48	66.49	623.47	0.95
42	Seville	Spain	70.41	52.18	1142.89	0.89
43	Portland	United States	71.32	61.96	1692.47	0.96
44	Stuttgart	Germany	56.34	49.25	153.26	0.00
45	Madrid	Spain	68.34	46.60	1104.19	0.61
46	Vienna	Austria	69.29	53.44	1015.93	1.58
47	Milan	Italy	53.92	35.26	1542.70	1.16
48	Sao Paulo	Brazil	66.93	12.04	475.44	1.71
49	Washington	United States	60.97	48.32	1699.01	0.57
50	Paris	France	66.31	58.51	318.14	0.75
51	Casablanca	Morocco	89.45	21.30	995.83	4.13
52	San Francisco	United States	88.58	56.12	2586.71	0.56
53	Cairo	Egypt	77.58	18.35	1929.06	0.47
54	Dublin	Ireland	68.72	43.56	517.05	0.63
55	Seattle	United States	71.92	56.73	1688.20	0.67
56	Warsaw	Poland	49.50	29.50	345.40	2.39
57	New Delhi	India	52.24	23.38	441.42	1.08
58	Reykjavik	Iceland	46.55	61.99	593.25	0.50
59	Santiago	Chile	81.05	34.34	362.72	2.51
60	Barcelona	Spain	83.04	54.15	1796.71	0.50
61	Edinburgh	United Kingdom	62.56	52.85	1175.78	0.95
62	Bogota	Colombia	70.16	17.30	567.44	3.21
63	Cali	Colombia	74.26	13.07	622.11	3.69
64	Brussels	Belgium	63.34	40.75	1060.01	1.84
65	Moscow	Russia	45.49	13.84	693.27	1.26
66	Boston	United States	59.44	54.20	3459.77	0.49
67	Athens	Greece	82.73	32.47	571.98	0.75
68	London	United Kingdom	64.69	42.61	1299.08	0.68
69	Nairobi	Kenya	78.59	24.10	459.97	1.13
70	Chicago	United States	56.68	49.65	2197.60	0.62
71	Jakarta	Indonesia	31.92	21.24	428.31	0.97
72	Seoul	South Korea	57.64	43.62	341.99	2.37
73	Los Angeles	United States	89.82	50.65	2148.35	0.85
74	New York	United States	60.75	48.20	2157.77	0.66
75	Krakow	Poland	50.24	33.39	312.40	2.16
76	Buenos Aires	Argentina	72.86	16.94	835.41	0.83
77	Detroit	United States	57.38	55.79	858.76	0.91
78	Lisbon	Portugal	81.33	44.94	1947.91	1.76
79	Singapore	Singapore	58.32	56.79	339.25	2.35
80	Stockholm	Sweden	51.40	51.97	366.39	0.56
81	Rome	Italy	77.83	34.17	1003.01	1.41
82	Johannesburg	South Africa	85.48	57.11	349.85	7.43
83	Prague	Czech Republic	55.80	37.34	838.27	1.74
84	Medellin	Colombia	76.22	9.27	1098.53	3.99
85	Istanbul	Turkey	75.00	31.71	1837.11	0.41
86	Hong Kong	SAR China	50.13	34.89	1117.99	1.19
87	Bangkok	Thailand	23.62	24.75	554.42	3.29
88	Lagos	Nigeria	29.24	1.43	525.48	2.97
89	Mexico City	Mexico	80.54	23.45	411.56	0.65
90	Tbilisi	Georgia	64.45	14.85	2272.83	0.76

You can also download the file containing all the data used as a CSV:

If you’d like to explore the methodology, you can carry on reading below. Happy cycling! :)

Methodology

The data

The data gathered by Coya is made up of 20 variables. The full list of variables, data collection methodology and variable definitions can be accessed here. For the purpose of developing this ranking system, I made use of the following variables:

city, country: Self-explanatory.
weather: Aggregated weather score ranging from 1 to 100. The higher the score, the better the weather is for cycling.
fatalities_per_100k: Deaths per 100k cyclists.
accidents_per_100k: Accidents that resulted in at least light injuries per 100k cyclists.
bicycle_theft_score: Stolen bicycles per 100k cyclists, weighted for high crime rates (to compensate for low reporting rates).
specialised_roads_and_road_qual_score: Length of cycling roads relative to the number of inhabitants.
number_bike_sharing_and_rental_stations_per_100k: “An estimate of bicycle sharing and rental stations per 100,000 of population.”
percent_bicycle_usage: “Percentage of people using bicycles in everyday life in each city.”
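
If you want to poke at the numbers yourself, the ranked table earlier in this post is tab-separated, so an excerpt of it can be read straight into a data frame. The sketch below is only an illustration: it uses the first three rows of that table, the name bikes_excerpt is my own, and the table only carries a subset of the variables listed above (the full CSV has more columns).

```r
# First three rows of the ranked table, copied as tab-separated text.
excerpt <- paste(
  "rank\tcity\tcountry\tweather\troad_quality\taccidents_per_100k\tfatalities_per_100k",
  "1\tMunster\tGermany\t58.74\t53.05\t445.58\t0.53",
  "2\tUtrecht\tNetherlands\t63.83\t60.78\t279.88\t1.94",
  "3\tMalmo\tSweden\t58.98\t52.40\t433.11\t0.37",
  sep = "\n"
)

# read.delim() expects tab-separated input with a header row by default.
bikes_excerpt <- read.delim(text = excerpt, stringsAsFactors = FALSE)

# Inspect the column names and types.
str(bikes_excerpt)
```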

Ranking System

(Road) Safety First

There are two ways of looking at road safety: how likely one is to be involved in an accident, and how likely one is to be involved in a fatal accident. The difference is somewhat subtle. Congested (cycling) roads, for example, might increase the chances of an accident resulting in light injuries. Cycling in a city where motorists are not used to cyclists, on the other hand, might not increase the chance of being involved in an accident, but a collision with an inattentive driver is more likely to be fatal.

The plot below shows a negative relationship between the two: as the number of accidents per 100k cyclists increases, the number of fatalities per 100k cyclists decreases. The trend line also suggests that the cities with the fewest fatalities tend to have an accident ratio of about 1500.

So, when assessing a cycling destination’s road safety, I’d give the fatalities ratio more weight than the accidents ratio.

Another consideration with regard to safety is that the bicycle-related fatality rate is related to bicycle usage in everyday life. In the Netherlands, for example, the fatality rate is high amongst the elderly. While this in itself is problematic, in that everyone should be safe to cycle, the high fatality rate also highlights the fact that the elderly do feel safe to cycle.

I also happen to know that the road accident and fatality rates in Johannesburg are high overall. This is reflected in the high bicycle-related fatality rate in Johannesburg, but also, perhaps, in the low percentage of the population that uses bikes on a daily basis.

In other words, in cities where it is relatively safe to cycle, more people - young and old, skilled or less skilled - feel safe enough to cycle through the city. The wide variety of people, in turn, can lead to more fatalities. Under these circumstances, however, the increase in fatalities reflects the fact that the population at large feels safe enough to cycle rather than that the roads are really dangerous for cyclists.

I illustrate this in the plot below, where the percentage of people cycling in everyday life is plotted against the bicycle-related fatality rate. The plot illustrates that, in the cities where everyday cycling is uncommon, the fatality rate tends to be high. As everyday cycling becomes more popular, the fatality rate tends to decrease. However, as even more people start using bikes, the fatality rate increases slightly.

Note that a similar relationship seems to be emerging when contrasting the bicycle usage with the accident rate.

Now, I am not claiming that this “U-shaped” characteristic is necessarily due to vulnerable populations (such as the elderly) taking to the roads. It is also plausible, for example, that as more people take to the roads, the roads become more congested, leading to more accidents. Even if that were the case, since the scoring system already (heavily) penalises cities with high fatality rates, the percentage of people using bicycles in everyday life should be incorporated into the ranking system. Additionally, by penalising the cities where cycling is not common in everyday life, the scoring system implicitly penalises the lack of evidence. In other words, I don’t trust the numbers if only a small percentage of the population cycles.

Before generating a safety score, I min-max normalise the fatalities and accidents data so that they take a value between 0 and 1:

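# load dplyr (for %>% and mutate) and scales (for rescale)
library(dplyr)
library(scales)

# min-max rescale fatality and accident rate to the [0, 1] range
bikes <- bikes %>%
            mutate(fatalities_per_100k_norm = rescale(fatalities_per_100k),
                   accidents_per_100k_norm = rescale(accidents_per_100k))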
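
To make the rescaling less of a black box, here is a rough base-R sketch of what min-max normalisation does. The min_max helper is my own illustration, not part of the scales package; the four fatality rates are taken from the ranked table above and include the lowest (Stuttgart) and the highest (Johannesburg) values listed there.

```r
# Min-max normalisation: map the smallest value to 0 and the largest to 1.
min_max <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# Fatalities per 100k cyclists, copied from the ranked table.
fatalities <- c(Stuttgart = 0.00, Munster = 0.53, Utrecht = 1.94, Johannesburg = 7.43)

round(min_max(fatalities), 3)
# Stuttgart 0.000, Munster 0.071, Utrecht 0.261, Johannesburg 1.000
```

Because Johannesburg’s rate is so much higher than everyone else’s, most cities end up squeezed towards the lower end of the normalised fatality scale.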

To take both safety variables into consideration, I’ve created a weighted safety score whereby the normalised fatality rate counts three times as much as the normalised accident rate, as follows:

\[ \begin{aligned} safety\_score = {} &\log(percent\_bicycle\_usage) \times {} \\ &\left[1-\frac{3 \times fatalities\_per\_100k\_norm + accidents\_per\_100k\_norm}{4}\right] \end{aligned} \]

This way, cities with high accident rates are also penalised, though three times less than cities with a high fatality rate. To penalise cities where few people use bicycles on an everyday basis, I multiplied the inverse of this weighted average (one minus its value) by the log of percent_bicycle_usage.

# calculate vulnerability score and safety score
bikes <- bikes %>%
    mutate(
        # weighted average - fatalities 3x more important than accidents
        vulnerability_score = (3 * fatalities_per_100k_norm + accidents_per_100k_norm) / 4,
        # penalty for low everyday bicycle usage, applied to the
        # inverse of the vulnerability score - aka safety
        safety_score = log(percent_bicycle_usage) * (1 - vulnerability_score)
    )

As the plot above illustrates, the safety score is very low for cities with a high fatality rate, and moderate for cities with high accident rates but low fatality rates. Note that the safety score is also low for cities where few people cycle in their everyday life, even if their fatality and accident rates are low. Tbilisi, for example, has a low fatality rate (0.76 per 100k cyclists) but still ends up with a low safety score.
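
To see the usage penalty in action, here is a small sketch with made-up numbers. The two cities below are hypothetical, and I’m assuming percent_bicycle_usage is expressed in percentage points (30 means 30%); the function simply restates the formula above in base R.

```r
# Safety score as defined above; the two rate inputs are assumed to be
# already min-max normalised to the [0, 1] range.
safety_score <- function(percent_bicycle_usage, fatalities_norm, accidents_norm) {
  vulnerability <- (3 * fatalities_norm + accidents_norm) / 4
  log(percent_bicycle_usage) * (1 - vulnerability)  # log() is the natural log in R
}

# Two hypothetical cities with identical (low) normalised rates
# but very different everyday bicycle usage.
safety_score(percent_bicycle_usage = 30, fatalities_norm = 0.1, accidents_norm = 0.2)  # about 2.98
safety_score(percent_bicycle_usage = 2,  fatalities_norm = 0.1, accidents_norm = 0.2)  # about 0.61
```

One thing to keep in mind with this construction: since the natural log of a number below 1 is negative, a city with less than 1% everyday usage would get a negative safety score under the same assumption.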

Weather, bicycle theft, bike-sharing and road facilities

Provided it’s relatively safe to cycle, I’d prefer to ride in good weather, find bicycles easily and not worry about them being stolen. I also find it easier when there are dedicated cycle lanes and proper signs - especially in a new city! So I’ve min-max scaled and averaged weather, number_bike_sharing_and_rental_stations_per_100k, the inverse of bicycle_theft_score, and specialised_roads_and_road_qual_score to compose an other_factors score.

\[ other\_factors = \left[ \begin{aligned} &weather\_norm + {} \\ &number\_bike\_sharing\_and\_rental\_stations\_per\_100k\_norm + {} \\ &(1 - bicycle\_theft\_score\_norm) + {} \\ &road\_quality\_norm \end{aligned} \right] / 4 \]

Then, I’ve averaged the safety_score with the other_factors score into a final_score, as follows:

\[ final\_score = (safety\_score + other\_factors)/2 \]

Code below:

# calculate other factors and generate a final score variable
bikes <- bikes %>% 
            mutate(other_factors = (rescale(weather) +
                                    rescale(number_bike_sharing_and_rental_stations_per_100k) +
                                    (1-rescale(bicycle_theft_score)) +
                                    rescale(specialised_roads_and_road_qual_score)
                                    )/4,
                   final_score = (safety_score + other_factors)/2
                   ) %>%
            arrange(desc(final_score)) %>%
            # rank 1 = highest final score; row_number() follows the sorted order
            mutate(rank = row_number())
There we have it. A city “cyclability” score which prioritises road safety and takes weather and bicycle rental/sharing facilities into consideration. Fun and safe cycling, here we come! :)

Bicycle tourism Data-driven decision-making

Dragos Tomescu

Data Trainer (previously, Data Analyst)

A data analytics professional with a passion for understanding society. I write about data-driven applications and their impact on business and society.