How to Easily Graph World Bank Indicators in Stata

World Bank xtline figure 5

This is a pretty direct tutorial.

Watch for the spots in the Stata do-file code below where you may need to install a few things (wbopendata, grstyle, and addplot). This is going to be pretty cool, though: you can download World Bank indicators directly in Stata.

Erika Sanborne Statistics

Creating Life-Expectancy-Clean-Panel.dta using World Bank indicators in Stata directly!

/* REFERENCES

Azevedo, J.P. (2011) "wbopendata: Stata module to access World Bank databases," 
Statistical Software Components S457234, Boston College Department of Economics.
http://ideas.repec.org/c/boc/bocode/s457234.html

Jann, Ben. (2014). ADDPLOT: Stata module to add twoway plot objects to an 
existing twoway graph. Available from http://ideas.repec.org/c/boc/bocode/s457917.html.

Jann, Ben. 2017. "GRSTYLE: Stata module to customize the overall look of graphs,"
Statistical Software Components S458414, Boston College Department of Economics,
revised 19 Sep 2020.

World Health Organization. 2021. "Life Expectancy at Birth, Total (Years)." 
The World Bank: https://data.worldbank.org/indicator/SP.DYN.LE00.IN.

Welcome to my tutorial for using xtline to plot World Bank indicators in Stata directly

*/

clear all
*ssc install wbopendata   //install this if you do not have it already
//help wbopendata //read through this to know how to use the wbopendata program
wbopendata, language(en - English) indicator(SP.DYN.LE00.IN) long clear
rename sp_dyn_le00_in life_expectancy
label variable life_expectancy "World Bank indicator SP.DYN.LE00.IN"
sort countryname
drop adminregion adminregionname lendingtype lendingtypename //don't need these
order year life_expectancy, after(countryname)
* let's get ready to xtset this data so we can use it as a panel data set
* Step 1: Create a numeric unique identifier variable (id) for each countrycode
egen id = group(countrycode), label
* Step 2: xtset (we don't have to reshape long since we imported as longitudinal)
xtset id year
* Step 3: Done! Or "profit!" as we said in my day.
codebook, compact
xtsum life_expectancy //we can now do stuff with this panel data set & have fun

generate lifeexpr = round(life_expectancy,0.01) //creating a rounded version
gen lifeexprlab = string(lifeexpr, "%4.2f") + " yrs" //I'm going to use this as a marker label when graphing

label data "Life Expectancy at Birth Panel Data World Bank - Clean Panel Dataset"
saveold "Life-Expectancy-Clean-Panel.dta", replace

/* So I just saved a clean panel data set of World Bank life expectancy data. You can download my Life-Expectancy-Clean-Panel.dta file if you have any trouble creating it on your own, but running the above code should create it on your own system, with the latest data. Be sure to read the wbopendata docs (help wbopendata) to understand how to access whatever indicators you need. Next, I make a graph.
*/
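
A quick aside on the id variable: egen group() numbers the countries in sorted order of countrycode, so the ids hard-coded in the plots later in this tutorial (Canada=36, for example) can shift whenever the World Bank list of economies changes. Here is a minimal sketch for looking them up, or skipping them altogether, assuming the panel built above is in memory and that CAN, MEX, and USA are the countrycode values you want. The graph name figure_bycode is just a placeholder made up for this sketch.

```stata
* show the numeric id assigned to each country (one row per country)
* nolabel prints the number rather than the value label attached by egen
* year == 2019 is only there to pick a single row per country
list id countrycode countryname ///
    if inlist(countrycode, "CAN", "MEX", "USA") & year == 2019, noobs nolabel

* or select on the string code directly and never touch the numeric id
xtline life_expectancy if inlist(countrycode, "CAN", "MEX", "USA"), overlay ///
    name(figure_bycode, replace)
```

Selecting on countrycode keeps the do-file working even if the ids move around.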

First we’ll make an ugly, unformatted xtline line plot using our World Bank panel data

*Let's graph life expectancy over time for a few countries only, yeah?
//Figure 1: Basic linear plot using xtline for North American life expectancy
//btw Canada=36 USA=252 Mexico=155 (these ids come from egen group() above, so check yours if the country list has changed)
xtline life_expectancy if inlist(id, 36, 252, 155), overlay  /// 
xtitle("") ytitle("Mean Life Expectancy") ///
title("Basic Linear Plot - Before Styling") ///
caption("{it:Note.} Source is World Bank data. 2019 is the most recent available year." ///
"World Bank indicators are available at: https://data.worldbank.org/indicator.") ///
note("{&hearts} {stSerif:Erika Sanborne made this graph entirely in} {stMono:Stata} {&bullet} {it:https://geterika.com}", ///
color(maroon) size(*0.8) span) ///
name(figure1, replace)
graph export figure1.svg, replace 
World Bank xtline figure 1

Gross. Let’s install grstyle and addplot and run through some grstyle settings.

*if you want to style the graph as I do in this tutorial, install grstyle and addplot



net install grstyle, replace from("https://raw.githubusercontent.com/benjann/grstyle/master/")
ssc install addplot, replace
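
If you rerun this do-file a lot, you may not want to reinstall the packages every time. A common Stata idiom, sketched here, is to check for each command with which and install only when it is missing. This assumes the same SSC and GitHub sources used in this tutorial, and note that it will not update a package you already have.

```stata
* install each user-written command only if Stata cannot find it
capture which wbopendata
if _rc {
    ssc install wbopendata
}

capture which grstyle
if _rc {
    net install grstyle, from("https://raw.githubusercontent.com/benjann/grstyle/master/")
}

capture which addplot
if _rc {
    ssc install addplot
}
```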


//there are so many grstyle settings. Here are some I'm using in this demo.
grstyle clear //resets any previous grstyle in the file
set scheme s2color
grstyle init //initializes grstyle to get ready to run
grstyle set horizontal //sets y axis tick labels horizontal/readable yay!
grstyle set legend 10, inside //clock position
grstyle set graphsize 10in 14in //h x w
grstyle set symbolsize small
grstyle set size 36pt: heading
grstyle set size 24pt: subheading axis_title
grstyle color background white //goodbye default teal background!
grstyle color plotregion none //goodbye any default plotregion colors!
grstyle linestyle plotregion none
grstyle yesno draw_major_hgrid no
grstyle yesno draw_major_ygrid no
grstyle yesno draw_major_vgrid no
grstyle linewidth plineplot thick
grstyle anglestyle vertical_tick horizontal
grstyle symbolsize p small
grstyle gsize axis_title_gap small //adds space between ticks and axis titles
grstyle color major_grid black
grstyle linewidth major_grid vthin

This is the exact same code as figure1, except we’re using grstyle.

xtline life_expectancy if inlist(id, 36, 252, 155), overlay  ///
xtitle("") ytitle("Mean Life Expectancy") ///
title("Same Basic Linear Plot -" ///
"With Some GRSTYLE Settings", linegap(2.0) margin(medlarge) size(*1.1) span) ///
caption("{it:Note.} Source is World Bank data. 2019 is the most recent available year." ///
"World Bank indicators are available at: https://data.worldbank.org/indicator.") ///
note("{&hearts} {stSerif:Erika Sanborne made this graph entirely in} {stMono:Stata} {&bullet} {it:https://geterika.com}", ///
color(maroon) size(*0.8) span) ///
name(figure2, replace)
graph export figure2.svg, replace 

/* It's so much better, isn't it? You can do a lot with <grstyle> - read the docs.
Next I will style it a bit more, and use the label variable I created earlier
(lifeexprlab) to label the lines.
*/
World Bank xtline figure 2

Already improved. Let’s style our xtline plot for our World Bank panel data set next…

//generating a few variables that will be used in graphs
//some starting position options for labels
gen pos1 = 1
gen pos2 = 2
gen pos3 = 3
gen pos4 = 4
gen pos5 = 5
gen pos6 = 6
gen pos7 = 7
gen pos10 = 10
gen pos11 = 11
gen pos12 = 12
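
Those pos variables hold clock positions (12 is above the point, 3 is to the right, 6 is below, and so on), which is what mlabvposition() expects. If typing ten gen lines feels tedious, the same variables could be built in a loop. This is just an equivalent sketch, not something you need to run in addition; the capture drop line is there so it does not complain if the variables already exist.

```stata
* build pos1-pos7 and pos10-pos12, each a constant equal to its clock position
foreach p of numlist 1/7 10/12 {
    capture drop pos`p'
    generate byte pos`p' = `p'
}
```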

*Figure 3
xtline life_expectancy if inlist(id, 36, 252, 155), overlay legend(off)  ///
addplot  /// we are adding a scatter plot with no symbols and only labels for country
(scatter life_expectancy year if inlist(id, 36, 252, 155) & year == 2019, ///
msymbol(none) mlabv(pos3) mlabgap(2.5) mlabel(countryname) mlabcolor(black) mlabsize(medium)) ///
xtitle("") ytitle("Mean Life Expectancy") ///
xlabel(1960(5)2020, labsize(small)) ///
xscale(range(1960 2035)) /// this is to make room for the addplot scatter mlabels on the right
title("Here We've Fixed the X Axis Range and Added" ///
"Labels to the Line Plots so We can Ditch the Legend", linegap(2.0) margin(medlarge) size(*1.1) span) ///
caption("{it:Note.} Source is World Bank data. 2019 is the most recent available year." ///
"World Bank indicators are available at: https://data.worldbank.org/indicator." ///
, span size (*.9)) ///
note("{&hearts} {stSerif:Erika Sanborne made this graph entirely in} {stMono:Stata} {&bullet} {it:https://geterika.com}", ///
color(maroon) size(*0.8) span) ///
name(figure3, replace)
graph export figure3.svg, replace
World Bank xtline figure 3
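
One fragile bit in Figure 3 is the hard-coded year == 2019, which the caption notes was the most recent available year. Here is a minimal sketch of letting Stata find the latest year instead, assuming the panel and the pos3 variable from above are in memory. The local name lastyear and the graph name figure3_latest are placeholders made up for this sketch, and the local only survives if you run these lines together in one go.

```stata
* find the most recent year with non-missing data for the three countries
summarize year if inlist(id, 36, 252, 155) & !missing(life_expectancy), meanonly
local lastyear = r(max)
display "Most recent year with data: `lastyear'"

* use the local in place of the hard-coded 2019 when placing the labels
xtline life_expectancy if inlist(id, 36, 252, 155), overlay legend(off) ///
    addplot(scatter life_expectancy year ///
        if inlist(id, 36, 252, 155) & year == `lastyear', ///
        msymbol(none) mlabvposition(pos3) mlabgap(2.5) ///
        mlabel(countryname) mlabcolor(black)) ///
    xscale(range(1960 2035)) name(figure3_latest, replace)
```

If the countries do not all report the same latest year, only those with data in that year get a label, so it is worth checking the display output first.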

Now we’ll use addplot again to add more labels.

/* Okay, cool. Let's use addplot once more, again with no markers and only the
marker label, and this time we'll have it show the rounded values plus "yrs",
which is the string variable (lifeexprlab) we created earlier when building the panel.
*/


*Figure 4
xtline life_expectancy if inlist(id, 36, 252, 155), overlay legend(off) ///
plot1opts(lcolor(maroon)) ///
plot2opts(lcolor(green)) ///
plot3opts(lp(dash) lcolor(navy)) ///
addplot ( ///
(scatter life_expectancy year if inlist(id, 36, 252, 155) & year == 2019, ///
msymbol(none) mlabv(pos3) mlabgap(2.5) mlabel(countryname) mlabcolor(black) mlabsize(medium)) ///
(scatter lifeexpr year if inlist(id, 36, 252, 155) & year == 2019, ///
msymbol(none) mlabel(lifeexprlab) mlabsize(vsmall) mlabcolor(black) mlabv(pos11) mlabgap(0.3)) ///
(scatter lifeexpr year if inlist(id, 36, 252, 155) & year == 1963, ///
msymbol(none) mlabel(lifeexprlab) mlabsize(tiny) mlabcolor(black) mlabv(pos4) mlabgap(0)) ///
) /// feel free to adjust marker label size and position - it's a tight fit
xtitle("") ytitle("Mean Life Expectancy") ///
xlabel(1960(5)2020, labsize(small)) ///
xscale(range(1960 2035)) /// 
title("Now This is one Nice-looking Line Plot, Right?" ///
"Mean Life Expectancy by Year Using World Bank Indicators", linegap(2.0) margin(medlarge) size(*0.9) span) ///
caption("{it:Note.} Source is World Bank data. 2019 is the most recent available year." ///
"World Bank indicators are available at: https://data.worldbank.org/indicator." ///
, span size (*.9)) ///
note("{&hearts} {stSerif:Erika Sanborne made this graph entirely in} {stMono:Stata} {&bullet} {it:https://geterika.com}", ///
color(maroon) size(*0.8) span) ///
name(figure4, replace)
graph export figure4.svg, replace
World Bank xtline figure 4

Wanna make one more? Our goal of learning to graph World Bank indicators in Stata is met. Have fun!

*Figure 5
//plotting some other countries
//Bhutan=33, China=41, Finland=76, India=110
xtline life_expectancy if inlist(id, 33, 41, 76, 110), overlay legend(off) ///
plot1opts(lp(dash) lcolor(maroon)) ///
plot2opts(lcolor(green)) ///
plot3opts(lcolor(pink)) ///
plot4opts(lcolor(navy)) ///
addplot ( ///
(scatter life_expectancy year if inlist(id, 33, 41, 76, 110) & year == 2019, ///
msymbol(none) mlabv(pos3) mlabgap(1.0) mlabel(countryname) mlabcolor(black) mlabsize(small)) ///
(scatter lifeexpr year if inlist(id, 33, 41, 76) & year == 2019, ///
msymbol(none) mlabel(lifeexprlab) mlabsize(vsmall) mlabcolor(black) mlabv(pos11) mlabgap(1.0)) ///
(scatter lifeexpr year if id == 110 & year == 2019, ///
msymbol(none) mlabel(lifeexprlab) mlabsize(vsmall) mlabcolor(black) mlabv(pos7) mlabgap(2.5)) ///
(scatter lifeexpr year if inlist(id, 33, 76, 110) & year == 1962, ///
msymbol(none) mlabel(lifeexprlab) mlabsize(vsmall) mlabcolor(black) mlabv(pos6) mlabgap(0.8)) ///
(scatter lifeexpr year if id == 41 & year == 1962, ///
msymbol(none) mlabel(lifeexprlab) mlabsize(vsmall) mlabcolor(black) mlabv(pos12) mlabgap(5.5)) ///
) /// feel free to adjust marker label size and position - it's a tight fit
xtitle("") ytitle("Mean Life Expectancy") ///
xlabel(1960(5)2020, labsize(small)) ///
xscale(range(1960 2035)) /// 
title("Mean Life Expectancy Over Time per World Bank Indicators", ///
linegap(2.0) margin(medlarge) size(*0.9) span) ///
caption("{it:Note.} Source is World Bank data. 2019 is the most recent available year." ///
"World Bank indicators are available at: https://data.worldbank.org/indicator." ///
, span size (*.9)) ///
name(figure5, replace)
graph export figure5.svg, replace
World Bank xtline figure 5

About Erika Sanborne

Erika Sanborne has been producing media since 2014, specializing in video explainers, portraiture, green screen videography, and other digital media productions, and generally making cool stuff. Her latest passion in graphics is data visualization.
