CIA Factbook

The purpose of this project was to explore the CIA Factbook, a database of information about all the countries in the world, using SQL queries in order to find out interesting information.

The project provided the opportunity to develop my knowledge of SQL and learn Jupyter Notebook. The version of SQL used is sqlite. It makes use of the following types of query:

  • Summary statistics, such as MIN, MAX, COUNT and AVG
  • Identifying distinct or repeated records
  • Sub-queries
  • Floating point division and Casting
  • Ordering, Limiting, Multiple conditions

The original Jupyter notebook can be downloaded from [here]http://pravjey.github.io/CIAFactbook/Factbook.ipynb]

In [2]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db
In [3]:
%sql SELECT * FROM sqlite_master WHERE type='table';
 * sqlite:///factbook.db
Done.
Out[3]:
type name tbl_name rootpage sql
table sqlite_sequence sqlite_sequence 3 CREATE TABLE sqlite_sequence(name,seq)
table facts facts 47 CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float)

First five rows of CIA Factbook

In [6]:
%sql SELECT * FROM facts LIMIT 5;
 * sqlite:///factbook.db
Done.
Out[6]:
id code name area area_land area_water population population_growth birth_rate death_rate migration_rate
1 af Afghanistan 652230 652230 0 32564342 2.32 38.57 13.89 1.51
2 al Albania 28748 27398 1350 3029278 0.3 12.92 6.58 3.3
3 ag Algeria 2381741 2381741 0 39542166 1.84 23.67 4.31 0.92
4 an Andorra 468 468 0 85580 0.12 8.13 6.96 0.0
5 ao Angola 1246700 1246700 0 19625353 2.78 38.78 11.49 0.46

The number of records

In [5]:
%%sql 
SELECT COUNT(*) AS "Number of records",
COUNT(DISTINCT(name)) AS "Number of distinct countries"
FROM facts;
Done.
Out[5]:
Number of records Number of distinct countries
261 261

Each country has its own, unique record

Summary statistics

The first task was to find the range for the population and population growth

In [6]:
%%sql
SELECT MIN(population) AS "Smallest population",
        MAX(population) AS "Largest population",
        MIN(population_growth) AS "Smallest population growth",
        MAX(population_growth) AS "Largest population growth"
        FROM facts;
Done.
Out[6]:
Smallest population Largest population Smallest population growth Largest population growth
0 7256490011 0.0 4.02

Out of the whole database, the lowest population of a country was zero and the highest was over 7 billion. This gave rise to two questions:

  • How could a country have a population of zero?
  • How could a country have a population that was equivalent to the current population of the whole world?

Upon further examination, it was found that the country with a population of zero is Antarctica. As Antarctica was a country where no-one lived, other than penguins, and the only human inhabitants were temporarily-based research scientists, it made sense to exclude Antartica from further analysis.

So the country with the lowest population is the Pitcairn Islands.

In [7]:
%%sql
SELECT name from facts 
WHERE population = (SELECT MIN(population) FROM facts);
Done.
Out[7]:
name
Antarctica
In [8]:
%%sql
SELECT name from facts 
WHERE population = (SELECT MIN(population) FROM facts
                    WHERE name <> "Antarctica");
Done.
Out[8]:
name
Pitcairn Islands

Similarly, the "country" with the largest population in the CIA Factbook is "the World". Again, it did not make sense to include this record in the analysis of countries. So the country with the largest population is China.

In [9]:
%%sql
SELECT name from facts 
WHERE population = (SELECT MAX(population) FROM facts);
Done.
Out[9]:
name
World
In [10]:
%%sql
SELECT name from facts 
WHERE population = (SELECT MAX(population) FROM facts
                    WHERE name <> "World");
Done.
Out[10]:
name
China

The countries with the lowest growth in population, i.e. zero growth, were Vatican City State, Cocos Islands, Greenland and the Pitcairn Islands.

The country with the highest growth in population was South Sudan.

In [11]:
%%sql
SELECT name from facts 
WHERE population_growth = (SELECT MIN(population_growth) FROM facts);
Done.
Out[11]:
name
Holy See (Vatican City)
Cocos (Keeling) Islands
Greenland
Pitcairn Islands
In [12]:
%%sql
SELECT name from facts 
WHERE population_growth = (SELECT MAX(population_growth) FROM facts);
Done.
Out[12]:
name
South Sudan

So, excluding the records for Antartica and the World, the range for population and population growth are as follows:

In [14]:
%%sql
SELECT MIN(population) AS "Smallest population",
        MAX(population) AS "Largest population",
        MIN(population_growth) AS "Smallest population growth",
        MAX(population_growth) AS "Largest population growth"
        FROM facts
        WHERE name <> "World" AND name <> "Antarctica";
Done.
Out[14]:
Smallest population Largest population Smallest population growth Largest population growth
48 1367485388 0.0 4.02

Average population and area

In [15]:
%sql SELECT AVG(population) FROM facts
Done.
Out[15]:
AVG(population)
62094928.32231405
In [16]:
%sql SELECT AVG(area) FROM facts
Done.
Out[16]:
AVG(area)
555093.546184739

Top ten countries with an above-average area and population, in descending order

In [14]:
%%sql
SELECT name, population, area
FROM facts
WHERE population > (SELECT AVG(population) FROM facts) AND
      area > (SELECT AVG(area) FROM facts)
ORDER BY population DESC;
 * sqlite:///factbook.db
Done.
Out[14]:
name population area
China 1367485388 9596960
India 1251695584 3287263
European Union 513949445 4324782
United States 321368864 9826675
Indonesia 255993674 1904569
Brazil 204259812 8515770
Pakistan 199085847 796095
Nigeria 181562056 923768
Russia 142423773 17098242
Mexico 121736809 1964375
Ethiopia 99465819 1104300
Egypt 88487396 1001450
Iran 81824270 1648195
Turkey 79414269 783562
Congo, Democratic Republic of the 79375136 2344858
France 66553766 643801

It is worth pointing out that the the European Union is included in the list of countries with an above-average area and population. Since the E.U. is not technically a country, but a group of countries, this could also be excluded from further analysis. So the top-ten countries with above-average population and area are:

In [12]:
%%sql
SELECT name, population, area
FROM facts
WHERE population > (SELECT AVG(population) FROM facts) AND
      area > (SELECT AVG(area) FROM facts) AND
      name <> "European Union"
ORDER BY population DESC;
 * sqlite:///factbook.db
Done.
Out[12]:
name population area
China 1367485388 9596960
India 1251695584 3287263
United States 321368864 9826675
Indonesia 255993674 1904569
Brazil 204259812 8515770
Pakistan 199085847 796095
Nigeria 181562056 923768
Russia 142423773 17098242
Mexico 121736809 1964375
Ethiopia 99465819 1104300
Egypt 88487396 1001450
Iran 81824270 1648195
Turkey 79414269 783562
Congo, Democratic Republic of the 79375136 2344858
France 66553766 643801

Water to land ratio

Top ten countries by water to land ratio

In [23]:
%%sql
SELECT name, (CAST(area_water AS FLOAT) / CAST(area_land AS FLOAT)) 
        AS "Water to land ratio"
FROM facts
ORDER BY (CAST(area_water AS FLOAT) / CAST(area_land AS FLOAT)) DESC
LIMIT 10;
 * sqlite:///factbook.db
Done.
Out[23]:
name Water to land ratio
British Indian Ocean Territory 905.6666666666666
Virgin Islands 4.520231213872832
Puerto Rico 0.5547914317925592
Bahamas, The 0.3866133866133866
Guinea-Bissau 0.2846728307254623
Malawi 0.25939625850340137
Netherlands 0.22571032366565366
Uganda 0.22292237442922375
Eritrea 0.16435643564356436
Liberia 0.15623961794019933

It can be seen that only two countries, British Indian Ocean Territory (aka Chagos Islands) and the Virgin Islands, comprise more water than land. This can be seen from the query below, comparing the actual water area to the actual land area.

All the other countries in the Factbook consist of more land than water.

In [28]:
%%sql
SELECT name
FROM facts
WHERE area_water > area_land;
Done.
Out[28]:
name
British Indian Ocean Territory
Virgin Islands
In [ ]:
Eighty-nine countries have a water to land ratio of zero, i.e. none of the area of the country comprises of water features.
In [29]:
%%sql
SELECT COUNT(name)
FROM facts
WHERE (CAST(area_water AS FLOAT) / CAST(area_land AS FLOAT)) = 0.0;
 * sqlite:///factbook.db
Done.
Out[29]:
COUNT(name)
89

Population change

A country's population increases in accordance with the birth and migration rate and decreases in accordance with the death rate. It is assumed that the figures for birth rate, death rate and migration rate in the database are percentages. So the query to find out the countries expected to see the largest population increase for the coming are is:

In [36]:
%%sql
SELECT name, ((birth_rate - death_rate + migration_rate) / 100) * population
        AS "Population increase"
FROM facts
ORDER BY ((birth_rate - death_rate + migration_rate) / 100) * population DESC
LIMIT 10;
Done.
Out[36]:
name Population increase
India 153583048.1568
China 73844210.952
Nigeria 45317889.177600004
Pakistan 35098834.82609999
Indonesia 29464871.877399996
Ethiopia 29143484.967000004
Bangladesh 27016343.4255
United States 26352246.847999997
Philippines 20452171.14
Congo, Democratic Republic of the 19907284.1088

I was particularly interested to find out which countries are likely to see decreases in population.

In the first instance, these would be countries where the death rate is bigger than the birth rate. There are 24 such countries.

In [45]:
%%sql
SELECT COUNT(name)
FROM facts
WHERE death_rate > birth_rate;
Done.
Out[45]:
COUNT(name)
24
In [33]:
%%sql
SELECT name
FROM facts
WHERE death_rate > birth_rate;
 * sqlite:///factbook.db
Done.
Out[33]:
name
Austria
Belarus
Bosnia and Herzegovina
Bulgaria
Croatia
Czech Republic
Estonia
Germany
Greece
Hungary
Italy
Japan
Latvia
Lithuania
Moldova
Monaco
Poland
Portugal
Romania
Russia
Serbia
Slovenia
Ukraine
Saint Pierre and Miquelon

The problem with simply comparing death rate and birth rate alone is that birth rate is not the only factor that contributes to population increase. Migration rate also has a role to play. So I then amended the above query to search for the countries where the death rate was higher than the birth and migration rates combined.

There were 13 such countries.

Apart from Japan, they were all in Central and Eastern Europe.

In [47]:
%%sql
SELECT COUNT(name)
FROM facts
WHERE death_rate > (birth_rate + migration_rate)
Done.
Out[47]:
COUNT(name)
13
In [50]:
%%sql
SELECT name
FROM facts
WHERE death_rate > (birth_rate + migration_rate);
Done.
Out[50]:
name
Belarus
Bosnia and Herzegovina
Bulgaria
Croatia
Germany
Greece
Hungary
Japan
Romania
Russia
Serbia
Slovenia
Ukraine

Finally, it would be interesting to see in how many countries migration rate has a bigger impact on population increase than birth rate

In [34]:
%%sql
SELECT name
FROM facts
WHERE birth_rate < migration_rate;
 * sqlite:///factbook.db
Done.
Out[34]:
name
Luxembourg
Micronesia, Federated States of
Qatar
Singapore
Saint Pierre and Miquelon
British Virgin Islands
Cayman Islands

There are only seven countries in the world where migration rate has a bigger impact on population increase than the birth rate. In all other countries, birth rate has a bigger impact than migration rate.

Countries with largest population densities

China, India, Russia and the United States may be some of the countries with the largest populations but they also take up a lot of geographic area. So, the population may be further apart. The interesting question is in which countries are there high populations crammed into small geographic areas.

In [39]:
%%sql
SELECT name, (CAST(population AS FLOAT) / CAST(area AS FLOAT)) AS "Population to Area ratio"
FROM facts
ORDER BY (CAST(population AS FLOAT) / CAST(area AS FLOAT)) DESC
LIMIT 10;
Done.
Out[39]:
name Population to Area ratio
Macau 21168.964285714286
Monaco 15267.5
Singapore 8141.279770444763
Hong Kong 6445.041516245487
Gaza Strip 5191.819444444444
Gibraltar 4876.333333333333
Bahrain 1771.8592105263158
Maldives 1319.6409395973155
Malta 1310.01582278481
Bermuda 1299.9259259259259

It is interesting to note that the countries with largest population densities are all tiny in a geographic sense.

The average population to area ration is:

In [7]:
%%sql
SELECT AVG(CAST(population AS FLOAT) / CAST(area AS FLOAT)) AS "Average Population to Area ratio"
FROM facts
 * sqlite:///factbook.db
Done.
Out[7]:
Average Population to Area ratio
419.66252469247945

It is interesting that India, despite having one of the largest populations and area in the world, actually has a below-average population to area ratio.

Most of the countries with below-average population to area ratios appear to be have small countries and islands or island archipelagos.

It is interesting to note that the United Kingdom, and the European Union as a whole - despite grumblings about influx of migrants - actually have below-average population to area ratios.

In [15]:
%%sql
SELECT name, (CAST(population AS FLOAT) / CAST(area AS FLOAT)) AS "Population to Area ratio"
FROM facts
WHERE (CAST(population AS FLOAT) / CAST(area AS FLOAT)) < (SELECT AVG(CAST(population AS FLOAT) / CAST(area AS FLOAT))
                                                            FROM facts)
ORDER BY (CAST(population AS FLOAT) / CAST(area AS FLOAT)) DESC;
 * sqlite:///factbook.db
Done.
Out[15]:
name Population to Area ratio
Tuvalu 418.03846153846155
Netherlands 407.96052283176465
Marshall Islands 398.8453038674033
Israel 387.5452094366875
Burundi 385.99626302551206
India 380.7713541630226
Belgium 370.9372707023061
Haiti 364.325009009009
Comoros 349.42774049217
Philippines 336.6612533333333
Sri Lanka 336.12998018594726
Japan 335.8418136353413
Curacao 334.2477477477477
Grenada 321.7848837209302
Guam 297.39889705882354
El Salvador 291.8753861508483
Vietnam 284.8610700160019
American Samoa 273.08040201005025
Jamaica 268.4205258848148
Saint Lucia 266.10714285714283
Saint Vincent and the Grenadines 263.82262210796915
United Kingdom 263.0771396904889
Puerto Rico 260.92067290261764
Pakistan 250.07800199724906
Trinidad and Tobago 238.3703198127925
Liechtenstein 235.15
Germany 226.4689795026637
British Virgin Islands 221.5496688741722
Luxembourg 220.51508120649652
Dominican Republic 215.3021573864804
Nepal 214.3707747603291
Cayman Islands 212.46969696969697
Antigua and Barbuda 209.13122171945702
Korea, North 207.26414076888616
Italy 205.26687462666754
Seychelles 203.14285714285714
Sao Tome and Principe 201.25103734439833
Saint Kitts and Nevis 198.98850574712642
Switzerland 196.76405746541658
Nigeria 196.5450805830035
Qatar 189.43699292249266
Andorra 182.86324786324786
Anguilla 180.41758241758242
Gambia, The 174.13353982300885
Kosovo 171.8545972260494
Kuwait 156.50095409136827
Uganda 153.92487906471177
Isle of Man 153.0506993006993
Malawi 151.62129063839842
Micronesia, Federated States of 149.8803418803419
Tonga 142.57161981258366
China 142.4915168970174
Guatemala 137.01107549890256
Cabo Verde 135.3813538308951
Czech Republic 134.97206689743493
Indonesia 134.41029125224657
Togo 132.99846790525666
Thailand 132.47662340193327
Kiribati 130.34648581997533
Denmark 129.51926022184063
Cyprus 128.54794076316074
Akrotiri 127.64227642276423
Poland 123.3259958104802
Dhekelia 120.76923076923077
European Union 118.83823161491145
Portugal 117.5514062330329
Azerbaijan 112.94203233256351
Northern Mariana Islands 112.8103448275862
Tokelau 111.41666666666667
Slovakia 111.04368308351178
Ghana 110.37319364616216
Wallis and Futuna 109.95070422535211
Hungary 106.3931396998753
Albania 105.37352163628775
Moldova 104.77820448435793
France 103.37630106197412
Austria 103.31997949231558
Armenia 102.75970816662745
Turkey 101.35033220089795
Cuba 99.50778459318059
Dominica 98.01198402130493
Slovenia 97.8351501997731
Spain 95.26907810119319
Costa Rica 94.21025440313112
Benin 92.77625153167232
Serbia 92.63487105351473
Malaysia 92.50909664177634
Syria 92.1527918781726
Romania 90.88577169440121
Jordan 90.85943900964831
Ethiopia 90.0713746264602
Egypt 88.3592750511758
Cambodia 86.77192808020548
Iraq 84.5419388250969
Burma 83.24273919636761
Timor-Leste 82.76966518757564
Swaziland 82.67755125547109
Sierra Leone 81.9500696961249
Greece 81.66026053941815
Macedonia 81.51577023295609
Kenya 79.1314823206695
Croatia 78.89253277732622
Honduras 78.03258988312963
Bosnia and Herzegovina 75.53284372131179
Morocco 74.6225484268279
Brunei 74.5266261925412
Ukraine 73.61357136939773
Cote d'Ivoire 72.2417827781792
Senegal 71.0435741808237
Georgia 70.74929698708752
Samoa 69.8597668668315
Ireland 69.61855904828312
United Arab Emirates 69.13588516746411
Burkina Faso 69.04334792122539
French Polynesia 67.84329253659708
Tunisia 67.46057698184707
Uzbekistan 65.26585158694681
Bulgaria 64.81744063348334
Lesotho 64.16409158293527
Mexico 61.972285841552655
Norfolk Island 61.388888888888886
Tajikistan 56.84911866759195
Ecuador 55.96113710982822
Eritrea 55.507559523809526
Virgin Islands 54.227225130890055
Tanzania 53.88565607516098
Turks and Caicos Islands 53.037974683544306
Montserrat 51.38235294117647
Yemen 50.64192716225226
Cameroon 49.93104913343429
Afghanistan 49.92769728470018
Fiji 49.764091058334245
Iran 49.64477504178814
Panama 48.48878281622912
Guinea 47.91469024676946
Guinea-Bissau 47.78325259515571
Montenegro 46.84860990443093
Palau 46.328976034858385
Belarus 46.193106936416186
Nicaragua 45.31626140983355
Lithuania 44.172021439509955
South Africa 44.02920457062235
Cocos (Keeling) Islands 42.57142857142857
Cook Islands 41.686440677966104
Colombia 41.036366350282286
Madagascar 40.56391461584455
Liberia 37.6735536819043
Zimbabwe 36.41531949523617
Faroe Islands 36.034458004307254
Djibouti 35.703620689655175
Congo, Democratic Republic of the 33.85072187740153
United States 32.70372369087204
Venezuela 32.09852530014802
Mozambique 31.65342265255573
Latvia 30.759185000541887
Laos 29.187263513513514
Kyrgyzstan 28.331636250881466
Estonia 27.978685769877067
Equatorial Guinea 26.40700866279277
Saint Helena, Ascension, and Tristan da Cunha 25.308441558441558
Brazil 23.98606491250938
Peru 23.688624324627146
Bahamas, The 23.385951008645534
Saint Pierre and Miquelon 23.376033057851238
Chile 23.15594985861696
Vanuatu 22.336861104274345
Sweden 21.7670993459843
Solomon Islands 21.54170127353267
Zambia 20.01847683685482
Sudan 19.397885235650698
Bhutan 19.3238266395791
Uruguay 18.96486110717022
South Sudan 18.690622337346294
Paraguay 16.67667768074896
Somalia 16.649044862676956
Algeria 16.602210735760103
New Zealand 16.57910798998917
Finland 16.196962841384615
Norway 16.082942662491277
Angola 15.741840859870058
Argentina 15.620732988059272
Belize 15.125359226682923
New Caledonia 14.622611036339165
Papua New Guinea 14.416275602800104
Congo, Republic of the 13.903792397660819
Mali 13.671702446072866
Saudi Arabia 12.909915383148268
Christmas Island 11.333333333333334
Turkmenistan 10.717930751895103
Oman 10.620148626817448
Bolivia 9.831666486130745
Central African Republic 8.654377961552784
Russia 8.329731968935754
Kazakhstan 6.6634085654519435
Gabon 6.371110372216224
Niue 4.576923076923077
Botswana 3.752116961476974
Libya 3.644006956363595
Suriname 3.5382309852276888
Canada 3.515372666297434
Mauritania 3.4895721354419327
Guyana 3.4201303443752353
Iceland 3.222504854368932
Australia 2.9389442490976876
Namibia 2.6838875058838374
Western Sahara 2.146112781954887
Mongolia 1.9134821202519505
Pitcairn Islands 1.0212765957446808
Falkland Islands (Islas Malvinas) 0.2761028505709357
Svalbard 0.030171649609154645
Greenland 0.026653143042335344

Intriguing information

There is one country in the world where the recorded total area is not the same as the sum of the land area and water area.

In [40]:
%%sql
SELECT name
FROM facts
WHERE area <> (area_land + area_water);
Done.
Out[40]:
name
Saint Helena, Ascension, and Tristan da Cunha

Names of countries vary in length, from 4 to 45, with an average length of 10.

In [58]:
%%sql
SELECT MAX(LENGTH(name)) AS "Longest name", 
       MIN(LENGTH(name)) AS "Shortest name", 
       ROUND(AVG(LENGTH(name)),0) AS "Average length of name"
FROM facts;
Done.
Out[58]:
Longest name Shortest name Average length of name
45 4 10.0

The countries with the longest names are:

In [36]:
%%sql
SELECT name
FROM facts
WHERE LENGTH(name) = (SELECT MAX(LENGTH(name))
                      FROM facts);
 * sqlite:///factbook.db
Done.
Out[36]:
name
Saint Helena, Ascension, and Tristan da Cunha
United States Pacific Island Wildlife Refuges

The countries with the shortest names are:

In [62]:
%%sql
SELECT name
FROM facts
WHERE LENGTH(name) = (SELECT MIN(LENGTH(name))
                      FROM facts);
Done.
Out[62]:
name
Chad
Cuba
Fiji
Iran
Iraq
Laos
Mali
Oman
Peru
Togo
Niue
Guam

There are 84 countries with above-average length of names and 159 countries with below average length of names.

In [64]:
%%sql
SELECT COUNT(name)
FROM facts
WHERE LENGTH(name) > (SELECT ROUND(AVG(LENGTH(name)),0)
                      FROM facts);
Done.
Out[64]:
COUNT(name)
84
In [65]:
%%sql
SELECT COUNT(name)
FROM facts
WHERE LENGTH(name) < (SELECT ROUND(AVG(LENGTH(name)),0)
                      FROM facts);
Done.
Out[65]:
COUNT(name)
159

[Back to Portfolio Page][http://pravjey.github.io/archive.html]