Data for Equity
One convenient aspect of our data from the standpoint of beneficence is that we gather data at the city-level, so identification of persons is not an issue, and therefore we do not jeopardize anyone’s privacy. Our main goal with these datasets is to deduce what aspects of a city make it a better place to live than others. We recognize the need to consider how publication could reinforce social inequities, and we will take into account those issues which specifically affect the most vulnerable, so as not to leave them behind for the sake of “livability.” For example, it is crucial to take into account factors such as the proportion of a city’s population which lives in slums or does not have access to water in order to ensure that we do not draw conclusions which would encourage the further marginalization of those who are least fortunate in the societies we examine. Moreover, it is our hope that we can draw conclusions which would help cities better foster the inclusion of marginalized people; in other words, we hope to identify trends in the data which would indicate ways in which cities could reduce social inequality. To do so, researchers using our data could foster dialogue with marginalized communities identified in order to see if they share our conclusions; and if not, why they do not.
From the standpoint of justice, our data was collected from national governments, United Nations agencies, civil society organizations, and academic institutions, so we can be reasonably confident that no undue burden was put upon the people of the various countries in order to collect data. Again, the data is anonymous and high level- we do not use any data of individual persons, only population statistics/percentages, so we can be sure that our publication will not cause harm to any individuals/groups, and only help others in identifying and reducing inequality.
Considering that our conclusion will be based on the data sets that we have chosen, some limitations to the analysis we will do would be a certain degree of inaccuracy that may occur as a result of the missing values that were removed in the process of cleaning our data sets or even the fact that most of our data is taken from the past, with the most recent being 2019. Since beyond that year there have been major changes that have occurred in the world, primarily COVID-19, the conclusions of our analyses may not be as accurate as we would like it to be. Some abuse or misuse of the data that we use include the overlooking of this fact, that despite the limitations we’ve identified, we still insist on an inference or insight that is seemingly flawless, which may in turn lead to false or inaccurate information.
Loading Data Sets
#load datasets
load("air_q_by_c.RData")
load("slum_population.RData")
load("population.RData")
#load("water_q_avg.RData")
#load("combined.RData")
Urban Populations and Population Living in Slums
cols.num <- c("1990", "1995", "2000", "2005", "2010", "2014")
slum_pop_2[cols.num] <- sapply(slum_pop_2[cols.num],as.numeric)
slum_and_total_pop <- inner_join(x = slum_pop_2, y = urban_pop, by = c("Country" = "Region, subregion, country or area")) %>%
select("Country", `2000.x`, `2005.x`, `2010.x`, `2000.y`, `2005.y`, `2010.y`) %>%
pivot_longer(cols = `2000.x` : `2010.y`, names_to = "Year", values_to = "Population") %>%
separate(col = Year, into = c("Year", "Group"), sep = "\\.") %>%
mutate(Group = case_when(Group == "x" ~ "Slum",
Group == "y" ~ "Urban"))
slum_and_total_pop %>%
ggplot() + facet_grid(~Year) +
geom_bar(aes(y = Country, x = Population, fill = Group), position = 'dodge', stat= 'identity')

slum_and_total_pop %>%
pivot_wider(names_from = Group, values_from = Population) %>%
mutate(Prop_slum = Slum / Urban) %>%
ggplot() +
geom_line(aes(x = as.numeric(Year), y = Prop_slum, color = Country)) +
scale_x_continuous(breaks = c(2000, 2005, 2010)) +
labs(x = "Year", y = "Proportion of Urban Population in Slums", title = "Proportion of Urban Population Living in Slums", subtitle = "2000, 2005, 2010")
These graphs compare the urban population to the population living in slums for 35 countries. The first figure demonstrates that the slum population is much smaller than the urban population, but is hard to interpret because the India bar is so tall that all the other bars are barely visible. The second graph compares the proportion of the urban population living in slums for the same 35 countries for the years 2000, 2005, 2010, and although countries can’t be easily identified in this figure it is clear that for most the % slum population has been steady. Future directions with this data include identifying countries where the slum population is increasing or decreasing and try to identify possible reasons for this change.
Combining Cleaned Data Sets
colnames(urban_pop)[1] <- "Country"
pop <- urban_pop[-2]
water_quality <- read_csv("water_quality.csv")
water_quality <- water_quality %>% select(3,4,5,17,18,26,27,29:35) %>% rename(Country = 1, Region = 2, Year = 3, improved_water = 4, drinking_access = 5, improved_san = 6, san_access = 7, durable_1 = 8, durable_2 = 9, durable_3 = 10, living_area = 11, phone_access = 12, cell_access = 13, electric_access = 14)
water_quality <- water_quality[-c(1, 2, 3, 4), ]
water_quality[,4:14] <- lapply(water_quality[,4:14], as.double)
water_quality_cols <- colnames(water_quality)[4:14]
water_quality_avg <- water_quality %>%
group_by(Country) %>%
top_n(1,Year)
water_quality_avg_clean <- na.omit(aggregate(cbind(improved_water, drinking_access, improved_san, san_access, durable_1, durable_2, durable_3, living_area, phone_access, cell_access, electric_access)~Country, water_quality_avg, mean, na.rm=TRUE, na.action=na.pass))
water_quality_avg_clean
## Country improved_water drinking_access improved_san
## 1 Afghanistan 96.45000 62.55000 71.35000
## 2 Albania 99.40000 97.50000 99.10000
## 3 Angola 95.90000 71.20000 97.30000
## 6 Azerbaijan 99.50000 81.60000 98.40000
## 9 Belize 95.55000 96.80000 97.90000
## 11 Bolivia 93.53333 27.36667 77.78889
## 12 Bosnia and Herzegovina 99.60000 99.60000 98.95000
## 18 Cameroon 92.80000 86.60000 88.40000
## 19 Central African Republic 96.40000 56.00000 95.70000
## 20 Chad 99.80000 91.20000 69.80000
## 22 Comoros 99.90000 85.90000 26.60000
## 23 Congo 98.10000 92.90000 79.65000
## 24 Congo DRC 99.20000 84.30000 64.10000
## 25 Costa Rica 99.73333 99.75000 97.35000
## 26 Cote d'Ivoire 99.70000 1.90000 90.00000
## 30 Ethiopia 99.70000 75.80000 81.60000
## 31 Gabon 98.40000 69.50000 77.10000
## 33 Georgia 100.00000 100.00000 99.90000
## 34 Ghana 99.65000 97.43333 94.30000
## 35 Guatemala 99.40000 82.27500 92.02500
## 37 Guinea Bissau 91.66667 78.46667 32.93333
## 39 Haiti 99.10000 92.40000 85.10000
## 41 India 98.47143 87.70000 86.40000
## 43 Iraq 51.75000 51.75000 97.10000
## 45 Kazakhstan 84.70000 99.40000 99.90000
## 46 Kenya 98.40000 97.80000 92.70000
## 48 Lao People's Democratic Republic 93.46667 93.30000 89.86667
## 49 Lesotho 99.00000 90.90000 95.30000
## 52 Malawi 99.40000 95.00000 57.90000
## 53 Maldives 99.30000 34.10000 99.90000
## 54 Mali 98.90000 93.10000 93.60000
## 59 Mozambique 100.00000 99.40000 94.50000
## 60 Myanmar 97.05000 87.70000 86.15000
## 61 Namibia 97.70000 85.10000 89.10000
## 63 Nicaragua 98.30000 70.00000 81.00000
## 64 Niger 97.30000 86.00000 84.40000
## 65 Nigeria 93.43333 90.05000 73.56667
## 67 Pakistan 84.53333 79.30000 93.73333
## 69 Papua New Guinea 98.80000 97.80000 86.20000
## 70 Paraguay 99.55000 72.05000 95.20000
## 71 Peru 95.01111 93.70000 87.27778
## 73 Rwanda 96.90000 88.30000 94.10000
## 76 Serbia 99.90000 99.20000 99.00000
## 77 Sierra Leone 93.70000 81.30000 86.30000
## 82 Tanzania 96.70000 95.85000 93.80000
## 84 Timor-Leste 99.00000 97.60000 89.50000
## 86 Trinidad and Tobago 96.51667 43.80000 97.28333
## 88 Turkmenistan 100.00000 100.00000 100.00000
## 89 Uganda 98.70000 97.30000 86.60000
## 95 Yemen 99.16667 52.00000 93.76667
## 96 Zambia 98.60000 90.60000 87.70000
## 97 Zimbabwe 98.55000 96.25000 98.20000
## san_access durable_1 durable_2 durable_3 living_area phone_access
## 1 58.75000 90.45000 58.40000 32.80000 54.65000 4.700000
## 2 98.10000 99.90000 99.70000 99.30000 93.70000 30.500000
## 3 71.80000 94.50000 96.20000 99.20000 67.50000 5.500000
## 6 85.10000 29.00000 97.50000 99.80000 79.60000 86.200000
## 9 94.60000 73.95000 83.10000 99.40000 87.70000 16.800000
## 11 55.10000 71.35556 87.53333 96.27778 68.23333 29.777778
## 12 98.90000 82.50000 98.20000 96.80000 97.90000 92.500000
## 18 63.80000 85.90000 86.40000 96.30000 77.96667 2.233333
## 19 48.50000 45.40000 70.10000 97.60000 98.30000 2.800000
## 20 32.20000 37.80000 73.40000 98.80000 58.50000 3.800000
## 22 22.40000 74.80000 86.10000 98.30000 82.70000 15.200000
## 23 31.00000 93.90000 90.45000 98.95000 83.20000 3.500000
## 24 20.70000 93.20000 92.60000 99.70000 51.00000 2.700000
## 25 94.81667 94.58333 92.40000 98.26667 94.63333 63.083333
## 26 60.40000 99.20000 95.20000 95.30000 100.00000 5.800000
## 30 29.00000 94.00000 35.40000 97.70000 66.90000 30.500000
## 31 47.00000 96.80000 100.00000 99.70000 75.60000 3.100000
## 33 96.50000 99.30000 97.80000 99.50000 87.00000 78.500000
## 34 24.00000 98.86667 96.80000 99.63333 69.23333 2.500000
## 35 79.60000 90.90000 89.67500 99.60000 73.25000 18.875000
## 37 19.40000 68.96667 17.26667 94.70000 76.30000 1.166667
## 39 49.20000 94.80000 93.50000 96.30000 63.30000 1.100000
## 41 69.57143 84.68571 90.61429 95.44286 68.67143 7.400000
## 43 75.00000 97.53333 87.73333 94.46667 56.76667 37.633333
## 45 98.00000 99.20000 94.30000 99.90000 96.10000 91.800000
## 46 55.80000 99.70000 67.70000 100.00000 77.20000 2.700000
## 48 84.76667 93.83333 87.63333 97.16667 78.23333 32.233333
## 49 45.10000 95.30000 95.60000 97.50000 78.90000 9.400000
## 52 31.40000 83.00000 86.50000 91.60000 82.80000 4.700000
## 53 97.50000 98.80000 98.40000 99.60000 61.30000 48.200000
## 54 58.20000 83.65000 94.35000 98.45000 84.10000 7.850000
## 59 85.00000 96.60000 98.00000 99.50000 86.30000 4.500000
## 60 68.30000 62.20000 55.75000 98.60000 69.65000 12.050000
## 61 67.70000 91.10000 86.70000 87.10000 81.40000 39.400000
## 63 75.10000 77.60000 81.80000 99.30000 59.50000 28.200000
## 64 33.90000 83.30000 56.90000 67.00000 61.10000 7.000000
## 65 38.93333 74.53333 91.48333 97.72500 68.66667 1.050000
## 67 86.96667 88.20000 86.70000 83.20000 44.23333 12.566667
## 69 71.00000 98.60000 89.30000 99.10000 52.70000 13.200000
## 70 90.15000 94.80000 96.85000 98.85000 89.70000 34.250000
## 71 76.67778 70.32222 61.72222 90.17778 82.06667 34.077778
## 73 40.60000 83.70000 82.20000 98.80000 89.60000 0.800000
## 76 97.20000 99.90000 99.10000 98.90000 97.70000 92.600000
## 77 32.60000 83.90000 95.50000 87.50000 53.20000 0.200000
## 82 50.20000 87.75000 85.50000 99.00000 85.60000 2.250000
## 84 77.20000 90.40000 69.60000 97.20000 77.30000 12.600000
## 86 93.71429 78.50000 91.04286 99.82500 95.11429 69.557143
## 88 99.20000 100.00000 100.00000 100.00000 93.80000 97.300000
## 89 31.60000 98.30000 94.60000 99.00000 55.20000 6.400000
## 95 91.96667 97.80000 98.50000 51.33333 52.86667 42.900000
## 96 31.90000 97.60000 98.80000 99.80000 66.80000 3.900000
## 97 57.55000 96.65000 97.85000 99.20000 81.00000 13.650000
## cell_access electric_access
## 1 96.20000 91.35000
## 2 99.10000 0.00000
## 3 94.90000 82.30000
## 6 77.50000 99.50000
## 9 96.70000 98.00000
## 11 77.96667 96.58889
## 12 70.50000 100.00000
## 18 95.56667 87.56667
## 19 91.70000 46.90000
## 20 94.90000 54.80000
## 22 93.70000 85.10000
## 23 97.20000 85.70000
## 24 95.90000 90.40000
## 25 94.03333 99.66000
## 26 97.40000 97.30000
## 30 97.50000 99.80000
## 31 98.30000 99.50000
## 33 76.00000 98.60000
## 34 97.70000 94.20000
## 35 93.22500 97.67500
## 37 97.46667 29.23333
## 39 92.90000 88.80000
## 41 97.37143 98.60000
## 43 78.13333 99.80000
## 45 99.00000 99.95000
## 46 99.60000 97.90000
## 48 96.56667 95.80000
## 49 95.90000 67.90000
## 52 86.90000 65.40000
## 53 99.60000 99.90000
## 54 98.40000 91.00000
## 59 97.80000 97.30000
## 60 96.05000 97.05000
## 61 87.70000 86.80000
## 63 21.50000 99.50000
## 64 89.30000 73.50000
## 65 95.76667 79.90000
## 67 97.36667 98.76667
## 69 94.60000 87.70000
## 70 98.65000 99.85000
## 71 90.84444 97.25556
## 73 93.90000 91.50000
## 76 97.70000 100.00000
## 77 96.80000 62.80000
## 82 97.00000 70.00000
## 84 98.30000 98.90000
## 86 88.00000 97.61429
## 88 100.00000 100.00000
## 89 98.70000 95.10000
## 95 95.96667 99.40000
## 96 92.70000 78.60000
## 97 97.30000 88.15000
slum_pop_2[,2:9] <- lapply(slum_pop_2[,2:9], as.double)
combined_data <- air_quality_by_country %>%
##inner_join(slum_pop_2, by="Country") %>%
inner_join(pop, by="Country") %>%
inner_join(water_quality_avg_clean, by="Country")
head(combined_data)
## Country 2015.x 2016 2017 2018 2019 2000 2005
## 1 Afghanistan 59.98571 56.56000 52.74857 52.13143 51.69429 4436 5692
## 2 Albania 19.37500 17.62500 18.80000 18.60000 18.40000 1303 1439
## 3 Azerbaijan 25.51667 23.88333 24.47500 24.41667 24.61667 4174 4473
## 4 Belize 21.25714 20.42857 20.15714 20.42857 20.34286 112 128
## 5 Bosnia and Herzegovina 33.80000 29.56667 30.80000 30.80000 30.33333 1596 1663
## 6 Cameroon 69.33636 65.17273 62.25455 63.88182 64.54545 6956 8456
## 2010 2015.y 2020 2025 2030 2035 2040 2045 2050 improved_water
## 1 6837 8368 9904 11705 13818 16279 19104 22228 25499 96.45
## 2 1534 1679 1827 1949 2038 2090 2106 2102 2083 99.40
## 3 4824 5262 5696 6101 6491 6883 7253 7572 7833 99.50
## 4 145 163 183 205 230 256 283 311 338 95.55
## 5 1696 1668 1715 1768 1824 1876 1919 1952 1976 99.60
## 6 10297 12463 14942 17740 20857 24291 28049 32106 36415 92.80
## drinking_access improved_san san_access durable_1 durable_2 durable_3
## 1 62.55 71.35 58.75 90.45 58.4 32.8
## 2 97.50 99.10 98.10 99.90 99.7 99.3
## 3 81.60 98.40 85.10 29.00 97.5 99.8
## 4 96.80 97.90 94.60 73.95 83.1 99.4
## 5 99.60 98.95 98.90 82.50 98.2 96.8
## 6 86.60 88.40 63.80 85.90 86.4 96.3
## living_area phone_access cell_access electric_access
## 1 54.65000 4.700000 96.20000 91.35000
## 2 93.70000 30.500000 99.10000 0.00000
## 3 79.60000 86.200000 77.50000 99.50000
## 4 87.70000 16.800000 96.70000 98.00000
## 5 97.90000 92.500000 70.50000 100.00000
## 6 77.96667 2.233333 95.56667 87.56667
Relationship Between Access to Sanitation and Electricity
combined_data %>%
ggplot(aes(x=electric_access, y=san_access)) + geom_point(aes(color = Country)) +
theme(legend.text = element_text(size=5), legend.spacing.y = unit(.001, 'cm'), legend.spacing.x = unit(.1, 'cm')) + labs(y="Access to Sanitation", x="Access to Electricity") +
geom_smooth(method="lm")
Here, we try to investigate the possible relationships between access to electricty and access to sanitation in different countries. It may seem at first glance that the two are somewhat unrelated, but from the plot, we can see that there exists some relationship between the two variables – countries with better access to electricity tend to have better access to sanitation as well. However, it is important to note that the relationship between the two variables in question do not necessarily have a causal relationship. With that being said, there is a slight positive linear correlation between access to sanitation and access to electricity.
Relationship Between Drinking Access and Improved Water Quality
combined_data %>%
ggplot(aes(x=drinking_access, y=improved_water)) + geom_point(aes(color=Country)) +
theme(legend.text = element_text(size=5), legend.spacing.y = unit(.001, 'cm'), legend.spacing.x = unit(.1, 'cm')) + geom_smooth(method="lm")
Looking at the relationship between drinking access of a country and improved water quality, we can see that the figure shows a potential correlation between the two variables. On the other hand, we can also see that the fitted line does not describe the data set very accurately as there are evidently outliers. With this said, there exists a slight positive relationship between drinking access and improved water quality, which is intuitively expected. Similarly to the previous graph, it is inaccurate to assume that the two variables have a causal relationship but may be more plausible to say that the two are correlated to a certain degree.