App and EDA

2022-04-22

Shiny App and Exploratory Data Analysis

This week we continued our exploratory data analysis by developing our Shiny app (https://sophia-bevans.shinyapps.io/pop_app/). The app lets the user compare population data (urban population, total population, or the percentage of the population that is urban) against air quality for countries grouped by region or sub-region. The resulting graph shows a data point for each population and air quality value, along with a linear model for each selected grouping. Future directions for the app include letting the user limit the range of population and air quality values displayed, using year as a plot parameter (currently all years with available data are displayed together, and time is not a predictor in the linear model), and adding other types of data (slum population, transportation, etc.) that can be explored in the same way as the air quality data.
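To make the per-group linear models concrete, here is a minimal sketch of fitting one least-squares line per grouping. It is written in plain Python for illustration and is not the app's code; the column names (`region`, `urban_pop`, `air_quality`), the helper functions, and the data values are all hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_by_group(rows, group_key, x_key, y_key):
    """Fit a separate line for each value of group_key (e.g. each region)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append((row[x_key], row[y_key]))
    fits = {}
    for name, points in groups.items():
        xs, ys = zip(*points)
        fits[name] = fit_line(xs, ys)
    return fits


# Made-up values for illustration only; not taken from the project data.
rows = [
    {"region": "A", "urban_pop": 1.0, "air_quality": 10.0},
    {"region": "A", "urban_pop": 2.0, "air_quality": 12.0},
    {"region": "A", "urban_pop": 3.0, "air_quality": 14.0},
    {"region": "B", "urban_pop": 1.0, "air_quality": 9.0},
    {"region": "B", "urban_pop": 2.0, "air_quality": 9.0},
    {"region": "B", "urban_pop": 3.0, "air_quality": 9.0},
]

fits = fit_by_group(rows, "region", "urban_pop", "air_quality")
for region, (slope, intercept) in sorted(fits.items()):
    print(f"{region}: slope={slope:.2f}, intercept={intercept:.2f}")
```

Because all years are pooled, each fitted slope describes the relationship across every country-year observation in a group, which is one reason year would be a useful plot parameter.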

When all regions are selected and sub-regions are not distinguished, Asia and Africa show a positive relationship between air quality and both total population and urban population (a higher air quality value means worse air quality), while the other regions display no apparent relationship. The percentage of the population that is urban shows no relationship with air quality for Asia and Africa, and a slightly negative correlation for the Americas, Europe, and Oceania.

Within regions, there is most often either no relationship or a positive relationship between air quality and both total population and urban population, but this trend is not consistent for the percentage of the population that is urban. For example, countries in both Northern Africa and Sub-Saharan Africa show a positive relationship between air quality and total or urban population, but Northern African countries show a strong negative relationship between air quality and percentage urban population. With further development of graphical parameters and integration of more data into the Shiny app, we will be able to gain more insight into these findings, and we will explore how total population and percentage urban population differ in predicting other parameters besides air quality.

We are also examining the relationship between a country's slum population and access to basic needs such as water, sanitation, and electricity, all of which are found in the water quality data set. We expect that the greater the slum population, the smaller the percentage of the population with access to these basic needs, since a greater slum population indicates more people living in poverty who therefore have a harder time gaining access. This draws attention to the importance of alleviating poverty and slums in different parts of the world, and to how countries can create policies or programs to do so. If the data bear out this relationship, countries can focus on reducing their slum populations and areas, which may in turn translate into a better quality of life in these regions. We can also factor in other variables, such as disbursements, Gini coefficient, and land consumption, to better understand how large a role each variable plays and how the variables affect one another. Through this modeling and exploratory data analysis, we hope to gain greater insight into which factors affect quality of life, and from there better understand how a country might keep its population from living in such conditions.
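One simple way to check the expected direction of this relationship is a Pearson correlation between slum population and an access measure. The sketch below is illustrative only: it assumes both variables are expressed as percentages of the population, the variable names are hypothetical, and the numbers are made up rather than drawn from our data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Made-up percentages for illustration only (one value per country).
slum_share = [10.0, 20.0, 30.0, 40.0]
water_access = [90.0, 80.0, 70.0, 60.0]

r = pearson_r(slum_share, water_access)
print(f"correlation between slum share and water access: {r:.2f}")
```

A value of r near -1 would be consistent with the expectation that access falls as slum population rises, while a value near 0 would suggest no linear relationship. A correlation alone does not account for the other variables mentioned above, so it is only a first check.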
