Lima Como Vamos Survey Analysis

Author

Fernando Calderón-Figueroa

Published

January 10, 2024

Loading packages and data

Code

library(tidyverse)
library(lme4)
library(modelsummary)
library(sjPlot)
library(corrplot)
library(DescTools)
library(performance)
library(fastDummies)

lcv1019_ctx = read_csv('data/processed/lcv1019_ctx.csv') |>
  mutate(ses_factor = factor(ses_factor, levels=c(1,2,3,4,5),
                             labels = c('A','B','C','D','E')),
         year_factor = factor(year))
str(lcv1019_ctx)

tibble [19,200 × 60] (S3: tbl_df/tbl/data.frame)
 $ UBIGEO                  : num [1:19200] 150143 150143 150143 150143 150143 ...
 $ year                    : num [1:19200] 2019 2019 2019 2019 2019 ...
 $ id                      : num [1:19200] 19200 19199 19198 19197 19196 ...
 $ petition                : num [1:19200] 0 0 0 0 0 0 0 0 0 0 ...
 $ mass_media_complaining  : num [1:19200] 0 0 0 0 0 0 0 0 0 0 ...
 $ neighbourhood_org       : num [1:19200] 0 0 0 0 0 0 0 0 0 0 ...
 $ elected_position        : num [1:19200] 0 0 0 0 0 0 0 0 0 0 ...
 $ volunteering            : num [1:19200] 0 0 0 0 0 0 0 0 0 0 ...
 $ participatory_budget    : num [1:19200] 0 0 0 0 0 0 0 0 0 0 ...
 $ membership              : num [1:19200] 0 0 0 0 0 0 0 0 0 0 ...
 $ demonstrating           : num [1:19200] 0 0 0 0 0 0 0 0 0 0 ...
 $ online_groups           : num [1:19200] 0 0 0 0 0 0 0 0 0 0 ...
 $ social_media_complaining: num [1:19200] 0 0 0 0 0 0 0 0 0 0 ...
 $ community_project       : num [1:19200] 0 0 0 0 0 0 0 0 0 0 ...
 $ female                  : num [1:19200] 1 1 1 0 1 0 0 0 1 1 ...
 $ age                     : num [1:19200] 18 23 33 47 41 50 81 67 33 42 ...
 $ district                : num [1:19200] 150143 150143 150143 150143 150143 ...
 $ borough                 : chr [1:19200] "Lima Sur" "Lima Sur" "Lima Sur" "Lima Sur" ...
 $ ses_factor              : Factor w/ 5 levels "A","B","C","D",..: 2 4 4 3 4 2 3 4 2 3 ...
 $ ses_rev                 : num [1:19200] 4 2 2 3 2 4 3 2 4 3 ...
 $ lowered                 : num [1:19200] 0 0 0 0 0 0 0 0 0 0 ...
 $ old_timer               : num [1:19200] 1 1 1 1 0 1 1 1 1 1 ...
 $ safety_lima             : num [1:19200] 2 0 0 1 0 3 2 0 0 0 ...
 $ workstudy_outside       : num [1:19200] 1 0 0 1 1 0 0 1 1 0 ...
 $ works_home              : num [1:19200] 0 1 1 0 0 0 0 0 0 1 ...
 $ works_not               : num [1:19200] 0 0 0 0 0 1 1 0 0 0 ...
 $ walk_bike               : num [1:19200] 0 0 0 0 1 0 0 0 0 0 ...
 $ transit                 : num [1:19200] 1 0 0 1 0 0 0 1 1 0 ...
 $ drive_or_taxi           : num [1:19200] 0 0 0 0 0 0 0 0 0 0 ...
 $ commute_min             : num [1:19200] 30 NA NA 90 10 NA NA 90 65 NA ...
 $ safety_hood             : num [1:19200] 3 0 2 0 3 3 2 1 0 0 ...
 $ victimization           : num [1:19200] 0 0 0 0 1 0 0 1 1 1 ...
 $ fences_supporter        : num [1:19200] NA NA NA NA NA NA NA NA NA NA ...
 $ children_under15        : num [1:19200] NA NA NA NA NA NA NA NA NA NA ...
 $ trust_neighbours        : num [1:19200] NA NA NA NA NA NA NA NA NA NA ...
 $ civiceng_core           : num [1:19200] 0 0 0 0 0 0 0 0 0 0 ...
 $ civiceng_allvars        : num [1:19200] 0 0 0 0 0 0 0 0 0 0 ...
 $ DISTRITO                : chr [1:19200] "VILLA MARIA DEL TRIUNFO" "VILLA MARIA DEL TRIUNFO" "VILLA MARIA DEL TRIUNFO" "VILLA MARIA DEL TRIUNFO" ...
 $ HOGNSEA17               : num [1:19200] 37 37 37 37 37 37 37 37 37 37 ...
 $ HOGNSEB17               : num [1:19200] 7370 7370 7370 7370 7370 7370 7370 7370 7370 7370 ...
 $ HOGNSEC17               : num [1:19200] 35222 35222 35222 35222 35222 ...
 $ HOGNSED17               : num [1:19200] 42553 42553 42553 42553 42553 ...
 $ HOGNSEE17               : num [1:19200] 15228 15228 15228 15228 15228 ...
 $ NSE_PREDOM              : chr [1:19200] "D" "D" "D" "D" ...
 $ ls                      : num [1:19200] 0.159 0.159 0.159 0.159 0.159 ...
 $ p                       : num [1:19200] 0.0412 0.0412 0.0412 0.0412 0.0412 ...
 $ ls_cat                  : chr [1:19200] "medium" "medium" "medium" "medium" ...
 $ circuity_avg_mean       : num [1:19200] 1.07 1.07 1.07 1.07 1.07 ...
 $ intersection_mean       : num [1:19200] 206 206 206 206 206 ...
 $ barriers_00_mean        : num [1:19200] 5.09 5.09 5.09 5.09 5.09 ...
 $ barriers_01_mean        : num [1:19200] 12.7 12.7 12.7 12.7 12.7 ...
 $ barriers_03_mean        : num [1:19200] 13.2 13.2 13.2 13.2 13.2 ...
 $ barriers_05_mean        : num [1:19200] 15.8 15.8 15.8 15.8 15.8 ...
 $ barriers_10_mean        : num [1:19200] 39.1 39.1 39.1 39.1 39.1 ...
 $ dist_node_mean          : num [1:19200] 38.9 38.9 38.9 38.9 38.9 ...
 $ pre1975_prop            : num [1:19200] 0.895 0.895 0.895 0.895 0.895 ...
 $ pre1990_prop            : num [1:19200] 0.957 0.957 0.957 0.957 0.957 ...
 $ change19752020_prop     : num [1:19200] 0.102 0.102 0.102 0.102 0.102 ...
 $ change19902020_prop     : num [1:19200] 0.0396 0.0396 0.0396 0.0396 0.0396 ...
 $ year_factor             : Factor w/ 10 levels "2010","2011",..: 10 10 10 10 10 10 10 10 10 10 ...

Exploring dependent variables

The data includes 19200 observations for ten years of the Lima Cómo Vamos survey. There are 59 variables besides the ID, 23 of them are contextual. The types of analyses are restricted due to data availability for specific years. For instance, my main outcomes of interest are civic engagement and trust in neighbours. The survey includes the former in every year, but the latter only for the 2010-2014 period.

I measure civic engagement through a series of questions about participation in collective action, from signing petitions to attending participatory budget meetings. I use a summary indicator of having partaking in any of them as civiceng_core (dummy) for those items present in all years. The proportion of respondents that have partaken is 0.19 (sd = 0.39), but this varies a lot by district and year (see Figure 1 and Figure 2).

Code

lcv1019_ctx |>
  mutate(DISTRITO = str_to_title(DISTRITO)) |>
  group_by(UBIGEO, DISTRITO) |>
  summarize(ses_predom = unique(NSE_PREDOM, na.rm = TRUE),
            mean_ce = mean(civiceng_core, na.rm = TRUE),
            sd_ce = sd(civiceng_core, na.rm = TRUE)) |>
  ggplot(aes(x = reorder(DISTRITO, mean_ce), y = mean_ce,
             group = ses_predom, color = ses_predom)) +
  geom_pointrange(aes(ymin = mean_ce-sd_ce, ymax = mean_ce+sd_ce)) +
  scale_color_brewer(palette = 'RdYlGn', direction = -1) +
  labs(x = 'Distrito',
       y = 'Civic engagement (proportion and standard deviation)',
       color = 'Predominant \nSocioeconomic \nStatus') +
  coord_flip()

Code

lcv1019_ctx |>
    group_by(year_factor) |>
    summarize(year = year_factor,
              mean_ce = mean(civiceng_core, na.rm = TRUE),
              sd_ce = sd(civiceng_core, na.rm = TRUE)) |>
    ggplot(aes(x = year, y = mean_ce)) +
    geom_pointrange(aes(ymin = mean_ce-sd_ce, ymax = mean_ce+sd_ce)) + 
    xlab('Year') + ylab('Civic engagement (proportion and SD)')

A version of the restricted dataset is for the years 2010-2014. These years include a question about trust in neighbours. The distribution of these responses is approximately normal (see Figure 3). tbl-trust shows see the distribution of responses by year, while Figure 4 plots the means and standard deviations by district (colour indicates the predominant socioeconomic status of the district, A being the highest).

Code

temp = filter(lcv1019_ctx, year < 2015)
ggplot(temp, aes(trust_neighbours)) +
  geom_histogram(binwidth = 1)

Figure 3: Distribution of trust in neighbours

Code

data.frame(trust_neighbours = as.factor(temp$trust_neighbours),
           year = temp$year) |>
  group_by(trust_neighbours, year) |>
  tally() |>
  pivot_wider(names_from = trust_neighbours,
              values_from = n)

Table 1:

Trust in neighbours by year (0=low)

Code

lcv1019_ctx |>
  filter(year < 2015) |>
  mutate(DISTRITO = str_to_title(DISTRITO)) |>
  group_by(UBIGEO, DISTRITO) |>
  summarize(ses_predom = unique(NSE_PREDOM, na.rm = TRUE),
            mean_trust = mean(trust_neighbours, na.rm = TRUE),
            sd_trust = sd(trust_neighbours, na.rm = TRUE)) |>
  ggplot(aes(x = reorder(DISTRITO, mean_trust), y = mean_trust,
             group = ses_predom, color = ses_predom)) +
  geom_pointrange(aes(ymin = mean_trust-sd_trust,
                      ymax = mean_trust+sd_trust)) +
  scale_color_brewer(palette = 'RdYlGn', direction = -1) +
  labs(x = 'Distrito',
       y = 'Trust in neighbours (proportion and standard deviation)',
       color = 'Predominant \nSocioeconomic \nStatus') +
  coord_flip()

Figure 4: Trust in neighbours by district

In Figure 5 we can see that trust is not associated with most of the civic engagement variables.

Code

dat = filter(lcv1019_ctx, year < 2015) |>
  select(petition:participatory_budget,
         trust_neighbours)
vmat = PairApply(dat, CramerV) # Cramer's V instead of Pearson for precision

corrplot(vmat,
         type = 'upper',
         order = 'hclust',
         addCoef.col = 'grey50',
         tl.col = 'black',
         tl.srt = 45,
         tl.cex =0.8)

Figure 5: Association matrix of potential dependent variables (Cramer’s V)

Upon this exploration, and considering that I have aggregated data at the district level, I model civic engagement using multilevel logistic regression and trust in neighbours with multilevel linear regression.

Exploring independent variables

The list of independent variables is contingent on the years for the analysis. In Figure 6 I show the association between candidates for independent variables across all models. Some of them will be useful for just one of the dependent variables.

Code

dat2 = select(lcv1019_ctx, civiceng_core, year, female, age,
              ses_rev, safety_lima,safety_hood, victimization, 
              works_home, works_not, lowered, children_under15,
              ls, circuity_avg_mean, barriers_03_mean,
              change19902020_prop)
cmat = cor(dat2, method = 'pearson', use = 'complete.obs')
corrplot(cmat,
         type = 'upper',
         order = 'hclust',
         tl.col = 'black',
         tl.srt = 45,
         tl.cex =0.8)

Figure 6: Association among independent variables

Civic engagement models

I start with an intercept model to calculate the proportion of the variance that corresponds to within-district variation (Table 2).

mciviceng0 = glmer(civiceng_core ~ 1 + (1 | UBIGEO),
             data = lcv1019_ctx,
             family = binomial(link = 'logit'))
icc(mciviceng0)

Table 2:

Civic engagement’s intra-district variation (ICC)

While the ICC is low, it is the only available level of geographic aggregation to match the spatial data.

Preliminary models below. I still need to consider ways to organize these models. I could plot all models together for different buffers of barriers, or I could just choose one. They don’t seem to add too much difference and effect sizes are small. Also, consider a way to present models with interaction terms.

Code

# standardized variables including factors as dummies 
dat_ce = lcv1019_ctx |>
  select(civiceng_core, UBIGEO, ses_factor, year_factor, 
         age, female, victimization, safety_lima,
         safety_hood, old_timer, drive_or_taxi,
         works_home, barriers_03_mean, ls, change19902020_prop) |>
  dummy_cols(remove_first_dummy = TRUE, ignore_na = TRUE) |>
  mutate(across(age:change19902020_prop, ~ c(scale(.x))))|>
  select(-c(ses_factor, year_factor))

# list of independent variables
indVarsList = dat_ce |>
  select(-c(civiceng_core, UBIGEO)) |>
  colnames() |>
  paste(collapse = '+') 

# model

mciviceng1 = glmer(as.formula(paste('civiceng_core ~', indVarsList,
                                    '+ (1 | UBIGEO)')),
                   data = dat_ce,
                   family = binomial(link = 'logit'))

mciviceng2 = update(mciviceng1, . ~ . + barriers_03_mean:victimization + barriers_03_mean:old_timer)

# plot

plot_models(mciviceng1, mciviceng2, 
            vline.color = 'red',
            m.labels = c('Civic Engagement',
                         'Civic Engagement (interactions)'),
            axis.labels = c('Barriers 3m x Old Timer',
                            'Barriers 3m x Victimization',
                            'Year 2019 (vs. 2010)',
                            'Year 2018 (vs. 2010)',
                            'Year 2017 (vs. 2010)',
                            'Year 2016 (vs. 2010)',
                            'Year 2015 (vs. 2010)',
                            'Year 2014 (vs. 2010)',
                            'Year 2013 (vs. 2010)',
                            'Year 2012 (vs. 2010)',
                            'Year 2011 (vs. 2010)',
                            'Socioec. Lev. E (vs. A)',
                            'Socioec. Lev. D (vs. A)',
                            'Socioec. Lev. C (vs. A)',
                            'Socioec. Lev. B (vs. A)',
                            'Urbanized After 1990 (dist. prop.)',
                            'Local Segregation (dist.)',
                            'Barriers 3m (dist. mean)',
                            'Works from Home',
                            'Drives or Taxi to Work',
                            'Old Timer',
                            'Neighbourhood Safety',
                            'Lima Safety',
                            'Crime Victimization',
                            'Female',
                            'Age')) +
  ylim(.25, 1.75)

Code

plot_model(mciviceng2, type='pred',
           terms = c('barriers_03_mean[all]', 'old_timer'),
           title = 'Predicted probabilities of civic engagement') +
  scale_color_discrete(
    name = 'Old Timer',
    labels = c('No', 'Yes'))

Code

modelsummary(list(mciviceng1, mciviceng2),
             exponentiate = TRUE)

	(1)	(2)
(Intercept)	0.432	0.430
	(0.064)	(0.064)
age	1.329	1.327
	(0.034)	(0.034)
female	0.990	0.990
	(0.028)	(0.028)
victimization	1.292	1.293
	(0.035)	(0.035)
safety_lima	0.880	0.880
	(0.023)	(0.023)
safety_hood	1.064	1.064
	(0.029)	(0.029)
old_timer	1.068	1.077
	(0.030)	(0.031)
drive_or_taxi	1.047	1.046
	(0.027)	(0.027)
works_home	0.937	0.936
	(0.026)	(0.026)
barriers_03_mean	0.905	0.899
	(0.034)	(0.034)
ls	1.029	1.029
	(0.037)	(0.037)
change19902020_prop	1.031	1.028
	(0.032)	(0.032)
ses_factor_B	0.743	0.744
	(0.094)	(0.095)
ses_factor_C	0.745	0.747
	(0.098)	(0.098)
ses_factor_D	0.690	0.692
	(0.093)	(0.094)
ses_factor_E	0.794	0.793
	(0.142)	(0.142)
year_factor_2011	0.787	0.788
	(0.090)	(0.090)
year_factor_2012	0.694	0.694
	(0.083)	(0.084)
year_factor_2013	0.641	0.641
	(0.077)	(0.078)
year_factor_2014	0.852	0.852
	(0.096)	(0.096)
year_factor_2015	0.808	0.808
	(0.081)	(0.081)
year_factor_2016	0.649	0.650
	(0.078)	(0.078)
year_factor_2017	0.619	0.621
	(0.077)	(0.077)
year_factor_2018	0.526	0.526
	(0.055)	(0.055)
year_factor_2019	0.571	0.572
	(0.059)	(0.059)
victimization × barriers_03_mean		1.002
		(0.025)
old_timer × barriers_03_mean		1.051
		(0.031)
SD (Intercept UBIGEO)	1.150	1.149
Num.Obs.	11429	11429
R2 Marg.	0.069	0.070
R2 Cond.	0.074	0.075
AIC	10876.7	10877.8
BIC	11067.6	11083.4
ICC	0.0	0.0
RMSE	0.39	0.39

Code

dat_ce_desc = lcv1019_ctx |> # -c(year_factor, UBIGEO)
  select(civiceng_core, ses_factor, age, female, 
         victimization, safety_lima, safety_hood,
         old_timer, drive_or_taxi, works_home,
         barriers_03_mean, ls, change19902020_prop) 
datasummary_balance(~1, data = dat_ce_desc,
                    fmt = 2)

		Mean	Std. Dev.
civiceng_core		0.19	0.39
age		40.05	16.63
female		0.51	0.50
victimization		0.69	0.46
safety_lima		1.22	1.05
safety_hood		1.40	1.12
old_timer		0.86	0.34
drive_or_taxi		0.09	0.29
works_home		0.33	0.47
barriers_03_mean		22.35	16.50
ls		0.15	0.20
change19902020_prop		0.08	0.16
		N	Pct.
ses_factor	A	889	4.6
	B	4850	25.3
	C	7873	41.0
	D	5024	26.2
	E	558	2.9

The results above use only the district-level average of barriers at 3km buffer as independent variable of interest and not all other buffer levels. The reason is that models do not differ that much from one another.

Trust in neighbours models

The next variable to model is trust in neighbours. First, I fit an intercept only model to calculate the intra-class correlation (ICC) (Table 3).

dat_trust = filter(lcv1019_ctx, year < 2015) |>
  select(UBIGEO, trust_neighbours, ses_factor, year_factor, 
         age, female, victimization, safety_lima, safety_hood, 
         old_timer, drive_or_taxi, works_home, lowered,
         children_under15,
         barriers_03_mean, ls, change19902020_prop) |>
  dummy_cols(remove_first_dummy = TRUE, ignore_na = TRUE) |>
  mutate(across(age:change19902020_prop, ~ c(scale(.x))))|>
  select(-c(ses_factor, year_factor))

mtrust0 = lmer(trust_neighbours ~ 1 + (1 | UBIGEO), data = dat_trust)

icc(mtrust0)

Table 3:

Trust in neghbours’ intra-district variance (ICC)

The ICC is low but higher than the one for civic engagement. The within-district variability suggests the effects of built environment features may be higher.

Code

# model

indVarsList2 = dat_trust |>
  select(-c(trust_neighbours, UBIGEO)) |>
  colnames() |>
  paste(collapse = '+') 


mtrust1 = lmer(as.formula(paste('trust_neighbours ~',indVarsList2,
                                '+ (1 | UBIGEO)')),
                 data = dat_trust)

mtrust2 = update(mtrust1, . ~ . + barriers_03_mean:safety_hood + barriers_03_mean:old_timer)

# plot

plot_models(mtrust1, mtrust2, 
            vline.color = 'red',
            m.labels = c('Trust in neighbours',
                         'Trust in neighbours (interaction)'),
            axis.labels = c('Barriers 3m x Old Timer',
                            'Barriers 3m x Nb. Safety',
                            'Year 2014 (vs. 2010)',
                            'Year 2013 (vs. 2010)',
                            'Year 2012 (vs. 2010)',
                            'Year 2011 (vs. 2010)',
                            'Socioec. Lev. E (vs. A)',
                            'Socioec. Lev. D (vs. A)',
                            'Socioec. Lev. C (vs. A)',
                            'Socioec. Lev. B (vs. A)',
                            'Urbanized After 1990 (dist. prop.)',
                            'Local Segregation (dist.)',
                            'Barriers 3m (dist. mean)',
                            'Children under 15',
                            'Below High School',
                            'Works from Home',
                            'Drives or Taxi to Work',
                            'Old Timer',
                            'Neighbourhood Safety',
                            'Lima Safety',
                            'Crime Victimization',
                            'Female',
                            'Age'))

Code

plot_model(mtrust2, type='pred',
           terms = c('barriers_03_mean[all]', 'safety_hood'),
           title = 'Predicted values of trust in neighbours',
           legend.title = 'Neighbourhood safety') +
  scale_color_discrete(labels = c('0','1', '2', '3', '4'))

Code

modelsummary(list(mtrust1, mtrust2))

	(1)	(2)
(Intercept)	1.931	1.925
	(0.104)	(0.104)
age	0.076	0.075
	(0.019)	(0.019)
female	−0.051	−0.052
	(0.018)	(0.018)
victimization	−0.008	−0.008
	(0.016)	(0.016)
safety_lima	0.077	0.075
	(0.017)	(0.017)
safety_hood	0.149	0.154
	(0.019)	(0.019)
old_timer	0.023	0.023
	(0.018)	(0.018)
drive_or_taxi	−0.025	−0.026
	(0.018)	(0.018)
works_home	−0.040	−0.039
	(0.018)	(0.018)
lowered	−0.069	−0.068
	(0.018)	(0.018)
children_under15	−0.029	−0.029
	(0.017)	(0.017)
barriers_03_mean	0.045	0.036
	(0.027)	(0.027)
ls	0.021	0.020
	(0.026)	(0.026)
change19902020_prop	0.080	0.077
	(0.022)	(0.022)
ses_factor_B	−0.208	−0.200
	(0.102)	(0.102)
ses_factor_C	−0.302	−0.292
	(0.102)	(0.102)
ses_factor_D	−0.348	−0.337
	(0.103)	(0.103)
ses_factor_E	−0.601	−0.592
	(0.128)	(0.128)
year_factor_2011	0.051	0.050
	(0.051)	(0.051)
year_factor_2012	0.206	0.203
	(0.051)	(0.051)
year_factor_2013	−0.018	−0.021
	(0.052)	(0.052)
year_factor_2014	0.175	0.173
	(0.049)	(0.049)
safety_hood × barriers_03_mean		0.044
		(0.017)
old_timer × barriers_03_mean		0.004
		(0.016)
SD (Intercept UBIGEO)	0.114	0.114
SD (Observations)	1.018	1.017
Num.Obs.	4207	4207
R2 Marg.	0.088	0.090
R2 Cond.	0.099	0.101
AIC	12259.1	12269.1
BIC	12411.4	12434.0
ICC	0.0	0.0
RMSE	1.01	1.01

Code

dat_trust_desc = filter(lcv1019_ctx, year < 2015) |>
  select(trust_neighbours, ses_factor, age, female, 
         victimization, safety_lima, safety_hood, 
         old_timer, drive_or_taxi, works_home, lowered,
         children_under15, barriers_03_mean, ls,
         change19902020_prop)
datasummary_balance(~1, data = dat_trust_desc,
                    fmt = 2)

		Mean	Std. Dev.
trust_neighbours		1.76	1.07
age		39.98	16.69
female		0.51	0.50
victimization		0.76	0.43
safety_lima		1.24	1.08
safety_hood		1.39	1.13
old_timer		0.82	0.39
drive_or_taxi		0.10	0.30
works_home		0.32	0.47
lowered		0.49	0.50
children_under15		0.47	0.50
barriers_03_mean		23.10	16.60
ls		0.16	0.21
change19902020_prop		0.07	0.15
		N	Pct.
ses_factor	A	324	3.4
	B	1646	17.1
	C	4346	45.3
	D	2998	31.2
	E	286	3.0