Twenty-Five Things You Didn’t Know You Could Do with R


  1. Access Data Automagically
  2. Efficiently Make Beautiful Data Viz
  3. Make Maps
  4. Report in New Ways with Quarto
  5. Automate all the Things
  6. Use AI to Write Better Code
  7. Use AI to Analyze Data

https://rfortherestofus.com/cascadia2025

Access Data Automagically

Pull in Data Directly from Google Sheets

1

library(googlesheets4)

survey_data <-
  # `ss` identifies the spreadsheet (URL or ID); `sheet` would name a worksheet (tab) inside it
  read_sheet(ss = "YOURSHEETURLHERE")

Spring 2024

Fall 2024

Pull in Data Directly from Qualtrics

2

library(qualtRics)

survey_data <-
  fetch_survey(surveyID = "YOURSURVEYIDHERE")

Pull in Data Directly from the Census Bureau

3

library(tidycensus)

get_acs(
  state = "OR",
  geography = "place",
  geometry = TRUE,
  variables = "B01003_001"
)
Simple feature collection with 426 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -124.5125 ymin: 41.99277 xmax: -116.9354 ymax: 46.20566
Geodetic CRS:  NAD83
First 10 features:
     GEOID                      NAME   variable estimate moe
1  4149600       Monroe city, Oregon B01003_001      853 272
2  4181300      Wheeler city, Oregon B01003_001      429 137
3  4150050       Mosier city, Oregon B01003_001      630 262
4  4176250        Unity city, Oregon B01003_001       37  20
5  4133250        Helix city, Oregon B01003_001      358 185
6  4149150     Mitchell city, Oregon B01003_001      202 107
7  4161700     Richland city, Oregon B01003_001      199  74
8  4129950    Gold Hill city, Oregon B01003_001     1261 270
9  4153150 North Plains city, Oregon B01003_001     3365  43
10 4104000       Barlow city, Oregon B01003_001      157 105
                         geometry
1  MULTIPOLYGON (((-123.3067 4...
2  MULTIPOLYGON (((-123.9055 4...
3  MULTIPOLYGON (((-121.4047 4...
4  MULTIPOLYGON (((-118.1944 4...
5  MULTIPOLYGON (((-118.6616 4...
6  MULTIPOLYGON (((-120.1641 4...
7  MULTIPOLYGON (((-117.1722 4...
8  MULTIPOLYGON (((-123.0729 4...
9  MULTIPOLYGON (((-123.0114 4...
10 MULTIPOLYGON (((-122.7278 4...

Work with APIs to Access Data

4

library(httr2)
library(stringr) # provides str_glue(), used for the Authorization header

fathom_api_key <- Sys.getenv("FATHOM_API_KEY")

request("https://api.usefathom.com/v1/aggregations") |>
  req_url_query(
    entity = "pageview",
    aggregates = "visits,uniques,pageviews",
    sort_by = "visits:desc"
  ) |>
  req_headers(
    Authorization = str_glue("Bearer {fathom_api_key}")
  ) |>
  req_perform()
# A tibble: 28,976 × 5
   url                                       date       visits uniques pageviews
   <glue>                                    <date>      <dbl>   <dbl>     <dbl>
 1 /2018/06/the-life-changing-magic-of-r/    2021-03-01      4       9         9
 2 /2018/07/r-handles-the-beast-and-the-bea… 2021-03-01      1       3         3
 3 /2018/09/making-small-multiples-in-r/     2021-03-01     45      72        99
 4 /2018/12/descriptive-stats-r/             2021-03-01     11      47        52
 5 /2019/01/reproducibility-for-the-rest-of… 2021-03-01      1      36        41
 6 /2019/03/my-r-journey-dana-wanzer/        2021-03-01      2      12        12
 7 /2019/03/r-killer-feature-rmarkdown/      2021-03-01     37     216       258
 8 /2019/04/curb-cuts-universal-design-welc… 2021-03-01      8      18        20
 9 /2019/04/my-r-journey-david-keyes/        2021-03-01     23      78        94
10 /2019/04/my-r-journey-rika-gorn/          2021-03-01      0       3         3
# ℹ 28,966 more rows
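
`req_perform()` returns an httr2 response object, not a tibble, so some parsing presumably sits between the request and the table printed above. The following is a minimal sketch of that step, assuming the response body is a JSON array of flat records. The helper `records_to_df()` and the sample field names and values are hypothetical, not taken from the Fathom API.

```r
# Hypothetical helper: turn a list of flat records (the shape that
# httr2::resp_body_json() returns for a JSON array of objects) into a data frame.
records_to_df <- function(records) {
  rows <- lapply(records, function(rec) {
    as.data.frame(rec, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Made-up records standing in for a parsed API response
sample_records <- list(
  list(pathname = "/a/", visits = 4, uniques = 9),
  list(pathname = "/b/", visits = 1, uniques = 3)
)

pageviews <- records_to_df(sample_records)
pageviews
```

With a real response, the same helper would be applied to the parsed body, for example `resp |> resp_body_json() |> records_to_df()`.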

Scrape Data

5

library(rvest)
library(tidyverse)

read_html("https://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals") |>
  html_elements("table") |>
  pluck(4) |>
  html_table() |>
  select(-Ref.)
# A tibble: 23 × 7
    Year Winners      Score        `Runners-up`   Venue      Location Attendance
   <int> <chr>        <chr>        <chr>          <chr>      <chr>    <chr>     
 1  1930 Uruguay      4–2          Argentina      Estadio C… Montevi… 68,346    
 2  1934 Italy        2–1 (a.e.t.) Czechoslovakia Stadio Na… Rome, I… 55,000    
 3  1938 Italy        4–2          Hungary        Stade Oly… Paris, … 45,000    
 4  1950 Uruguay      2–1[n 3]     Brazil         Maracanã … Rio de … 173,850   
 5  1954 West Germany 3–2          Hungary        Wankdorf … Bern, S… 62,500    
 6  1958 Brazil       5–2          Sweden         Råsunda S… Solna, … 49,737    
 7  1962 Brazil       3–1          Czechoslovakia Estadio N… Santiag… 68,679    
 8  1966 England      4–2 (a.e.t.) West Germany   Wembley S… London,… 96,924    
 9  1970 Brazil       4–1          Italy          Estadio A… Mexico … 107,412   
10  1974 West Germany 2–1          Netherlands    Olympiast… Munich,… 78,200    
# ℹ 13 more rows
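
In the scraped table, `Attendance` comes back as text (`<chr>`) because the values contain thousands separators. A minimal sketch of converting such values to numbers, using a few attendance figures copied from the output above; base R is used here so the snippet runs without extra packages, and `readr::parse_number()` is the tidyverse equivalent.

```r
# Attendance values as scraped: character strings with commas
attendance <- c("68,346", "55,000", "173,850")

# Drop the thousands separators, then convert to numeric
attendance_numeric <- as.numeric(gsub(",", "", attendance, fixed = TRUE))
attendance_numeric
```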

Efficiently Make Beautiful Data Viz

Make Your Own ggplot Theme

6

penguins_bar_chart

library(tidyverse)

theme_dk <- function(base_family = "Inter Tight", base_size = 14) {
  theme_minimal(base_size = base_size, base_family = base_family) +
    theme(
      panel.grid.minor = element_blank(),
      panel.grid.major = element_line(
        color = "grey90",
        linewidth = 0.5,
        linetype = "dashed"
      ),
      axis.text = element_text(
        color = "grey50"
      )
      # ETC: further theme settings go here
    )
}
penguins_bar_chart +
  theme_dk()

Use Your Theme Everywhere

7

theme_set(theme_dk())
---
title: "Penguins Report"
---

```{r}
theme_dk <- function(base_family = "Inter Tight", base_size = 14) {}
```

```{r}
theme_set(theme_dk())
```

```{r}
penguins_bar_chart
```

Make Your Text Consistent with Your Theme

8

penguins_bar_chart +
  theme_dk(base_family = "IBM Plex Mono", base_size = 12) +
  geom_text(
    aes(label = avg_bill_length_formatted),
    vjust = 1.5,
    color = "white",
    size = 4
  )

penguins_bar_chart +
  theme_dk(base_family = "IBM Plex Mono", base_size = 12) +
  geom_text(
    aes(label = avg_bill_length_formatted),
    vjust = 1.5,
    color = "white",
    family = "IBM Plex Mono",
    size = 4
  )

update_geom_defaults(
  geom = "text",
  aes(family = "IBM Plex Mono")
)
---
title: "Penguins Report"
---

```{r}
theme_dk <- function(base_family = "Inter Tight", base_size = 14) {}
```

```{r}
theme_set(theme_dk())

update_geom_defaults(geom = "text", aes(family = "IBM Plex Mono"))
```

```{r}
penguins_bar_chart +
  geom_text(
    aes(label = avg_bill_length_formatted),
    vjust = 1.5,
    color = "white",
    size = 4
  )
```

Make Maps

Make Maps with ggplot

9

library(tidycensus)

median_income_by_county <-
  get_acs(
    geography = "county",
    variables = c(median_income = "B19013_001"),
    geometry = TRUE
  )
median_income_by_county
Simple feature collection with 3222 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1467 ymin: 17.88328 xmax: 179.7785 ymax: 71.38782
Geodetic CRS:  NAD83
First 10 features:
   GEOID                           NAME      variable estimate   moe
1  01069        Houston County, Alabama median_income    55064  1412
2  01023        Choctaw County, Alabama median_income    43299  8544
3  01005        Barbour County, Alabama median_income    39712  3289
4  01107        Pickens County, Alabama median_income    45339  2865
5  01033        Colbert County, Alabama median_income    56149  3200
6  04012         La Paz County, Arizona median_income    46634  2584
7  04001         Apache County, Arizona median_income    37483  3091
8  05081  Little River County, Arkansas median_income    58627 11485
9  05121      Randolph County, Arkansas median_income    45993  4649
10 06037 Los Angeles County, California median_income    83411   439
                         geometry
1  MULTIPOLYGON (((-85.71209 3...
2  MULTIPOLYGON (((-88.47323 3...
3  MULTIPOLYGON (((-85.74803 3...
4  MULTIPOLYGON (((-88.34043 3...
5  MULTIPOLYGON (((-88.13925 3...
6  MULTIPOLYGON (((-114.7312 3...
7  MULTIPOLYGON (((-110.0007 3...
8  MULTIPOLYGON (((-94.48558 3...
9  MULTIPOLYGON (((-91.40687 3...
10 MULTIPOLYGON (((-118.6044 3...
library(tidyverse)
library(tigris)
library(scales)

median_income_by_county |>
  shift_geometry() |>
  ggplot(aes(fill = estimate)) +
  geom_sf(linewidth = 0) +
  scale_fill_viridis_c(
    option = "B",
    labels = dollar_format(),
    name = NULL
  ) +
  theme_void()

Do Geospatial Analysis

10

Which Elementary Schools in Portland are Within One Mile of a Public Library?

library(sf)

portland_libraries <- read_sf("data/portland_libraries.geojson")
Simple feature collection with 21 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -122.835 ymin: 45.44795 xmax: -122.479 ymax: 45.59008
Geodetic CRS:  WGS 84
# A tibble: 21 × 2
   library                                  geometry
   <chr>                                 <POINT [°]>
 1 West Slope Community Library (-122.7573 45.49324)
 2 St. Johns Library            (-122.7512 45.59008)
 3 North Portland Library       (-122.6715 45.56252)
 4 Sellwood-Moreland Library     (-122.6528 45.4677)
 5 Gregory Heights Library      (-122.5814 45.55157)
 6 Rockwood Library              (-122.479 45.51949)
 7 Midland Library              (-122.5382 45.51677)
 8 Hollywood Library            (-122.6216 45.53763)
 9 Belmont Library              (-122.6227 45.51524)
10 Central Library               (-122.683 45.51922)
# ℹ 11 more rows
portland_libraries_one_mile_buffer <-
  portland_libraries |>
  st_buffer(1609.34)
Simple feature collection with 21 features and 1 field
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -122.8561 ymin: 45.43334 xmax: -122.4579 ymax: 45.60477
Geodetic CRS:  WGS 84
# A tibble: 21 × 2
   library                                                              geometry
 * <chr>                                                           <POLYGON [°]>
 1 West Slope Community Library ((-122.7586 45.4787, -122.7584 45.47877, -122.7…
 2 St. Johns Library            ((-122.7652 45.60085, -122.7654 45.60078, -122.…
 3 North Portland Library       ((-122.6812 45.57536, -122.6813 45.57533, -122.…
 4 Sellwood-Moreland Library    ((-122.667 45.45705, -122.6669 45.45693, -122.6…
 5 Gregory Heights Library      ((-122.6022 45.55156, -122.6024 45.5515, -122.6…
 6 Rockwood Library             ((-122.4592 45.52356, -122.459 45.52363, -122.4…
 7 Midland Library              ((-122.5348 45.53109, -122.5349 45.53106, -122.…
 8 Hollywood Library            ((-122.6421 45.53946, -122.6424 45.53935, -122.…
 9 Belmont Library              ((-122.6021 45.51627, -122.6019 45.51633, -122.…
10 Central Library              ((-122.6833 45.53382, -122.6835 45.53375, -122.…
# ℹ 11 more rows
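
The `1609.34` passed to `st_buffer()` is one mile expressed in meters, since the buffer distance for this data is given in meters. A small sketch that names the conversion instead of hard-coding it; `miles_to_meters()` is a hypothetical helper, not part of sf.

```r
# One international mile is exactly 1609.344 meters
meters_per_mile <- 1609.344

# Hypothetical helper: convert a distance in miles to meters
miles_to_meters <- function(miles) {
  miles * meters_per_mile
}

one_mile <- miles_to_meters(1)
one_mile
```

The buffer call could then read `st_buffer(miles_to_meters(1))`, which makes it easier to try other distances.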
pps_elementary_schools <-
  read_sf("data/pps_elementary_schools.geojson")
Simple feature collection with 86 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -122.893 ymin: 45.44022 xmax: -122.4617 ymax: 45.65579
Geodetic CRS:  WGS 84
# A tibble: 86 × 2
   school                                           geometry
   <chr>                                         <POINT [°]>
 1 Lenox Elementary School               (-122.893 45.55811)
 2 Cedar Mill Elementary School         (-122.7827 45.52651)
 3 Montclair Elementary School          (-122.7559 45.47644)
 4 Raleigh Hills Elementary School      (-122.7587 45.48163)
 5 Raleigh Park Elementary School       (-122.7576 45.49301)
 6 Ridgewood Elementary School          (-122.7786 45.50341)
 7 Rock Creek Elementary School         (-122.8676 45.55109)
 8 Terra Linda Elementary School        (-122.8244 45.53416)
 9 West Tualatin View Elementary School (-122.7662 45.51553)
10 William Walker Elementary School     (-122.7982 45.50076)
# ℹ 76 more rows
pps_elementary_schools |>
  st_join(portland_libraries_one_mile_buffer)
Simple feature collection with 89 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -122.893 ymin: 45.44022 xmax: -122.4617 ymax: 45.65579
Geodetic CRS:  WGS 84
# A tibble: 89 × 3
   school                                           geometry library            
 * <chr>                                         <POINT [°]> <chr>              
 1 Lenox Elementary School               (-122.893 45.55811) <NA>               
 2 Cedar Mill Elementary School         (-122.7827 45.52651) <NA>               
 3 Montclair Elementary School          (-122.7559 45.47644) Garden Home Commun…
 4 Raleigh Hills Elementary School      (-122.7587 45.48163) West Slope Communi…
 5 Raleigh Park Elementary School       (-122.7576 45.49301) West Slope Communi…
 6 Ridgewood Elementary School          (-122.7786 45.50341) <NA>               
 7 Rock Creek Elementary School         (-122.8676 45.55109) <NA>               
 8 Terra Linda Elementary School        (-122.8244 45.53416) <NA>               
 9 West Tualatin View Elementary School (-122.7662 45.51553) Oregon College of …
10 William Walker Elementary School     (-122.7982 45.50076) <NA>               
# ℹ 79 more rows
pps_elementary_schools_near_libraries <-
  pps_elementary_schools |>
  st_join(portland_libraries_one_mile_buffer) |>
  mutate(has_nearby_library = case_when(
    is.na(library) ~ "Not within one mile of library",
    .default = "Within one mile of library"
  )) |>
  select(school, has_nearby_library)
Simple feature collection with 89 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -122.893 ymin: 45.44022 xmax: -122.4617 ymax: 45.65579
Geodetic CRS:  WGS 84
# A tibble: 89 × 3
   school                           has_nearby_library             geometry
   <chr>                            <chr>                       <POINT [°]>
 1 Lenox Elementary School          Not within one mi…  (-122.893 45.55811)
 2 Cedar Mill Elementary School     Not within one mi… (-122.7827 45.52651)
 3 Montclair Elementary School      Within one mile o… (-122.7559 45.47644)
 4 Raleigh Hills Elementary School  Within one mile o… (-122.7587 45.48163)
 5 Raleigh Park Elementary School   Within one mile o… (-122.7576 45.49301)
 6 Ridgewood Elementary School      Not within one mi… (-122.7786 45.50341)
 7 Rock Creek Elementary School     Not within one mi… (-122.8676 45.55109)
 8 Terra Linda Elementary School    Not within one mi… (-122.8244 45.53416)
 9 West Tualatin View Elementary S… Within one mile o… (-122.7662 45.51553)
10 William Walker Elementary School Not within one mi… (-122.7982 45.50076)
# ℹ 79 more rows
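
The joined result has 89 rows even though there are 86 schools. The most likely reason is that a school falling inside more than one library buffer gets one row per match. A minimal sketch of collapsing to one row per school, using a made-up data frame in place of the spatial data (school and library names are hypothetical):

```r
# Toy stand-in for the st_join() output: School B matches two library buffers
joined <- data.frame(
  school = c("School A", "School B", "School B", "School C"),
  library = c(NA, "Library 1", "Library 2", "Library 3"),
  stringsAsFactors = FALSE
)

joined$has_nearby_library <- ifelse(
  is.na(joined$library),
  "Not within one mile of library",
  "Within one mile of library"
)

# Keep one row per school/status pair
# (dplyr::distinct(school, has_nearby_library) is the tidyverse equivalent)
one_row_per_school <- unique(joined[, c("school", "has_nearby_library")])
one_row_per_school
```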

Make Interactive Maps

11

library(mapgl)

maplibre(bounds = pps_elementary_schools_near_libraries)
maplibre(bounds = pps_elementary_schools_near_libraries) |>
  add_fill_layer(
    source = portland_libraries_one_mile_buffer,
    fill_color = "#7570b3",
    fill_opacity = 0.5,
    tooltip = "library",
    id = "portland_libraries"
  )
maplibre(bounds = pps_elementary_schools_near_libraries) |>
  add_circle_layer(
    source = pps_elementary_schools_near_libraries,
    circle_color = match_expr(
      "has_nearby_library",
      values = c(
        "Within one mile of library",
        "Not within one mile of library"
      ),
      stops = c(
        "#1b9e77",
        "#d95f02"
      )
    ),
    tooltip = "school",
    id = "schools"
  )
maplibre(bounds = pps_elementary_schools_near_libraries) |>
  add_categorical_legend(
    values = c(
      "Within one mile of library",
      "Not within one mile of library"
    ),
    legend_title = NULL,
    colors = c(
      "#1b9e77",
      "#d95f02"
    ),
    circular_patches = TRUE
  )
maplibre(bounds = pps_elementary_schools_near_libraries) |>
  add_fill_layer(
    source = portland_libraries_one_mile_buffer,
    fill_color = "#7570b3",
    fill_opacity = 0.5,
    tooltip = "library",
    id = "portland_libraries"
  ) |>
  add_circle_layer(
    source = pps_elementary_schools_near_libraries,
    circle_color = match_expr(
      "has_nearby_library",
      values = c(
        "Within one mile of library",
        "Not within one mile of library"
      ),
      stops = c(
        "#1b9e77",
        "#d95f02"
      )
    ),
    tooltip = "school",
    id = "schools"
  ) |>
  add_categorical_legend(
    values = c(
      "Within one mile of library",
      "Not within one mile of library"
    ),
    legend_title = NULL,
    colors = c(
      "#1b9e77",
      "#d95f02"
    ),
    circular_patches = TRUE
  )

Report in New Ways with Quarto

Make Many Different Outputs with Quarto

12

---
title: "Median Income Report"
format: html
---
---
title: "Median Income Presentation"
format: revealjs
---

index.qmd

---
title: "Median Income"
---

about.qmd

---
title: "About"
---

_quarto.yml

project:
  type: website

website:
  title: "Median Income Website"
  navbar:
    left:
      - index.qmd
      - about.qmd
  
format: html

Keep Your Quarto Outputs on Brand

13

_brand.yml

color:
  foreground: "#404e6b"
  primary: "#6cabdd"

typography:
  fonts:
    - family: Inter Tight
      source: google
  base: Inter Tight
  headings: Inter Tight

Publish Your Quarto Documents Online

14

https://rfortherestofus.com/cascadia2025

Make PDFs with Typst

15

report.qmd

---
title: "Housing Data Profiles"

format: 
  typst:
    template-partials:
      - typst-show.typ
      - typst-template.typ

params:
  town: "Hartford"
---

# Introduction

Consequat occaecat mollit velit aliquip. etc ...

typst-show.typ

#show: psc-report.with(
  $if(title)$
    title: "$title$",
  $endif$
  $if(params.town)$
    town: "$params.town$",
  $endif$
)

typst-template.typ

#let psc-report(
  title: "title",
  town : "town",
  body,
) = {

 set text(
    font: "Open Sans",
    size: 12pt,
  )

 set page(
    "us-letter",
    margin: (left: 1in, 
             right: 1in, 
             top: 0.7in, 
             bottom: 1in),
    background: place(top, 
                      rect(fill: rgb("15397F"), 
                           width: 100%, 
                           height: 0.5in)),
    header: align(
      horizon,
      grid(
        columns: (80%, 20%),
        align(left, text(size: 20pt, fill: white, weight: "bold", title)),
        align(right, text(size: 12pt, fill: white, weight: "bold", town)),
      ),
    ),
    footer: align(
      grid(
        columns: (40%, 60%),
        align(horizon, text(fill: rgb("15397F"), 
                            size: 12pt, 
                            counter(page).display("1"))),
        align(right, image("assets/psclogo.svg", height: 300%)),
      )
    )
  )

  body
}
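
Because report.qmd declares `params: town`, the same document can in principle be rendered once per town. A minimal sketch of setting that up from R, assuming the quarto R package is installed; "New Haven" and the output file naming scheme are made up for illustration.

```r
# Hypothetical list of towns to produce reports for
towns <- c("Hartford", "New Haven")

# One row per report: the town parameter and the file to write
render_jobs <- data.frame(
  town = towns,
  output_file = paste0(tolower(gsub(" ", "-", towns)), ".pdf"),
  stringsAsFactors = FALSE
)

# Render report.qmd for a single town, overriding the `town` parameter
render_town_report <- function(town, output_file) {
  quarto::quarto_render(
    input = "report.qmd",
    output_file = output_file,
    execute_params = list(town = town)
  )
}

render_jobs
```

Running `Map(render_town_report, render_jobs$town, render_jobs$output_file)` would then render one PDF per row.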

Automate all the Things

Email Your Reports Directly from R

16

library(gmailr)

gm_auth_configure()
gm_auth(email = TRUE, cache = ".secret")

email_report <-
  gm_mime() |>
  gm_to("Joe Schmoe <joeschmoe@prosperportland.us>") |>
  gm_from("David Keyes <david@rfortherestofus.com>") |>
  gm_subject("COVID Business Relief Contact Log") |>
  gm_text_body("See attached") |>
  gm_attach_file("grant-report.html")

gm_send_message(email_report)

Run Your Code Without Lifting a Finger

17

send-report.yaml

name: Render Report and Send It

on:
  schedule:
    - cron:  '00 14 * * 1-5'

jobs:
  build:
    runs-on: ubuntu-latest
    
    env:
      GMAILR_APP: ${{ secrets.GMAILR_APP }}
      GMAILR_EMAIL: ${{ secrets.GMAILR_EMAIL }}

send-report.yaml (continued)

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v2

      - name: Install dependencies
        run: |
          install.packages("remotes")
          remotes::install_cran("gmailr")
          # ETC
        shell: Rscript {0}
          
      - name: Render + Send
        run: |-
          Rscript render_gmail.R
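
The `cron` entry in the workflow above has five space-separated fields, and GitHub Actions evaluates schedules in UTC. A small sketch that splits the expression into named fields to show what each position means; `parse_cron()` is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: label the five fields of a cron expression
parse_cron <- function(expression) {
  fields <- strsplit(expression, " ", fixed = TRUE)[[1]]
  names(fields) <- c("minute", "hour", "day_of_month", "month", "day_of_week")
  fields
}

schedule <- parse_cron("00 14 * * 1-5")
schedule
```

Read this way, the schedule is minute 0 of hour 14 (14:00 UTC), on any day of any month, Monday through Friday.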

Use R to Work With Files Created in R

18

library(fs)
library(zip)

county_pages <- dir_ls("outputs/pages/county/")

measure_pages <- dir_ls("outputs/pages/measure/")

zip(
  zipfile = "outputs/zip/obtn-files.zip",
  files = c(county_pages, measure_pages)
)
library(googledrive)

drive_upload("outputs/zip/obtn-files.zip")

Use AI to Write Better Code

Create Custom Instructions

19

Please answer the following R question.

When I program, I always like to use the tidyverse.

Please don’t ever give me base R solutions.

Use AI Directly in your Code Editor

20

Show AI Your Data

21

Data science differs a bit from software engineering … in that the state of your R environment is just as important (or more so) than the contents of your files.

── system ──────────────────
You are a helpful but terse R data scientist. 
Respond only with valid R code: no exposition, no backticks. 
Always provide a minimal solution and refrain from unnecessary additions. 
Use tidyverse style and, when relevant, tidyverse packages. 
For example, when asked to plot something, use ggplot2, 
or when asked to transform data, using dplyr and/or tidyr unless 
explicitly instructed otherwise. 
── user ────────────────────
Up to this point, the contents of my r file reads: 
  
library(tidyverse)
library(sf)
library(gander)

portland_boundaries <- read_sf("data/portland_boundaries.geojson")
pps_elementary_schools <- read_sf("data/pps_elementary_schools.geojson")

Now, Write code to tell me which schools are within Portland boundaries 
Here's some information about the objects in my R environment: 

```
# Just the first 1 row and 6 columns:
portland_boundaries
#> sf [1 × 6] (S3: sf/tbl_df/tbl/data.frame)
#>  $ objectid    : int 35
#>  $ cityname    : chr "Portland"
#>  $ shape_length: num 277321
#>  $ shape_area  : num 7.66e+08
#>  $ area        : num 4.05e+09
#>  $ geometry    :sfc_MULTIPOLYGON of length 1; first list element: List of 4
#>   ..$ :List of 13
#>   .. ..$ : num [1:14525, 1:2] -123 -123 -123 -123 -123 ...
#>   .. ..$ : num [1:642, 1:2] -123 -123 -123 -123 -123 ...
#>   .. ..$ : num [1:118, 1:2] -123 -123 -123 -123 -123 ...
#>   .. ..$ : num [1:5, 1:2] -122 -122 -122 -122 -122 ...
#>   .. ..$ : num [1:4, 1:2] -122.5 -122.5 -122.5 -122.5 45.5 ...
#>   .. ..$ : num [1:676, 1:2] -123 -123 -123 -123 -123 ...
#>   .. ..$ : num [1:11, 1:2] -122 -122 -122 -122 -122 ...
#>   .. ..$ : num [1:20, 1:2] -123 -123 -123 -123 -123 ...
#>   .. ..$ : num [1:4, 1:2] -122.7 -122.7 -122.7 -122.7 45.5 ...
#>   .. ..$ : num [1:10, 1:2] -122 -122 -122 -122 -122 ...
#>   .. ..$ : num [1:28, 1:2] -123 -123 -123 -123 -123 ...
#>   .. ..$ : num [1:57, 1:2] -123 -123 -123 -123 -123 ...
#>   .. ..$ : num [1:75, 1:2] -123 -123 -123 -123 -123 ...
#>   ..$ :List of 1
#>   .. ..$ : num [1:6, 1:2] -123 -123 -123 -123 -123 ...
#>   ..$ :List of 1
#>   .. ..$ : num [1:189, 1:2] -123 -123 -123 -123 -123 ...
#>   ..$ :List of 1
#>   .. ..$ : num [1:5, 1:2] -123 -123 -123 -123 -123 ...
#>   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
#>  - attr(*, "sf_column")= chr "geometry"
#>  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA
#>   ..- attr(*, "names")= chr [1:5] "objectid" "cityname" "shape_length" "shape_area" ...

# Just the first 5 rows and 2 columns:
pps_elementary_schools
#> sf [5 × 2] (S3: sf/tbl_df/tbl/data.frame)
#>  $ school  : chr [1:5] "Lenox Elementary School" "Cedar Mill Elementary School" "Montclair Elementary School" "Raleigh Hills Elementary School" ...
#>  $ geometry:sfc_POINT of length 5; first list element:  'XY' num [1:2] -122.9 45.6
#>  - attr(*, "sf_column")= chr "geometry"
#>  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA
#>   ..- attr(*, "names")= chr "school"
```

Use AI to Analyze Data

Translate Text

22

library(tidyverse)

survey_spanish <-
  read_csv("data/survey_spanish.csv")
library(mall)

survey_translated <-
  survey_spanish |>
  llm_translate(spanish, language = "English", pred_name = "english")

Summarize Text

23

survey_translated_summary <-
  survey_translated |>
  llm_summarize(english, max_words = 5, pred_name = "summary")

Create your Own Prompt to Analyze Text

24

library(ellmer)

identify_themes <- function(text) {
  chat <- chat_openai(
    system_prompt = "You are a sociologist,
    looking for the top three themes in the responses to a survey.
    Each response is separated by \\n"
  )

  chat$chat(text)
}
survey_translated_combined <-
  survey_translated |>
  pull(english) |>
  paste(collapse = "\n")
[1] "Allows the handling of multiple statistical procedures that are difficult to perform in other programs.\nThe power and multitude of packages available for almost anything\nThe community of users (there's always an answer to every problem you find)\nThere are many development options available with good documentation.\nThe flexibility at the time of data management. In R, it's not what you can do with the data, but how.\nAllows me to process data of any type with sufficient accuracy, in addition to allowing exploration of creativity through report design using markdown.\nI like speed with which I can solve a problem and feel sure that I am doing it correctly, because many professional experts have invested time in the packages and solutions.\nThe ease of tidyverse, R Markdown, the diversity of packages, options, a large community of support, and data visualization tools.\nThe possibility of performing various tasks with a single software that is free and has a large and generous community of users sharing their knowledge."
survey_translated_combined |>
  identify_themes()
Based on the survey responses provided, the top three themes appear to be:

1. **Open Source and Community Contributions**: Many responses highlight the benefits of R being open-source, emphasizing the cost-free access to robust 
data analysis tools and frequent updates driven by community contributions. The active community also offers a wealth of packages for various tasks, which
enhances the flexibility and capability of the software.

2. **Learning and Accessibility**: While there is mention of a steep learning curve initially, respondents note that learning R becomes easier over time. The diversity and availability of resources make it possible to quickly achieve results, encouraging continued learning and exploration.


3. **Functionality and Efficiency**: Respondents appreciate the range of statistical procedures that R can handle with ease, as well as the speed and efficiency in managing large datasets. The power of R's data structure, which aligns with mathematical concepts, and the wide variety of packages further contribute to its functionality.

Join the Community

25