Produces exploratory visualisations from LDA models: bar charts of the top terms per topic, charts of the terms that distinguish each topic from every other topic, and a bigram network per topic. Also produces two data frames: exemplars, which contains example posts for each topic along with their URLs so that verbatims can be found, and probabilities, which gives the probability of each topic for each post.

Usage

explore_LDAs(
  ldas,
  top_terms = TRUE,
  top_terms_style = c("lollipops", "bars"),
  top_n = 15,
  nrow = 2,
  diff_terms = TRUE,
  diff_n = 10,
  bigrams = TRUE,
  bigram_n = 50,
  bigram_threshold = 0.5,
  exemplars = TRUE,
  exemplars_n = 50,
  probabilities = TRUE,
  coherence = TRUE,
  coherence_n = 10
)

Arguments

ldas

A nested tibble where each row contains an LDA model.

top_terms

Should bar charts of the top terms for each topic be produced?

top_terms_style

Should the `top_terms` plots be lollipop charts or bar charts? The default is "lollipops"; use "bars" for standard bar charts.

top_n

How many terms should be included in the top terms bar chart for each topic?

nrow

How many rows should the faceted plots be arranged in?

diff_terms

Should bar charts of distinguishing terms for each topic be produced?

diff_n

How many terms should be included in the distinguishing terms bar chart for each topic?

bigrams

Should a bigram network be produced for each topic?

bigram_n

How many bigrams should be included in the network?

bigram_threshold

What proportion of a post must be assigned to a topic for the post to be considered for inclusion in that topic's bigram network?

exemplars

Should a tibble of exemplar posts be produced?

exemplars_n

How many exemplars should be provided for each topic?

probabilities

Should a tibble of document-topic probabilities be produced?

coherence

Should a tibble of topic coherence measurements be produced?

coherence_n

Number of words to be used in coherence calculation.
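
As a rough illustration of what `bigram_threshold` controls, the sketch below applies a threshold of 0.5 to the first ten rows of the document-topic probabilities printed in the Examples. It uses base R only and is not the package's internal code; in particular, treating the boundary as `>=` rather than `>` is an assumption.

```r
# First ten rows of the document-topic probabilities shown in the Examples.
probs <- data.frame(
  document = 1:10,
  topic_1 = c(0.0278, 0.0278, 0.875, 0.75, 0.708, 0.125, 0.0625, 0.312, 0.0417, 0.281),
  topic_2 = c(0.139, 0.472, 0.0417, 0.0357, 0.0417, 0.125, 0.562, 0.312, 0.708, 0.531),
  topic_3 = c(0.25, 0.0278, 0.0417, 0.179, 0.0417, 0.125, 0.0625, 0.312, 0.208, 0.0312),
  topic_4 = c(0.583, 0.472, 0.0417, 0.0357, 0.208, 0.625, 0.312, 0.0625, 0.0417, 0.156)
)

bigram_threshold <- 0.5
topic_cols <- grep("^topic_", names(probs), value = TRUE)

# For each topic, keep the documents whose probability meets the threshold.
# Assumption: the comparison is inclusive (>=); the package may differ.
eligible <- lapply(
  probs[topic_cols],
  function(p) probs$document[p >= bigram_threshold]
)

eligible
```

With these ten rows, no post reaches the threshold for topic_3, and posts 2 and 8 are not eligible for any topic. A lower threshold lets more posts through.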

Value

A nested tibble containing columns with the requested objects.

Details

The individual plotting functions called by explore_LDAs are also available for use on their own. This is useful when you have a lot of data, or when you want to iterate quickly and skip slower outputs such as the bigram networks, which can take a long time to render.

The top terms charts come in two versions, 'all_terms' and 'max_only'. In 'all_terms' a term can appear in multiple topics; in 'max_only' each term appears in only one topic.
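
The difference between the two versions can be sketched on a toy table of term-topic weights. The data below are made up, and the rule used for 'max_only' (keep each term in the topic where its weight is highest) is an assumption suggested by the name rather than something stated in this page.

```r
# Toy term-topic weights (beta); all values are invented for illustration.
betas <- data.frame(
  topic = c(1, 1, 1, 2, 2, 2),
  term  = c("heritage", "music", "month", "heritage", "library", "month"),
  beta  = c(0.30, 0.20, 0.05, 0.10, 0.25, 0.15)
)

# 'all_terms': every term stays in every topic it occurs in.
all_terms <- betas

# 'max_only' (assumed rule): keep each term only in its highest-beta topic.
is_max <- betas$beta == ave(betas$beta, betas$term, FUN = max)
max_only <- betas[is_max, ]

all_terms
max_only
```

Here "heritage" and "month" appear under both topics in `all_terms`, but only once each in `max_only`.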

Coherence can already be calculated in fit_LDAs, so there is no need to wait until the explore step of the pipeline. The argument is kept in explore_LDAs for now because 1) it runs fast and 2) it preserves backwards compatibility.

For an interactive exploration of the topics, see the `shiny_topics_explore` function, which takes an explore object as input.

Examples

library(SegmentR)
library(purrr)
ldas <- SegmentR:::test_data(explore = FALSE)$lda
#> removing stopwords
#> Making DTMs
#> making tuning grid
#> setting up LDAs

# Don't render bigrams as they take a while:
explore <- explore_LDAs(ldas, bigrams = FALSE)
# View various objects
explore %>% pluck("probabilities", 4)
#> # A tibble: 100 × 5
#>    document topic_1 topic_2 topic_3 topic_4
#>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1        1  0.0278  0.139   0.25    0.583 
#>  2        2  0.0278  0.472   0.0278  0.472 
#>  3        3  0.875   0.0417  0.0417  0.0417
#>  4        4  0.75    0.0357  0.179   0.0357
#>  5        5  0.708   0.0417  0.0417  0.208 
#>  6        6  0.125   0.125   0.125   0.625 
#>  7        7  0.0625  0.562   0.0625  0.312 
#>  8        8  0.312   0.312   0.312   0.0625
#>  9        9  0.0417  0.708   0.208   0.0417
#> 10       10  0.281   0.531   0.0312  0.156 
#> # ℹ 90 more rows
explore %>% pluck("exemplars", 4)
#> # A tibble: 206 × 5
#>    document topic gamma message                                          url_var
#>    <chr>    <int> <dbl> <chr>                                            <chr>  
#>  1 54           1 0.906 In 2013 U.S. Rep. Robert “Beto” O’Rourke ruled … https:…
#>  2 3            1 0.875 Pura Belpré Award Winners 1996 - 2018 | Colours… https:…
#>  3 56           1 0.825 October Hispanic Heritage Month. At Pompano Bea… https:…
#>  4 22           1 0.812 Beautiful words #DreamSpeaker Breanna Zwart.⠀ .… https:…
#>  5 63           1 0.812 Today’s Hispanic Heritage Luncheon First Bank F… https:…
#>  6 4            1 0.75  Awesome music Arco Iris, first OUT Mariachi ban… https:…
#>  7 5            1 0.708 @Valley_Vikings Celebrating Hispanic Heritage M… https:…
#>  8 59           1 0.65  Stop library see amazing Hispanic Heritage post… https:…
#>  9 64           1 0.65  My little Frida Kahlo!! Representing love art H… https:…
#> 10 96           1 0.65  Don't forget join us @CGLA_Chatt #hispanicherit… https:…
#> # ℹ 196 more rows
explore %>% pluck("top_terms", 4, "max_only")