
There is no one-size-fits-all choice of topic_cutoff. The topic probabilities satisfy the axioms of a probability distribution, but they should not be read as exact probabilities.

Usage

topics_classify(linked_df, topic_cutoff = 0.5)

Arguments

linked_df

Data frame that has been linked to topic probabilities with topics_link.

topic_cutoff

Cut-off probability. Defaults to 0.5.

Value

Data frame pivoted into longer format, ready for plotting.

Details

For example, with k = 8 topics, 1/k = 0.125, so a score of 0.250 is twice the probability each topic would receive if probability were spread evenly across topics. With k = 4, however, 1/k = 0.250, so a score of 0.250 is exactly equal to that even-spread value. In other words, not all 0.250s are equal.

A degree of domain knowledge and data set exploration is required to settle on an appropriate cut-off.
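
The arithmetic above can be sketched in base R. The helper relative_to_uniform below is hypothetical and is not part of SegmentR. It only expresses a topic probability as a multiple of the 1/k value, which may help when comparing candidate cut-offs across models with different numbers of topics.

```r
# Hypothetical helper (not part of SegmentR): express a topic probability
# as a multiple of the even-spread value 1/k.
relative_to_uniform <- function(prob, k) {
  prob / (1 / k)
}

# The same score of 0.250 means different things for different k:
relative_to_uniform(0.25, k = 8)  # twice the even-spread value
relative_to_uniform(0.25, k = 4)  # equal to the even-spread value
```

A ratio well above 1 suggests the topic is over-represented relative to an even spread, but where to draw the line still depends on the data set.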

Examples

list_data <- SegmentR:::test_data()
#> removing stopwords
#> Making DTMs
#> making tuning grid
#> setting up LDAs
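# Take the first set of topic probabilities and the first data set
# from the test object
probabilities <- list_data$explore$probabilities[[1]]
data <- list_data$lda$data[[1]]

# Link the data to the topic probabilities
linked <- topics_link(data, probabilities)

# Classify using a cut-off of 0.75 instead of the default 0.5
classified <- topics_classify(linked, 0.75)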