What is SegmentR?
SegmentR is the collective name for the techniques SHARE uses to find latent groups in data. It has two main aspects:
- Topic modelling
  - Finding latent groups in a corpus of text using Latent Dirichlet Allocation.
  - E.g. What are the topics being discussed in these tweets?
- Cluster analysis
  - Finding latent groups in any data set using distance metrics.
  - E.g. What are the different types of customer present in our database?
So far, this R package only contains functions to help with the topic modelling workflow.
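
Cluster analysis is therefore not covered by the package's own functions, but the idea of grouping by distance can be illustrated with base R alone. The sketch below is only an assumption about what such a workflow might look like: the synthetic `customers` data, the column names `spend` and `visits`, and the choice of Euclidean distance with complete-linkage hierarchical clustering are all hypothetical and are not taken from SegmentR.

```r
# Hypothetical illustration using base R only; none of this is SegmentR code.

# Synthetic "customers": two well-separated groups of 10 observations each.
set.seed(42)
customers <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(20, mean = 5, sd = 0.5), ncol = 2)
)
colnames(customers) <- c("spend", "visits")

# Pairwise Euclidean distances between the standardised observations.
distances <- dist(scale(customers), method = "euclidean")

# Agglomerative hierarchical clustering, then cut the tree into two groups.
tree <- hclust(distances, method = "complete")
groups <- cutree(tree, k = 2)

# Number of observations assigned to each latent group.
print(table(groups))
```

On real data the number of groups is rarely known in advance, so `k` would normally be chosen by inspecting the dendrogram (`plot(tree)`) or comparing several values.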