seetrees

An R package that enhances the interpretability of stylometric results from stylo

seetrees is a small (for now) extension to the well-known stylo library. stylo performs unsupervised and supervised classification of texts based on bags of features, and relies heavily on hierarchical clustering. I add tree cutting and feature-to-cluster association measures, so that one can detect high-level corpus biases and get insight into which words might drive the clustering.
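To make the idea concrete, here is a minimal base-R sketch of the general workflow: z-score word frequencies, cluster the texts hierarchically, cut the tree into k groups, and score each word by how differently it behaves inside one cluster compared with the rest of the corpus. This is not seetrees' actual code. The toy frequency table is invented, and both the Ward linkage and the mean-difference association score are assumptions chosen for illustration.

```r
## Hypothetical illustration, NOT the seetrees implementation:
## cluster texts on word frequencies, cut the tree, rank words by association.

## Invented toy data: 6 texts x 4 words. Texts 1-3 favour "the" and "of",
## texts 4-6 favour "and"; "a" is noise that does not follow the groups.
freqs <- matrix(c(
  5.0, 2.0, 3.0, 2.0,
  5.2, 2.1, 3.1, 3.0,
  4.9, 1.9, 2.9, 1.0,
  3.0, 4.0, 1.0, 2.1,
  3.1, 4.2, 1.2, 2.9,
  2.9, 3.9, 0.9, 1.1
), nrow = 6, byrow = TRUE,
dimnames = list(paste0("text", 1:6), c("the", "and", "of", "a")))

## z-score each word (column), in the spirit of Delta-type distances
z <- scale(freqs)

## Hierarchical clustering on Euclidean distances (Ward linkage is an assumption)
tree <- hclust(dist(z), method = "ward.D2")

## Cut the tree into k groups; cluster numbers follow the order of the texts
k <- 2
groups <- cutree(tree, k = k)

## Illustrative association score: absolute difference between a word's
## mean z-score inside cluster 1 and its mean z-score in all other texts
assoc <- apply(z, 2, function(col) {
  abs(mean(col[groups == 1]) - mean(col[groups != 1]))
})

print(groups)
print(sort(assoc, decreasing = TRUE))
```

The three words built to separate the groups score well above 1, while the noise word "a" scores close to zero, which is the kind of signal the feature-to-cluster view is meant to surface.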

I am not planning to release the package on CRAN any time soon; it is mostly used for demonstration, teaching, and corpus exploration. You can, however, install it from the GitHub repository. You can find a demonstration below.

Clusters & features

Install from GitHub (make sure you have the devtools package installed):

```r
## install seetrees from GitHub (requires devtools)
devtools::install_github("perechen/seetrees")

library(stylo)
library(seetrees)

data(lee) ## load one of the stylo datasets

## run stylo on the frequency table without the GUI
stylo_res <- stylo(frequencies = lee, gui = FALSE)

## redraw the dendrogram from the distance matrix, cut it into k groups,
## and show the associated features
view_tree(stylo_res, k = 2, right_margin = 12)
```

Check ?view_tree for more details.
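A rough base-R analogue of the redraw-and-cut step, aimed at spotting corpus bias: if the k groups line up with something other than authorship, that may point to a bias in the corpus. This is a sketch, not what view_tree does internally. The distance matrix is made up, the Ward linkage is an assumption, and the "Author_Title" labels assume stylo's usual file-naming convention. With real stylo output, the matrix could presumably be taken from as.dist(stylo_res$distance.table), assuming that field is present in your stylo version.

```r
## Hypothetical sketch, NOT view_tree's implementation:
## redraw a dendrogram from a distance matrix, cut it, cross-tabulate by author.

## Made-up symmetric distance matrix for 4 texts named "Author_Title"
## (with stylo, something like as.dist(stylo_res$distance.table) is assumed)
labels <- c("A_one", "A_two", "B_one", "B_two")
d <- as.dist(matrix(c(0, 1, 6, 7,
                      1, 0, 6, 6,
                      6, 6, 0, 2,
                      7, 6, 2, 0),
                    nrow = 4, dimnames = list(labels, labels)))

tree <- hclust(d, method = "ward.D2")  # linkage choice is an assumption
k <- 2
groups <- cutree(tree, k = k)

plot(tree)                # redraw the dendrogram
rect.hclust(tree, k = k)  # draw boxes around the k groups

## author label = everything before the first underscore
authors <- sub("_.*$", "", names(groups))
bias_table <- table(cluster = groups, author = authors)
print(bias_table)
```

In this toy case each cluster contains texts by a single author, so the table is diagonal; off-diagonal counts on a real corpus would be the cue to look closer.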