--- title: "Documenting evaluation" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Documenting evaluation} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` The other vignette focuses on reproducing a single clustering workflow that assumes that the number of clusters has been decided. As the app includes a few options for evaluating clusters, some of the functions are also made available in the package. The output of the clustering functions can also be used with other packages. ```{r setup, message=FALSE, warning=FALSE} library(visxhclust) library(dplyr) ``` ### Preprocessing and clustering ```{r prep} numeric_data <- iris %>% select(Sepal.Length, Sepal.Width, Petal.Width) dmat <- compute_dmat(numeric_data, "euclidean", TRUE) clusters <- compute_clusters(dmat, "complete") ``` ### Gap statistic For Gap statistic, the optimal number of clusters depends on the method use to compare cluster solutions. The package cluster includes the function `cluster::maxSE()` to help with that. ```{r gapstat} gap_results <- compute_gapstat(scale(numeric_data), clusters) optimal_k <- cluster::maxSE(gap_results$gap, gap_results$SE.sim) line_plot(gap_results, "k", "gap", xintercept = optimal_k) ``` ### Other measures The Shiny app also includes the option to compute average silhouette widths or Dunn index. The function `compute_metric` works similarly to `compute_gapstat`, whereas `optimal_score` is similar to maxSE. However, `optimal_score` varies only between first and global minimum and maximum. ```{r dunn} res <- compute_metric(dmat, clusters, "dunn") optimal_k <- optimal_score(res$score) line_plot(res, "k", "score", optimal_k) ```