Accurate and comprehensive extraction of information from high-dimensional single cell datasets necessitates faithful visualizations to assess biological populations. A state-of-the-art algorithm for non-linear dimension reduction, t-SNE, requires multiple heuristics and fails to produce clear representations of datasets when millions of cells are projected. We develop opt-SNE, an automated toolkit for t-SNE parameter selection that uses real-time evaluation of the Kullback-Leibler divergence to tailor the early exaggeration and the overall number of gradient descent iterations in a dataset-specific manner. Precise calibration of early exaggeration, together with opt-SNE's adjustment of the gradient descent learning rate, dramatically improves computation time and enables high-quality visualization of large cytometry and transcriptomics datasets. This overcomes the limitations of analysis tools with hard-coded parameters, which often produce poorly resolved or misleading maps of fluorescence and mass cytometry data. In summary, opt-SNE enables superior data resolution in t-SNE space and thereby more accurate data interpretation.
Visual exploration of high-dimensional data is imperative for the comprehensive analysis of single cell datasets. Fluorescence, mass and sequencing-based cytometric data analysis requires tools that can reveal the combinations of proteomic and/or transcriptomic markers that define complex and diverse cell phenotypes in a mixed population. Traditional biaxial data presentation via expert-driven gating is still the standard analysis method for cytometry data. However, the modern multi-parameter era has created a dire need for an analysis tool that can accurately and comprehensively visualize multi-dimensional data and relieve the current cytometry data-processing bottleneck.
To date, multiple dimensionality reduction techniques have been applied to cytometry data with variable success. Linear methods, such as PCA, are mostly unsuitable for cytometry data visualization because they cannot faithfully represent non-linear relationships. t-Distributed Stochastic Neighbor Embedding (t-SNE) is a state-of-the-art dimensionality reduction algorithm for non-linear data representation that creates a low-dimensional distribution, or ‘map’, of high-dimensional data [1, 2]. Conspicuous groupings of datapoints, or ‘islands’, correspond to observations that are similar in the original high-dimensional space and help to visualize the general structure and heterogeneity of a dataset. When t-SNE embeds single cell data, the islands represent cells with similar phenotypes, as defined by a cytometric or genomic signature. This reveals biological data structure and surfaces important differences between samples and/or subject groups [3]. In addition, t-SNE maps are used to categorize single cell data into relevant biological populations for downstream quantification, either through expert-guided filtering (gating) [4] or through unsupervised clustering of the map [5, 6, 7]. Visualizations of cytometry data produced with other non-linear embedding algorithms, such as LargeVis [8], UMAP [9], and EmbedSOM [10], can be interpreted and interrogated in a similar manner. Clustering algorithms that directly interrogate high-dimensional data, such as FlowSOM [11] and PhenoGraph [12], are often used in conjunction with 2D maps to present annotated cell clusters to the viewer.
A limitation of t-SNE in its current form is its inability to scale to datasets with large numbers of observations [7, 8].
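The abstract describes opt-SNE as monitoring the Kullback-Leibler divergence (KLD) in real time. opt-SNE uses this to decide how long early exaggeration lasts and how many gradient descent iterations to run. In t-SNE this quantity is KL(P||Q) = Σ_{i≠j} p_ij log(p_ij / q_ij), the divergence between the high-dimensional neighbor similarities p_ij and their low-dimensional counterparts q_ij. The Python sketch below illustrates that kind of KLD-driven control loop; it is not the opt-SNE implementation. The specific rules are assumptions made for illustration:

- early exaggeration ends once the relative per-iteration KLD decrease has peaked;
- optimization stops once that relative decrease falls below a tolerance (the default of 1/5000 is hypothetical);
- the learning rate is set to the number of cells divided by the early exaggeration factor (assumed default 12).

The names `KLDMonitor` and `learning_rate` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def learning_rate(n_points: int, exaggeration: float = 12.0) -> float:
    """Assumed rule: scale the gradient descent step with dataset size (n / exaggeration)."""
    return n_points / exaggeration


@dataclass
class KLDMonitor:
    """Hypothetical real-time KLD monitor; the rules and defaults are illustrative assumptions."""

    tol: float = 1.0 / 5000  # assumed threshold on the relative KLD decrease per iteration
    history: List[float] = field(default_factory=list)
    prev_rc: Optional[float] = None
    in_early_exaggeration: bool = True

    def observe(self, kld: float) -> str:
        """Record one KLD value and return 'continue', 'end_exaggeration' or 'stop'."""
        self.history.append(kld)
        if len(self.history) < 2:
            return "continue"
        rc = (self.history[-2] - kld) / kld  # relative KLD decrease at this iteration
        prev, self.prev_rc = self.prev_rc, rc
        if self.in_early_exaggeration:
            # Assumed rule: leave early exaggeration once the relative decrease has peaked.
            if prev is not None and rc < prev:
                self.in_early_exaggeration = False
                # KLD values computed with exaggerated P are not comparable to later ones,
                # so restart the relative-change bookkeeping for the next phase.
                self.history.clear()
                self.prev_rc = None
                return "end_exaggeration"
            return "continue"
        # Assumed rule: stop once the KLD improves by less than tol (relative) per iteration.
        return "stop" if rc < self.tol else "continue"


if __name__ == "__main__":
    monitor = KLDMonitor()
    # Synthetic KLD values, invented purely to exercise the logic.
    for value in [10.0, 9.9, 9.6, 9.0, 8.7, 3.0, 2.9, 2.8999]:
        print(value, monitor.observe(value))
```

In use, a t-SNE optimizer would call `observe` once per iteration (or every few iterations) with the current KLD. It would switch off exaggeration on `"end_exaggeration"` and terminate on `"stop"`. Because both decisions depend on the observed KLD trace, the phase lengths adapt to each dataset instead of being hard-coded.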