In this work we introduce CUTS (Contrastive and Unsupervised Training for Segmentation), the first fully unsupervised deep learning framework for medical image segmentation, designed to better utilize the vast majority of imaging data that is not labeled or annotated. Segmenting medical images is a critical task for facilitating both patient diagnoses and quantitative research. A major limiting factor is the lack of labeled data, as obtaining expert annotations for each new imaging dataset or task can be expensive, labor intensive, and inconsistent across annotators. Thus, we utilize self-supervision from pixels and their local neighborhoods in the images themselves. Our unsupervised approach optimizes a training objective that leverages concepts from contrastive learning and autoencoding. In contrast to prior work, our framework segments medical images with a novel two-stage approach without relying on any labeled data at any stage. The first stage embeds each pixel together with its surrounding patch (a “pixel-centered patch”) as a vector in a high-dimensional latent embedding space. The second stage applies diffusion condensation, a multi-scale topological data analysis approach, to dynamically coarse-grain these embedding vectors at all levels of granularity. The final outcome is a series of coarse-to-fine segmentations that highlight image structures at various scales. We show successful multi-scale segmentation on natural images, retinal fundus images, and brain MRI scans. Our framework delineates structures and patterns at different scales, which may carry distinct information relevant to clinical interpretation. By segmenting medical images at multiple meaningful granularities without relying on any labels, we demonstrate the possibility of circumventing tedious and repetitive manual annotation in future practice.
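To make the first stage concrete, the following is a minimal sketch of pixel-centered patch extraction and an InfoNCE-style contrastive loss. It is illustrative only: the function names, patch size, padding mode, and temperature are assumptions, and the actual CUTS encoder is a trained deep network rather than raw patch vectors.

```python
import numpy as np

def extract_patches(image, patch_size=3):
    """Build one pixel-centered patch per pixel (reflect-padded borders).

    Returns an array of shape (H*W, patch_size**2): every pixel is
    represented together with its local neighborhood, as described in
    the first stage of the framework. Parameters are illustrative.
    """
    pad = patch_size // 2
    padded = np.pad(image, pad, mode="reflect")
    h, w = image.shape
    patches = np.empty((h * w, patch_size * patch_size))
    for i in range(h):
        for j in range(w):
            patches[i * w + j] = padded[i:i + patch_size, j:j + patch_size].ravel()
    return patches

def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style contrastive loss for a single anchor embedding.

    Pulls the anchor toward a positive (e.g. an embedding of a nearby
    patch) and pushes it away from negatives. A simplified stand-in for
    the paper's full contrastive + autoencoding objective.
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    logits = np.array([cos(anchor, positive)]
                      + [cos(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()  # numerical stability before softmax
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

In a full pipeline, positives would be drawn from spatially adjacent pixel-centered patches and the loss would train an encoder; here the functions only demonstrate the data representation and objective shape.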
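The second stage, diffusion condensation, can be sketched as an iterative process in which embedding vectors repeatedly move toward their neighborhood averages under a Markov diffusion operator, so that points merge into progressively coarser groups. This is a toy numpy version under assumed parameters (fixed kernel bandwidth, naive threshold-based grouping); published diffusion condensation implementations adapt the bandwidth across iterations.

```python
import numpy as np

def diffusion_condensation(X, sigma=0.5, n_iter=30):
    """Iteratively diffuse points toward weighted neighborhood means.

    Returns the positions after every iteration; earlier iterations give
    fine-grained structure, later ones coarser structure, mirroring the
    coarse-to-fine segmentations described above. Simplified sketch:
    a real implementation anneals `sigma` over iterations.
    """
    trajectory = [X.copy()]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-d2 / sigma ** 2)            # Gaussian affinities
        P = K / K.sum(axis=1, keepdims=True)    # row-normalized Markov operator
        X = P @ X                               # one condensation step
        trajectory.append(X.copy())
    return trajectory

def clusters_at(X, tol=1e-2):
    """Group points whose condensed positions lie within `tol`.

    Naive threshold grouping; adequate once points have condensed onto
    near-identical locations.
    """
    labels = -np.ones(len(X), dtype=int)
    c = 0
    for i in range(len(X)):
        if labels[i] == -1:
            labels[i] = c
            for j in range(i + 1, len(X)):
                if labels[j] == -1 and np.linalg.norm(X[i] - X[j]) < tol:
                    labels[j] = c
            c += 1
    return labels
```

Reading cluster assignments off successive elements of the trajectory yields the hierarchy of segmentations at all levels of granularity; in the full framework the points being condensed are the learned pixel-centered patch embeddings.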