ML and AI Patents
Overview
My work in machine learning has focused on building dependable, interpretable models for challenging, low-resource environments. Much of this research was motivated by the need to extract signal from noise in settings where data is messy, sparse, and limited in quantity. In industry, data is often expensive, and collecting more of it, or curating a cleaner dataset, is simply not practical for most projects. I needed technology that worked with the data I had. Rather than relying on brittle black-box techniques, I developed statistically principled models informed by domain knowledge and tailored to the structure of the data.
At the heart of this effort is the Hierarchical Pitman-Yor Process (HPYP), a nonparametric Bayesian prior that captures the power-law and hierarchical structure characteristic of natural language, biological sequences, and other real-world data. This process became the foundation for a series of patents, ranging from foundational modeling techniques to practical applications in classification and semantic search.
While the patents listed below reflect research conducted during my time at a previous employer, they laid the groundwork for many of the core ideas behind Sturdy Statistics. Our current work builds on these insights with newer, more expressive models and more scalable inference techniques. Sturdy Statistics improves on these ideas and makes them practical for a wide range of users, from independent developers and small companies up to enterprises with large data science teams. Throughout, my central focus remains robust, explainable AI that works in the real world, even when data is limited, noisy, and domain-specific.
Hierarchical Pitman-Yor Process as a Language Model
Like much recent NLP research, my work centers on an improved language model. Rather than using a Transformer or another neural-network architecture, however, I chose to base my technology on the Hierarchical Pitman-Yor Process. This produces a cleanly structured language model with analytic properties that capture Zipfian distributions and hierarchical dependencies across multiple levels of abstraction. This approach to representation learning offers the automated feature discovery of deep learning while resting on a principled statistical foundation for modeling natural language, user behavior, and genomic sequences with sparse, long-tailed statistics.
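To make the Zipfian behavior concrete, here is a minimal sketch of a single-level Pitman-Yor process via its Chinese restaurant construction. The function name and parameter values are illustrative assumptions, not taken from the patents:

```python
import numpy as np

def pitman_yor_tables(n_customers, discount=0.7, strength=10.0, seed=None):
    """Sample table sizes from a Pitman-Yor Chinese restaurant process.

    A new customer joins an existing table with probability proportional to
    (table size - discount), or opens a new table with probability
    proportional to (strength + discount * number of tables).
    """
    rng = np.random.default_rng(seed)
    tables = []  # tables[k] = number of customers seated at table k
    for i in range(n_customers):
        weights = np.array([c - discount for c in tables]
                           + [strength + discount * len(tables)])
        k = rng.choice(len(weights), p=weights / (strength + i))
        if k == len(tables):
            tables.append(1)
        else:
            tables[k] += 1
    return tables

# A few giant tables and a long tail of tiny ones: the same power-law
# shape as word frequencies in natural language.
sizes = sorted(pitman_yor_tables(10_000, seed=0), reverse=True)
print(sizes[:10], "...", f"{len(sizes)} tables total")
```

Stacking these processes, so that each level's base distribution is the predictive distribution of the level above, yields the hierarchical variant used throughout the patents below.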
- US11115520B2 describes the base model, introducing a deep, self-conjugate HPYP hierarchy for document modeling. This allows for composable models that reflect both global and local topic structure while preserving power-law behavior throughout the hierarchy.
- US11429901B1 extends the model by introducing a mixture prior that allows domain knowledge — in this case, predefined keyword clusters — to be embedded directly into the probabilistic framework. This speeds up convergence, improves interpretability, and enhances performance on small, noisy datasets.
- US11521601B2 and US11804216B2 describe direct applications of these models, including systems for distinguishing spontaneous, natural conversation from recordings or automated agents.
These innovations enable models that are naturally interpretable, data-efficient, and well-suited to domains where training data is limited or costly to obtain.
Using HPYP-Derived Representations for Downstream Tasks
One of the key advantages of HPYP-based models is that they produce sparse, specific, and resilient latent representations — ideal for downstream tasks such as classification, search, and clustering. Unlike dense embeddings from neural networks, these structured representations are robust to noise and easier to interpret.
- US12230253B2 applies these representations to phone call classification, using a joint topic-and-label model to identify the purpose of a conversation with high accuracy. Because the topic model and classifier are trained jointly, the resulting predictions remain interpretable and adaptable to new domains.
- US20240312451A1 (pending) builds on this by applying HPYP-derived features to semantic search, using Bayesian belief networks and domain-aware priors to index and retrieve documents based on inferred topic structure. This approach supports contextual retrieval even in low-data settings, while retaining transparency and explainability.
These downstream systems demonstrate the versatility of structured probabilistic models: they serve not only as generative tools but also, like deep learning models, as feature extractors for supervised learning pipelines.
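As a toy illustration of that feature-extractor role, the sketch below trains an off-the-shelf classifier on sparse topic-weight vectors. Everything here is a stand-in: the Dirichlet draws simulate the sparse document-topic mixtures an HPYP model would infer, and the two-stage training differs from the joint training of US12230253B2:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_docs, n_topics = 200, 25

# Simulated document-topic weights; a small Dirichlet concentration makes
# each row sparse, mimicking HPYP-style latent representations.
X = rng.dirichlet(np.full(n_topics, 0.1), size=n_docs)
y = (X[:, 3] + X[:, 7] > 0.3).astype(int)  # toy labels tied to two topics

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Interpretability: each coefficient attaches to a specific topic, so the
# classifier's decisions can be read directly off the topic weights.
top = np.argsort(np.abs(clf.coef_[0]))[::-1][:5]
print("most predictive topics:", top)
```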
Techniques for Small, Noisy Datasets
Much of this work was designed with low-resource environments in mind — settings where data is scarce, noisy, or expensive to label. In these scenarios, models must be resilient to overfitting and robust to label errors.
- US10719783B2 presents a technique for error-tolerant classification, using iterative refinement to identify mislabeled examples and improve model accuracy with minimal additional supervision.
- Several patents (including US11115520B2 and US11429901B1) emphasize sparse priors, which separate signal from noise and promote interpretable structure even in small datasets.
- The broader modeling framework emphasizes Bayesian model averaging, helping to isolate meaningful signals from noisy data while quantifying uncertainty, a crucial feature for sensitive applications (sketched below).
These techniques collectively support a vision of AI that is both statistically grounded and practically reliable, even when working with limited data.
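The model-averaging point above amounts to averaging predictions over posterior samples rather than trusting a single point estimate, and reporting the spread as uncertainty. The logistic predictive form and the simulated posterior draws below are assumptions standing in for real inference output:

```python
import numpy as np

def posterior_averaged_prediction(posterior_samples, x):
    """Average a logistic prediction over posterior weight samples.

    Returns the mean predicted probability and its spread across samples,
    the latter serving as a simple uncertainty estimate.
    """
    preds = np.array([1.0 / (1.0 + np.exp(-w @ x)) for w in posterior_samples])
    return preds.mean(), preds.std()

# Hypothetical posterior draws for a two-feature logistic model.
rng = np.random.default_rng(1)
samples = rng.normal(loc=[1.0, -2.0], scale=0.5, size=(500, 2))
mean, spread = posterior_averaged_prediction(samples, np.array([0.4, 0.1]))
print(f"p(y=1) ~ {mean:.2f} +/- {spread:.2f}")
```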
Looking Ahead
While the patents listed here describe earlier research, many of the ideas they contain — sparse latent structure, mixture priors, probabilistic semantics, and domain-aware modeling — remain central to my current work.
At Sturdy Statistics, we've developed a new class of generative models that goes beyond these initial frameworks. Our systems retain the core goals of transparency, adaptability, and statistical rigor, while incorporating newer inference techniques, more expressive priors, and a focus on structured retrieval for downstream tasks, including integration with large language models (LLMs) such as ChatGPT. The models are optimized for small datasets and support scientific use cases in genomics and healthcare as well as applied work in customer research and other industry settings.
The inventions detailed in these patents set me on the path toward Sturdy Statistics — but the journey continues.
Individual Patents
The following patents reflect the core innovations described above.
- Michael K. McCourt Jr., Automatic classification of phone calls using representation learning based on the hierarchical Pitman-Yor process. US12230253B2 Feb 18 2025 [PDF]
Patent US12230253B2 presents a novel method for supervised classification of phone calls using a probabilistic admixture model grounded in the Hierarchical Pitman-Yor Process (HPYP). The innovation lies in adapting HPYP — a nonparametric Bayesian prior that generalizes the Dirichlet Process — to learn sparse, structured latent representations of spoken conversations. This approach enables flexible discovery of hierarchical topic distributions and naturally captures long-tail and power-law behaviors in conversational data. The model is trained end-to-end alongside a classifier, allowing it to generalize to new call transcripts without retraining. This technique not only improves the granularity and accuracy of classifications and inferred topics but also facilitates efficient downstream indexing and retrieval in systems where interpretability and scalability are critical.
- Michael K. McCourt Jr., Kian Ghodoussi, and Victor Borda, Topic-based semantic search of electronic documents based on machine learning models from Bayesian belief networks. US20240312451A1 (pending) Mar 17 2023 [PDF]
Patent US20240312451A1 introduces an innovative approach to semantic search of electronic documents by integrating domain knowledge into machine learning models using a mixture prior. This method employs Bayesian belief networks to represent the sparse probabilistic relationships among topics within a document corpus. The key advancement is the incorporation of a mixture prior that blends data-driven insights with predefined domain expertise. This integration allows the model to be initialized with domain-relevant information, facilitating faster convergence during training and enhancing generalization capabilities, especially in scenarios with limited data. By embedding domain knowledge directly into the prior distributions of the Bayesian framework, the system effectively narrows the hypothesis space, leading to more accurate and contextually relevant topic modeling and semantic search outcomes.
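As a rough illustration of topic-based retrieval (a plain dot-product similarity over inferred topic mixtures, not the patented belief-network machinery), one can index documents by their topic weights and rank them against the weights inferred for a query:

```python
import numpy as np

def topic_search(query_topics, doc_topics, top_k=5):
    """Rank documents by overlap between topic distributions.

    query_topics: (n_topics,) inferred topic weights for the query
    doc_topics:   (n_docs, n_topics) inferred topic weights for the corpus
    """
    scores = doc_topics @ query_topics
    order = np.argsort(scores)[::-1][:top_k]
    return order, scores[order]

# Hypothetical index: sparse topic mixtures for 1,000 documents.
rng = np.random.default_rng(2)
index = rng.dirichlet(np.full(30, 0.1), size=1000)
query = rng.dirichlet(np.full(30, 0.1))
doc_ids, scores = topic_search(query, index)
print(list(zip(doc_ids.tolist(), np.round(scores, 3))))
```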
- Michael K. McCourt Jr. and Victor Borda, Pitman-Yor process topic modeling pre-seeded by keyword groupings. US11429901B1 Aug 30 2022 [PDF]
Patent US11429901B1 introduces a method for document decomposition that integrates domain-specific knowledge via a mixture-model prior. This approach allows the model to incorporate predefined keyword clusters as priors, effectively embedding domain expertise into the probabilistic framework. By doing so, the model achieves faster convergence during training and enhances generalization capabilities, particularly in data-scarce scenarios. This integration of domain knowledge into the modeling framework offers improved efficiency and accuracy in applications requiring nuanced understanding of specialized content.
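One simple way to realize this kind of pre-seeding (shown purely as an illustrative sketch, not the patented construction) is to convert keyword clusters into a topic-word pseudo-count matrix that biases, but does not fix, each seeded topic:

```python
import numpy as np

def seeded_topic_prior(vocab, keyword_clusters, base=0.01, boost=1.0):
    """Build a topic-word pseudo-count prior pre-seeded by keyword groupings.

    Each seeded topic places extra prior mass on its keywords, while the
    small symmetric base count leaves room for the data to override the
    seeding. All names and values here are hypothetical.
    """
    word_index = {w: j for j, w in enumerate(vocab)}
    prior = np.full((len(keyword_clusters), len(vocab)), base)
    for k, cluster in enumerate(keyword_clusters):
        for w in cluster:
            if w in word_index:
                prior[k, word_index[w]] += boost
    return prior

vocab = ["refund", "invoice", "outage", "latency", "password", "reset"]
clusters = [["refund", "invoice"], ["outage", "latency"], ["password", "reset"]]
print(seeded_topic_prior(vocab, clusters))
```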
- Michael K. McCourt Jr. and Anoop Praturu, Generating training datasets for a supervised learning topic model from outputs of a discovery topic model. US11804216B2 Oct 31 2023 [PDF]
- Michael K. McCourt Jr. and Michael Lawrence, Detecting extraneous topic information using artificial intelligence models. US11521601B2 Dec 6 2022 [PDF]
- Michael K. McCourt Jr., Sean Storlie, Victor Borda, Michael Lawrence, and Anoop Praturu, Signal discovery using artificial intelligence models. US11115520B2 Sep 7 2021 [PDF]
Patent US11115520B2 introduces a method for signal discovery in electronic communications by employing a probabilistic topic model based on the Hierarchical Pitman-Yor Process (HPYP). The HPYP is utilized to capture the hierarchical and power-law characteristics inherent in natural language. This approach enables the model to identify and extract meaningful patterns and topics from communication data, facilitating applications such as call classification and analysis. By leveraging the HPYP, the method effectively models the complex structure of language, leading to improved accuracy in signal discovery tasks. The patent describes a means to build deeply hierarchical models, producing a Bayesian form of deep learning, in which the layers naturally encode Zipfian characteristics appropriate for natural language and for genomics.
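The back-off structure can be sketched with the standard Pitman-Yor predictive rule, in which each level discounts its local counts and smooths toward its parent's predictive distribution. The counts, parameters, and two-level stack below are illustrative; the patented hierarchy is deeper and inferred from data:

```python
def py_predictive(word, counts, tables, discount, strength, base):
    """Predictive probability of `word` under one Pitman-Yor restaurant.

    counts:  dict word -> customers eating that dish (observed counts)
    tables:  dict word -> tables serving that dish
    base:    callable giving the parent level's predictive probability,
             so stacking these calls yields a hierarchical (HPYP) model.
    """
    n, t = sum(counts.values()), sum(tables.values())
    local = max(counts.get(word, 0) - discount * tables.get(word, 0), 0.0)
    backoff = (strength + discount * t) * base(word)
    return (local + backoff) / (strength + n)

# Two levels of back-off: a topic-specific distribution smooths toward a
# corpus-level distribution, which smooths toward a uniform base measure.
uniform = lambda w: 1.0 / 6  # toy vocabulary of six words
corpus = lambda w: py_predictive(w, {"outage": 40, "refund": 10},
                                 {"outage": 3, "refund": 2}, 0.5, 1.0, uniform)
topic = lambda w: py_predictive(w, {"outage": 5}, {"outage": 1},
                                0.5, 1.0, corpus)
print(topic("outage"), topic("refund"))  # unseen words fall back smoothly
```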
- Michael K. McCourt Jr., Binary signal classifiers that tolerate incorrect training data. US10719783B2 Jul 21 2020 [PDF]
Patent US10719783B2 introduces a method to enhance the resilience of binary signal classifiers against label errors in training datasets, particularly when dealing with limited data. The innovation involves an iterative approach where the classifier is trained on the dataset, and then the training entries are re-evaluated based on the classifier's predictions. Entries that the classifier consistently misclassifies are identified as potentially mislabeled. These identified entries are then either corrected or removed, and the classifier is retrained on the refined dataset. This process is repeated until the classifier's performance stabilizes, effectively reducing the impact of incorrect labels and improving generalization. By systematically identifying and addressing label errors, this method enhances the robustness and accuracy of classifiers trained on small, noisy datasets.
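A minimal sketch of this refinement loop, assuming cross-validated predicted probabilities as the consistency signal (the patented procedure's exact re-evaluation criteria differ, and it supports correcting entries rather than only removing them):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def refine_labels(X, y, threshold=0.9, max_rounds=10):
    """Iteratively flag and drop training entries that look mislabeled."""
    keep = np.ones(len(y), dtype=bool)
    for _ in range(max_rounds):
        proba = cross_val_predict(
            LogisticRegression(max_iter=1000),
            X[keep], y[keep], cv=5, method="predict_proba",
        )[:, 1]
        # Confidence the model assigns to the *opposite* of each label.
        contradiction = np.where(y[keep] == 1, 1.0 - proba, proba)
        suspect = contradiction > threshold
        if not suspect.any():
            break  # performance has stabilized; no more suspects
        idx = np.flatnonzero(keep)
        keep[idx[suspect]] = False  # remove suspected label errors
    return keep  # boolean mask over the original training entries
```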
- Michael K. McCourt Jr., Performance score determiner for binary signal classifiers. US11423330B2 Aug 23 2022 [PDF]
- Sean Storlie, Victor Borda, Michael K. McCourt Jr., Leland Kirchhoff, Colin Kelley, and Nicholas Burwell, Desired signal spotting in noisy, flawed environments. US10332546B1 Jun 25 2019 [PDF]