✨ Awesome FollowTheMoney ✨
Collection of tools and projects in the FollowTheMoney ecosystem
Introduction
If you are not familiar with FollowTheMoney as a data model, its concepts and the base Python implementation, have a look at the introduction in the official docs. A list of introductory material follows below.
Over the past years, many projects have been built on top of FollowTheMoney by a wide range of individual contributors and organizations. The ecosystem includes many libraries that can be used standalone or within applications to handle FollowTheMoney data: scraping, transforming or cleaning Entities, storing them in various database backends, searching and querying data, and building complete full-stack applications.
If you want your project on this list, open a pull request against this repository: alephdata/awesome-ftm
Introductory material
- FollowTheMoney documentation
- Intro workshop (2024)
- Getting started with projects using the FollowTheMoney toolkit
Full stack applications
These applications are probably the reason why you ended up here. Most of the smaller packages below are part of their full stack.
- Aleph – Original open-source core project, will no longer be maintained after October 2025
- OpenAleph – Search through large documents and structured data
- Aleph Pro – Closed-source SaaS version of original Aleph project, launching October 2025
Build data and datasets
Tools and frameworks for creating FollowTheMoney data with scrapers or custom applications.
- followthemoney – core ontology and data validation system, includes CSV/SQL to FtM mapper.
- memorious – light-weight web scraping toolkit for scrapers that collect structured or un-structured data
- A more recent fork of memorious
- zavod – Data processing framework as part of OpenSanctions
- investigraph – Framework to create FollowTheMoney data
- ingest-file – Create document graphs out of source data for Aleph applications
Specialised data importers:
- followthemoney-ocds – Convert open contracting data standard files to FtM
- followthemoney-cellebrite – Import data forensics dumps from Cellebrite
- Importers for BODS (Beneficial Ownership Data) and GLEIF RR files are in OpenSanctions.
Clean data
Tools and frameworks for cleaning and validating FollowTheMoney data.
- rigour – Data cleaning and validation functions for processing various types of text emanating from and describing the business world, the base for followthemoney.
- countrynames – This library helps with the mapping of country names to their respective two or three letter codes
- prefixdate – a helper class to parse dates with varied degrees of precision
- datapatch – A Python library for defining rule-based overrides on messy data
- normality – a Python micro-package that contains a small set of text normalization functions for easier re-use
- countrytagger – extract country name references from text
- followthemoney-typepredict – guess the FtM type class of a piece of text, including distinguishing company and person names.
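To illustrate what "varied degrees of precision" means for a date helper such as prefixdate, here is a minimal standard-library sketch. It does not use the prefixdate API: the function name `parse_prefix` and the precision labels are hypothetical, and the validation is deliberately shallow.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper, NOT the prefixdate API: keep only the leading part of
# an ISO-like date string that is present and plausible.
PREFIX_RE = re.compile(r"^(\d{4})(?:-(\d{1,2}))?(?:-(\d{1,2}))?")


def parse_prefix(raw: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (normalized_prefix, precision), or (None, None) if no year is found."""
    match = PREFIX_RE.match(raw.strip())
    if match is None:
        return None, None
    year, month, day = match.groups()
    # Fall back to year precision if the month is missing or out of range.
    if month is None or not 1 <= int(month) <= 12:
        return year, "year"
    month = month.zfill(2)
    # Shallow check only: this does not verify the day against the month length.
    if day is None or not 1 <= int(day) <= 31:
        return f"{year}-{month}", "month"
    return f"{year}-{month}-{day.zfill(2)}", "day"
```

The point of the design is that an imprecise source value such as a bare year is kept as a shorter prefix instead of being padded with an invented month and day.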
Analyze data
Tools and frameworks for analyzing FollowTheMoney data, for example transcribing Audio and Video entities, detecting languages or extracting named entities (NER).
- ftm-analyze – The standalone ftm analyzer, formerly included in ingest-file, for all kinds of processing
- ftm-geocode – Batch parse and geocode addresses from FollowTheMoney entities
- ftm-transcribe – Extract text from Video and Audio
- followthemoney-compare – pre-process and train models to power a cross-reference system for FollowTheMoney data, includes a model based on regression and word frequency analysis in names.
- juditha – Compare and resolve NER results to actual known FtM Entities
- ingest-file.analysis – Part of the document ingestion is a comprehensive analysis phase used for Aleph applications
Store entity data
Tools and applications for storing and retrieving FollowTheMoney data, such as databases, key-value stores or document archives. Also includes tools for storing related data (such as images for Entities).
- followthemoney-store – SQL-backed store for Entity fragments
- nomenklatura – Store entity data as statements.
- Implementations for different graph-traversable backends (memory, redis, kvrocks, sql).
- Various entity matching algorithms (rule- and regression-based), and an in-memory cross-referencing index for data deduplication.
- A Wikidata client with mappings from their data model onto FtM statements (wants to become followthemoney-wikidata at some point)
- Data enrichment clients for building out investigative graphs pulling in remote info from Aleph, yente, Wikidata, OpenCorporates, PermID, OpenFIGI.
- ftmq – More advanced querying logic on top of the nomenklatura store implementations
- bahamut – WIP FollowTheMoney statement data server with built-in entity resolution support. Written in Java.
- FollowTheMoney Data Lake – Scalable storage for structured data and document archives (upcoming)
- ftm-columnstore – Clickhouse-backed implementation of a nomenklatura statement store
- servicelayer – Document archive for legacy Aleph and OpenAleph
- leakrfc – data standard and archive storage for leaked data, private and public document collections, will become ftm-datalake (see above)
- ftm-assets – Assets (image) resolver and storage for FollowTheMoney data
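Storing entity data "as statements", as the nomenklatura entry above puts it, can be pictured as splitting each entity into one row per property value and reassembling it on read. The following is a minimal sketch in plain Python, not the nomenklatura API: the `Statement` fields and both function names are illustrative assumptions, and the entity dict follows the usual `id` / `schema` / `properties` JSON shape.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class Statement(NamedTuple):
    """One property value of one entity, as claimed by one dataset (hypothetical layout)."""

    entity_id: str
    schema: str
    prop: str
    value: str
    dataset: str


def to_statements(entity: dict, dataset: str) -> Iterator[Statement]:
    """Split an entity dict into one statement per property value."""
    for prop, values in entity["properties"].items():
        for value in values:
            yield Statement(entity["id"], entity["schema"], prop, value, dataset)


def from_statements(statements: Iterable[Statement]) -> List[dict]:
    """Reassemble entity dicts, merging values from all datasets without duplicates."""
    entities: Dict[str, dict] = {}
    for stmt in statements:
        entity = entities.setdefault(
            stmt.entity_id,
            {"id": stmt.entity_id, "schema": stmt.schema, "properties": {}},
        )
        values = entity["properties"].setdefault(stmt.prop, [])
        if stmt.value not in values:
            values.append(stmt.value)
    return list(entities.values())
```

Because every value keeps its originating dataset, several sources can contribute to the same entity and each claim stays traceable. That is the motivation for this layout in the sketch; the real libraries carry more fields per statement.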
IO / Streaming
Tools and helpers for streaming FollowTheMoney data between stores and systems.
- alephclient – Getting data in and out of Aleph with its API
- openaleph-client – alephclient fork for OpenAleph, adds more pre-processing capabilities.
- ftmq.io – Generic helpers for reading and writing FollowTheMoney data from and to various local and remote locations
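FollowTheMoney entities are commonly exchanged as line-delimited JSON, one entity per line, which is what makes streaming between stores and systems straightforward. The sketch below shows that pattern with the standard library only. The helper names `write_entities` and `read_entities` are hypothetical and are not taken from alephclient, openaleph-client or ftmq.

```python
import io
import json
from typing import IO, Iterable, Iterator


def write_entities(fh: IO[str], entities: Iterable[dict]) -> int:
    """Write one JSON object per line and return the number of entities written."""
    count = 0
    for entity in entities:
        fh.write(json.dumps(entity, sort_keys=True) + "\n")
        count += 1
    return count


def read_entities(fh: IO[str]) -> Iterator[dict]:
    """Lazily yield entities from a line-delimited JSON stream, skipping blank lines."""
    for line in fh:
        line = line.strip()
        if line:
            yield json.loads(line)


# Usage with an in-memory buffer. Any text file handle works the same way.
buffer = io.StringIO()
write_entities(buffer, [{"id": "a", "schema": "Person", "properties": {"name": ["A"]}}])
buffer.seek(0)
entities = list(read_entities(buffer))
```

Reading lazily means a consumer never has to hold a whole dataset in memory, so the same two functions can sit between a file, a pipe or a network stream.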
API / Search
Building blocks for serving and searching FollowTheMoney datasets for web applications.
- yente – API for OpenSanctions with support for entity search and bulk matching of data collections. Supports Reconciliation API specification.
- ftmq-api – Expose statement stores (by ftmq/nomenklatura) via a read-only FastAPI application
- ftmq-search – Search experiments for FollowTheMoney data with different backends (Sqlite FTS, tantivy, elasticsearch)
Projects / Use cases
Data exploration projects that make use of the FollowTheMoney stack described above.
- OCCRP Aleph – The global archive of research material for investigative reporting
- OpenSanctions – OpenSanctions helps investigators find leads, allows companies to manage risk and enables technologists to build data-driven products
- OpenSecurityData.eu – Find companies, organizations or projects that receive European Union security funding
- Farmsubsidy.org – Collecting and processing detailed data relating to payments and recipients of farm subsidies in every EU member state
- FollowTheGrant – Data and investigations about potential conflicts of interest within academic research
- EveryPolitician – Politically exposed persons (PEPs), re-launching H2 2025.
- CORRECTIV Court Donations – Who receives court donations in Germany?
- YouControl.World – KYB commercial platform based on Aleph
- DPRK Reports – Graph-building data project working on North Korean sanctions evasion
- reveng.ee – Activist portal from Ukraine, lots of searchable Russian data.
- DDoS Library of Leaks – Public searchable leaks
Data libraries / catalogs
Many FollowTheMoney Entities form a Dataset, many datasets form a Catalog (some prefer to call it Library).
Learn more: Dataset / Catalog metadata
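The containment described above (Entities in a Dataset, Datasets in a Catalog) can be sketched with two small data classes. This is an illustrative model only: the class and field names are assumptions and do not reproduce the actual Dataset / Catalog metadata specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """A named collection of entities (hypothetical minimal metadata)."""

    name: str
    title: str
    entities: List[dict] = field(default_factory=list)


@dataclass
class Catalog:
    """A named collection of datasets, keyed by dataset name."""

    name: str
    datasets: Dict[str, Dataset] = field(default_factory=dict)

    def add(self, dataset: Dataset) -> None:
        """Register a dataset, replacing any existing one with the same name."""
        self.datasets[dataset.name] = dataset

    def entity_count(self) -> int:
        """Total number of entities across all datasets in the catalog."""
        return sum(len(ds.entities) for ds in self.datasets.values())
```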
Discontinued / legacy tools
These libraries have been discontinued or merged with others:
- Aleph Data Desktop – desktop application for drawing investigative network diagrams.
- pantomime – parsing and normalisation of internet MIME types in Python (discontinued, now in rigour.mime)
- fingerprints – Name handling utilities for person and organisation names (discontinued, now in rigour.names)
- languagecodes – normalise the ISO 639 codes used to describe languages from two-letter codes to three letters, and vice versa (discontinued, now in rigour.langs)
- addressformatting – address formatter that can format addresses in multiple formats that are common in different countries (discontinued, now in rigour.addresses)
- followthemoney-predict – previous entity comparison/linkage codebase.