
✨ Awesome FollowTheMoney ✨

Collection of tools and projects in the FollowTheMoney ecosystem

Introduction

Have a look at the introduction in the official docs if you are not familiar with FollowTheMoney as a data model, its concepts and the base Python implementation. A list of further introductory material follows below.

Over the past years, a wide range of individual contributors and organizations have built many projects on top of FollowTheMoney. The ecosystem includes many libraries that can be used standalone or within applications to handle all kinds of FollowTheMoney data tasks, such as scraping, transforming or cleaning entities, storing them in various database backends, searching and querying data, and building complete full-stack applications.

If you want your project on this list, open a pull request on this repository: alephdata/awesome-ftm

Introductory material

Full stack applications

These applications are probably the reason why you ended up here. Most of the smaller packages below are part of their full stack.

  • Aleph – The original open-source core project; it will no longer be maintained after October 2025
  • OpenAleph – Search through large documents and structured data
  • Aleph Pro – Closed-source SaaS version of the original Aleph project, launching October 2025

Build data and datasets

Tools and frameworks for creating FollowTheMoney data with scrapers or custom applications.

  • followthemoney – core ontology and data validation system, includes CSV/SQL to FtM mapper.
  • memorious – lightweight web scraping toolkit for scrapers that collect structured or unstructured data
  • zavod – Data processing framework as part of OpenSanctions
  • investigraph – Framework to create FollowTheMoney data
  • ingest-file – Create document graphs out of source data for Aleph applications

Specialised data importers:

  • followthemoney-ocds – Convert Open Contracting Data Standard files to FtM
  • followthemoney-cellebrite – Import forensic data dumps from Cellebrite
  • Importers for BODS (Beneficial Ownership Data Standard) and GLEIF RR files are part of OpenSanctions.
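Whatever tool produces it, FtM data is commonly serialized as JSON objects carrying an `id`, a `schema` name and a `properties` mapping in which every value is a list of strings. The stdlib-only sketch below builds such a dict to show that shape. The helpers `make_entity_id` and `make_entity` are hypothetical illustrations, not the followthemoney API (the library provides its own `model.make_entity` and `make_id` for this), and the SHA-1 key derivation is an assumption chosen for the example.

```python
import hashlib
import json


def make_entity_id(*parts: str) -> str:
    """Hypothetical helper: derive a stable ID by hashing the joined key parts."""
    # A unit-separator character keeps ("ab", "c") distinct from ("a", "bc").
    return hashlib.sha1("\x1f".join(parts).encode("utf-8")).hexdigest()


def make_entity(schema: str, key_parts: list[str], **props: list[str]) -> dict:
    """Hypothetical helper: build an entity dict in the id/schema/properties shape."""
    return {
        "id": make_entity_id(*key_parts),
        "schema": schema,
        # FtM properties are multi-valued, so every value is a list of strings.
        "properties": {name: list(values) for name, values in props.items()},
    }


person = make_entity(
    "Person",
    ["example-dataset", "jane-doe"],
    name=["Jane Doe"],
    nationality=["de"],
)
print(json.dumps(person, indent=2))
```

Deriving the ID from stable key parts means re-running a scraper yields the same IDs, so repeated runs update entities instead of duplicating them.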

Clean data

Tools and frameworks for cleaning and validating FollowTheMoney data.

  • rigour – Data cleaning and validation functions for processing various types of text emanating from and describing the business world; the basis of followthemoney.
  • countrynames – Maps country names to their respective two- or three-letter codes
  • prefixdate – a helper class to parse dates with varied degrees of precision
  • datapatch – A Python library for defining rule-based overrides on messy data
  • normality – a Python micro-package that contains a small set of text normalization functions for easier re-use
  • countrytagger – extract country name references from text
  • followthemoney-typepredict - guess the FtM type class of a piece of text, including distinguishing company and person names.
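Libraries such as normality and rigour bundle careful, well-tested text normalization for names and other values. To illustrate the basic idea only, here is a rough stdlib approximation (a hypothetical `normalize_name` function, not the algorithm either library actually implements) that strips diacritics, replaces punctuation with spaces, collapses whitespace and casefolds.

```python
import re
import unicodedata


def normalize_name(text: str) -> str:
    """Rough approximation of name normalization, not the normality/rigour algorithm."""
    # Decompose accented characters, then drop the combining marks (é -> e).
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces, collapse runs of whitespace, then casefold.
    cleaned = re.sub(r"[^\w\s]", " ", stripped)
    return " ".join(cleaned.split()).casefold()


print(normalize_name("  Société Générale, S.A. "))  # societe generale s a
```

Normalizing both sides before comparison lets spelling variants such as "Müller" and "MULLER" match, at the cost of discarding information a display value would need to keep.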

Analyze data

Tools and frameworks for analyzing FollowTheMoney data, for example transcribing Audio and Video entities, detecting languages or Named Entity Recognition (NER).

  • ftm-analyze – Standalone FtM analyzer for all kinds of processing, formerly part of ingest-file
  • ftm-geocode – Batch parse and geocode addresses from FollowTheMoney entities
  • ftm-transcribe – Extract text from Video and Audio
  • followthemoney-compare – pre-process and train models to power a cross-reference system for FollowTheMoney data, includes a model based on regression and word frequency analysis in names.
  • juditha – Compare and resolve NER results to actual known FtM Entities
  • ingest-file.analysis – The comprehensive analysis phase of document ingestion used in Aleph applications

Store entity data

Tools and applications for storing and retrieving FollowTheMoney data, such as databases, key-value stores or document archives. Also includes tools for storing related data (such as images for Entities).

  • followthemoney-store – SQL-backed store for Entity fragments
  • nomenklatura – Store entity data as statements.
    • Implementations for different graph-traversable backends (memory, redis, kvrocks, sql).
    • Various entity matching algorithms (rule- and regression-based), and an in-memory cross-referencing index for data deduplication.
    • A Wikidata client with mappings from their data model onto FtM statements (wants to become followthemoney-wikidata at some point)
    • Data enrichment clients for building out investigative graphs pulling in remote info from Aleph, yente, Wikidata, OpenCorporates, PermID, OpenFIGI.
  • ftmq – More advanced querying logic on top of the nomenklatura store implementations
  • bahamut – WIP FollowTheMoney statement data server with built-in entity resolution support. Written in Java.
  • FollowTheMoney Data Lake – Scalable storage for structured data and document archives (upcoming)
  • ftm-columnstore – ClickHouse-backed implementation of a nomenklatura statement store
  • servicelayer – Document archive for legacy Aleph and OpenAleph
  • leakrfc – data standard and archive storage for leaked data, private and public document collections, will become ftm-datalake (see above)
  • ftm-assets – Assets (image) resolver and storage for FollowTheMoney data
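Stores like followthemoney-store keep partial entities (fragments) and combine those sharing an ID when reading them back; nomenklatura goes further and stores individual statements. The toy sketch below shows only the fragment-combining idea with plain dicts. `merge_fragments` is a hypothetical helper, not code from either library, and it keeps the first schema it sees where real implementations reconcile schemata through the FtM model.

```python
from collections import defaultdict


def merge_fragments(fragments: list[dict]) -> list[dict]:
    """Combine fragments sharing an ID; property values are unioned in first-seen order."""
    merged: dict[str, dict] = {}
    for frag in fragments:
        entity = merged.setdefault(
            frag["id"],
            # Simplification: the first fragment's schema wins.
            {"id": frag["id"], "schema": frag["schema"], "properties": defaultdict(list)},
        )
        for prop, values in frag.get("properties", {}).items():
            for value in values:
                if value not in entity["properties"][prop]:
                    entity["properties"][prop].append(value)
    # Convert the defaultdicts back to plain dicts for serialization.
    for entity in merged.values():
        entity["properties"] = dict(entity["properties"])
    return list(merged.values())


fragments = [
    {"id": "p1", "schema": "Person", "properties": {"name": ["Jane Doe"]}},
    {"id": "p1", "schema": "Person", "properties": {"name": ["Jane Doe", "J. Doe"], "nationality": ["de"]}},
    {"id": "c1", "schema": "Company", "properties": {"name": ["ACME Ltd"]}},
]
entities = merge_fragments(fragments)
```

Here `entities` holds two records: `p1` with both name variants and its nationality, and `c1` unchanged.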

IO / Streaming

Tools and helpers for streaming FollowTheMoney data between stores and systems.

  • alephclient – Getting data in and out of Aleph with its API
  • openaleph-client – alephclient fork for OpenAleph, adds more pre-processing capabilities.
  • ftmq.io – Generic helpers for reading and writing FollowTheMoney data from and to various local and remote locations
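FtM tooling, including the `ftm` command-line utility that ships with followthemoney, commonly exchanges entities as newline-delimited JSON with one entity per line, which makes streaming between stores straightforward. As a stdlib-only sketch (the function names are hypothetical, not part of ftmq.io or alephclient), the example below filters a stream without loading it fully into memory.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def read_entities(fh: TextIO) -> Iterator[dict]:
    """Yield one entity dict per non-empty line of newline-delimited JSON."""
    for line in fh:
        line = line.strip()
        if line:
            yield json.loads(line)


def write_entities(fh: TextIO, entities: Iterable[dict]) -> int:
    """Write entities as one JSON object per line; return how many were written."""
    count = 0
    for entity in entities:
        fh.write(json.dumps(entity, sort_keys=True) + "\n")
        count += 1
    return count


source = io.StringIO(
    '{"id": "p1", "schema": "Person", "properties": {"name": ["Jane Doe"]}}\n'
    "\n"
    '{"id": "c1", "schema": "Company", "properties": {"name": ["ACME Ltd"]}}\n'
)
sink = io.StringIO()
# Stream only Person entities from source to sink, one line at a time.
written = write_entities(sink, (e for e in read_entities(source) if e["schema"] == "Person"))
```

In practice the `StringIO` objects would be replaced by files or `sys.stdin`/`sys.stdout`, so such filters can be chained in a shell pipeline.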

Building blocks for serving and searching FollowTheMoney datasets for web applications.

  • yente – API for OpenSanctions with support for entity search and bulk matching of data collections. Supports the Reconciliation API specification.
  • ftmq-api – Expose statement stores (by ftmq / nomenklatura) to a read-only FastAPI
  • ftmq-search – Search experiments for FollowTheMoney data with different backends (SQLite FTS, tantivy, Elasticsearch)

Projects / Use cases

Data exploration projects that make use of the FollowTheMoney stack described above.

  • OCCRP Aleph – The global archive of research material for investigative reporting
  • OpenSanctions – OpenSanctions helps investigators find leads, allows companies to manage risk and enables technologists to build data-driven products
  • OpenSecurityData.eu – Find companies, organizations or projects that receive European Union security funding
  • Farmsubsidy.org – Collecting and processing detailed data relating to payments and recipients of farm subsidies in every EU member state
  • FollowTheGrant – Data and investigations about potential conflicts of interest within academic research
  • EveryPolitician – Politically exposed persons (PEPs), re-launching H2 2025.
  • CORRECTIV Court Donations – Who receives court donations in Germany?
  • YouControl.World – KYB commercial platform based on Aleph
  • DPRK Reports – Graph-building data project working on North Korean sanctions evasion
  • reveng.ee – Activist portal from Ukraine, lots of searchable Russian data.
  • DDoS Library of Leaks – Public searchable leaks

Data libraries / catalogs

Many FollowTheMoney Entities form a Dataset, and many Datasets form a Catalog (some prefer to call it a Library).

Learn more: Dataset / Catalog metadata

Discontinued / legacy tools

These libraries have been discontinued or merged with others:

  • Aleph Data Desktop – desktop application for drawing investigative network diagrams.
  • pantomime – parsing and normalisation of internet MIME types in Python (discontinued, now in rigour.mime)
  • fingerprints – Name handling utilities for person and organisation names (discontinued, now in rigour.names)
  • languagecodes – normalise the ISO 639 codes used to describe languages from two-letter codes to three letters, and vice versa (discontinued, now in rigour.langs)
  • addressformatting – address formatter that can format addresses in multiple formats that are common in different countries (discontinued, now in rigour.addresses)
  • followthemoney-predict - previous entity comparison/linkage codebase.