This article was written under the guidance of Michael,
our Senior DevOps Engineer with 10+ years of expertise in this field.
We continue our series of articles on what sets FinTech and Gambling projects apart from other areas of business where IT plays a decisive role. If you have not yet read the BeFund blog posts “Where Do Your Secrets Live? Security Architecture in Gambling and Fintech” and “Data Monitoring in FinTech and Gambling: A Question of Survival, Not Comfort”, we recommend doing so first, as many points in this article build on what was covered there. Now that we know how to secure access to data and how to intelligently monitor its flows, it is time to figure out how to ensure data integrity.
Data in FinTech and Gambling projects is not just a set of zeros and ones — it is your money. Every single bit of everything collected and stored. So let’s agree going forward: whenever we write “data,” picture money. And vice versa: money is data, and it is the foundation of financial security, user trust, business stability, and regulatory compliance. Keeping this in mind will help us explore this topic in the most meaningful way for the reader.
Data Integrity — A Critical Requirement
Take any standard web project: losing some data there is nothing more than an inconvenient bug. In FinTech and Gambling, however, that same loss can become a financial incident. Picture this: a user topped up their balance, but the funds were never credited; a withdrawal was executed, but the status stayed pending; a bet was accepted, but the balance did not change; a bonus was credited twice; after a failover, some transactions vanished. What client or player is going to be happy about any of that? Sure, you could say: we’re all only human, things happen, try to understand… But in the real world, nobody cares: everyone is worried about their own money, not your problems. So your business cannot afford to have problems.
Core Operations
FinTech and Gambling projects work with operations where every mistake carries financial consequences. This becomes clear when you think through the key operations and what they actually mean:
- Deposits — the act of receiving assets that a user transfers to your project for safekeeping and further interaction;
- Withdrawals — the act of a user pulling available funds out of their account within the project;
- Bets — accepting a wager from the user, where they stake deposited assets on the predicted outcome of a game, trade, etc.;
- Winnings — receiving assets, property, services, points, or other tangible and intangible value as a result of success, luck, or victory in games;
- Settlement — in casino and betting contexts, this is the process of finally processing and recording the outcome of a bet or game, after which the winnings are credited to the player’s balance. While an event is ongoing, the bet carries an “open” status; once settlement runs, it is officially closed;
- Refund — a voluntary return of funds from a player’s gaming balance back to their bank card or e-wallet, initiated by the project itself. Do not confuse this with cashback or chargeback — those are entirely different financial instruments;
- Bonuses — incentives (typically in the form of extra funds or free game actions) that the project grants to clients for meeting certain conditions or hitting agreed-upon milestones;
- Commissions — fees charged for services (percentages of transactions) and trading or intermediary arrangements (commission agreements);
- KYC/AML statuses — the results of your project’s Know Your Customer (identity verification) and anti-money-laundering compliance checks;
- Payment callbacks — automated HTTP notifications that a payment gateway sends to your solution’s server to report a status change for a transaction (for example, a successful payment, a declined card, or a refund). These are how your project “finds out” that a client has paid, even if they closed the payment page;
- User balance check — retrieving up-to-date data on the state of your clients’ deposits;
- Ledger transactions — entries in the project’s internal ledger that record every movement of funds between accounts; when using cryptocurrencies, this also covers signing and sending crypto transfers;
- Affiliate / partner commissions — the reward (a percentage of a deposit or a flat fee) that the project pays to a partner for bringing in a new client. This is the backbone of affiliate marketing: when your client recommends your product, service, or platform and a new deposit follows from their referral.
As you can see, every one of these core operations involves assets. That makes each of them a subject of heightened scrutiny — because every mistake can cost you either your own funds (if transactions go against you) or your reputation (if, on the other hand, users receive assets they were never entitled to).
Typical Problems
Although every project (not just FinTech and Gambling) is unique and multi-layered, over years of work the BeFund team has tracked and isolated a number of recurring mistakes that stem from architectural shortcomings. Fixing them after the fact is resource-intensive, whereas getting the architecture right from the start minimizes the damage. Common problems include:
- Balance error: a user topped up their account, but the deposit was never credited;
- Deposit duplication: the same deposit is credited twice, a widespread issue in IT systems;
- A withdrawal was executed, but the status stayed pending;
- A withdrawal was created a second time — analogous to the double deposit crediting;
- A bet was accepted, but the balance did not change;
- Settlement ran, but the winnings were never credited;
- A bonus was deducted, but the transaction history was not created properly;
- Some transactions disappeared after a failover;
- A webhook from a provider arrived twice and updated the balance a second time;
- The replica fell behind and the user sees a stale balance — we will go into detail on this shortly.
Core Principles
There are indeed many problems and threats. There are also quite a few ways and methods to counter them, so the first step is to understand the core principles that should guide the pursuit of data integrity. From our own experience, we would like to share three “Nevers” and one “Always” that help you avoid even the worst problems in FinTech and Gambling projects:
❌ Never lose money (data);
❌ Never create duplicate data (double-entries);
❌ Never corrupt a user’s balance;
✅ Always ensure the ability to restore system state without financial discrepancies.
Replication is one of the main building blocks for achieving this, provided it is set up and used correctly. More on that below.
Primary, Replica, Slave: Why Replication Matters
Most people are familiar with “backups”: creating copies of data for recovery in the event of some mishap. We do it on our smartphones, let alone in IT projects. A less well-known process is replication, which serves a similar purpose but fulfills a different role. Replication is the process of creating and automatically maintaining copies of data or systems across multiple servers in near real time. It helps keep information safe and available to users, and it reduces the risk of losing data in the event of failures. It is one of the foundational mechanisms for scaling, fault tolerance, and database redundancy, not just a “just-in-case” copy.
In brief, the ideal model for a coherent database looks like this:
Writes → Primary;
Reads → Replica.
The main benefits of using a replica include:
- Reduced load on the primary database;
- Scaling of read queries;
- Report generation;
- Analytics;
- Backoffice — the ability to support internal operations without touching the primary data;
- CRM integration;
- Fallback reads;
- Disaster recovery — a set of strategies, policies, and technologies for rapidly restoring IT infrastructure and preserving data after a serious outage or force majeure event;
- Failover — a standby node that can take over if the primary fails;
- And, of course, backup.
Despite these advantages, replication is not a silver bullet for every situation. From our experience, we must point out that for FinTech and Gambling projects there are exceptions where read queries from replicated data are strictly off-limits. For instance, a user’s balance after a deposit, the status of a withdrawal, a bet decision, and a number of other indicators are better read from the primary or from another strongly consistent source. The core reason: because a replica is not the source of truth, it can lag behind the real state of events. In that case, a read query will return an incorrect response — and that is simply not acceptable.
Replication Lag: The Hidden Danger
Replication sync delays can genuinely fray the nerves of both owners and developers, so it is worth examining this issue in more detail. The causes of lag can be numerous, so it is important to identify them accurately, monitor them, and fix them. Let’s walk through some examples.
A Moderate Replication Lag Example
- A user makes a deposit;
- The Primary DB writes the transaction;
- The Replica has not yet received this entry;
- The application reads the balance from the replica;
- The user sees a stale balance;
- Support receives a complaint, even though technically the record already exists.
The situation is unpleasant but not yet critical: by the time support gets around to reviewing the complaint, the Replica may already have the correct data and the issue resolves itself. That said, the negative impression the user has formed does not go away — and next time they will think twice before depositing funds into your project, or they will simply switch to a competitor. So there is a real risk of losing money. But worse scenarios are possible.
A Dangerous Replication Lag Example
- A withdrawal was created;
- The Replica does not yet see the new status;
- Another process reads the stale data;
- The system permits the action to be repeated;
- A duplication or inconsistent state arises.
In this case, the user either receives a doubled payout (from your funds) or receives nothing at all — just a baffling error and a dose of frustration. That is a direct loss of resources: both financial and user-trust.
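One way to close the gap in the dangerous scenario above is to make the status check and the status change a single conditional write on the primary, so a process holding stale data cannot repeat the action. The following is a minimal sketch, not a definitive implementation: the `withdrawals` table, its status names, and the `claim_withdrawal` helper are hypothetical, and Python's built-in `sqlite3` stands in for the primary database.

```python
import sqlite3


def claim_withdrawal(conn: sqlite3.Connection, withdrawal_id: int) -> bool:
    """Move a withdrawal from 'created' to 'processing' at most once.

    The WHERE clause re-checks the current status on the primary, so a
    caller that decided based on stale data simply updates zero rows.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE withdrawals SET status = 'processing' "
            "WHERE id = ? AND status = 'created'",
            (withdrawal_id,),
        )
    return cur.rowcount == 1


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")  # stand-in for the primary database
    conn.execute(
        "CREATE TABLE withdrawals (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO withdrawals (id, status) VALUES (1, 'created')")
    conn.commit()
    first = claim_withdrawal(conn, 1)   # the first worker wins
    second = claim_withdrawal(conn, 1)  # the repeat attempt is rejected
    return first, second
```

Only the caller that receives `True` should go on to execute the payout; every other caller treats the withdrawal as already in progress.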
Preventing Delays
Predicting replication lag is extremely difficult, so the primary line of defense is monitoring (as covered in our earlier article “Data Monitoring in FinTech and Gambling: A Question of Survival, Not Comfort”) and a correctly designed project architecture from the outset. The necessary steps are:
- Monitor replication lag;
- Set up an alert on replication lag;
- Do not read critical financial data from a replica immediately after a write;
- Use read-after-write consistency;
- For balance / payment / withdrawal — read from primary;
- Use sticky reads after financial operations;
- Do not fail over to a replica that has fallen behind the primary;
- Check GTID / binlog position / WAL position before promoting a replica.
To wrap up the lag issue: using a Replica is genuinely useful for scaling, but dangerous for critical financial read-after-write scenarios. To get the most out of it, the setup must be handled by experienced engineers.
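Several of the steps above (reading critical data from the primary, sticky reads after financial operations, and avoiding a lagging replica) can be combined in a small routing rule. The sketch below is illustrative only: the `ReadRouter` class, the read kind names, and both thresholds are assumptions, and the replica lag value is expected to come from whatever monitoring the project already has.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional

# Read kinds that must always hit the primary (hypothetical names).
CRITICAL_READS = {"balance", "payment_status", "withdrawal_status"}


@dataclass
class ReadRouter:
    max_lag_seconds: float = 1.0   # assumed threshold; tune per project
    sticky_seconds: float = 5.0    # assumed sticky window after a write
    _last_write: Dict[str, float] = field(default_factory=dict, init=False)

    def record_write(self, user_id: str, now: Optional[float] = None) -> None:
        """Remember when this user last performed a financial write."""
        self._last_write[user_id] = time.monotonic() if now is None else now

    def choose(self, user_id: str, read_kind: str,
               replica_lag_seconds: Optional[float],
               now: Optional[float] = None) -> str:
        """Return 'primary' or 'replica' for a single read."""
        now = time.monotonic() if now is None else now
        if read_kind in CRITICAL_READS:
            return "primary"  # balance / payment / withdrawal reads
        last = self._last_write.get(user_id)
        if last is not None and now - last < self.sticky_seconds:
            return "primary"  # sticky read after a recent write
        if replica_lag_seconds is None or replica_lag_seconds > self.max_lag_seconds:
            return "primary"  # lag is unknown or above the threshold
        return "replica"
```

The application would call `record_write` after each financial operation and `choose` before each read. The optional `now` argument exists only to make the rule easy to test.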
CQRS: Reducing Load on the Primary Database
We now turn to the topic of reads and writes, which in FinTech and Gambling carry critical importance. An individual write operation is usually small, while read queries such as reports and searches can generate significant load. Mixing both types in a single database lets one interfere with the other. Consider the following example: Backoffice opens a heavy report whose SQL query reads millions of transactions at once, driving up DB CPU and I/O load. As a result, a payment callback is processed more slowly, causing a backlog of pending transactions. In practical terms, users do not see their deposits credited.
What we have here is an admin’s report query (a read) directly impacting client monetary operations. Clearly this should not happen, and some optimal solution needs to be applied to separate these concerns.
CQRS (Command Query Responsibility Segregation) is an architectural pattern that separates write/modify operations (Command) from read operations (Query) into two independent parts. This allows each to be optimized, scaled, and protected independently:
Command side — operations that change system state;
Query side — operations that only read data.
In FinTech and Gambling, write operations are most commonly money-related, while read operations matter for system functioning and management:
| Write operations | Read operations |
| deposit | transaction history |
| withdrawal | user account |
| bet placement | backoffice |
| settlement | CRM |
| bonus accrual | analytics |
| refund | reporting |
| balance update | dashboards |
| KYC status update | payment search |
| payment callback processing | bet history |
| | financial reconciliation reports |
A correct CQRS architecture chains the following components (Command API → Primary DB → Outbox Events → Message Broker → Read Model):
- Command API — responsible for changing state;
- Primary DB — the source of truth for financial operations;
- Outbox Events — ensures events are not lost;
- Message Broker — delivers events to other services;
- Read Model — an optimized structure for fast reads.
As we can see, using CQRS helps reduce load on the primary DB, the number of heavy JOIN queries, the risk of lock contention, I/O pressure, CPU pressure, the risk of backoffice affecting payment flow, the risk of critical API degradation, and load on application instances. However, CQRS must be applied carefully: not all read queries can be transitioned to eventual consistency. In FinTech and Gambling, all monetary decisions must work against the current source of truth. Stale reads that lead to the outcomes marked ❌ below are unacceptable, while the delays marked ✅ are tolerable:
❌ Balance after a bet shows the wrong amount;
❌ Withdrawal decision is made on stale data;
❌ A repeat deposit is credited due to a stale read;
❌ A user can place a bet with funds that are no longer there;
❌ An AML/KYC decision is made on an outdated status.
✅ Transaction history will update in 1–3 seconds;
✅ CRM will see the new status a moment later;
✅ The analytics dashboard will update with a slight delay;
✅ A Backoffice report will not be real-time down to the millisecond.
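The Outbox Events component described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not a production design: the `deposits` and `outbox` tables, the event shape, and the `publish` callback (standing in for the Message Broker) are all hypothetical, and `sqlite3` stands in for the Primary DB.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE deposits (
    id INTEGER PRIMARY KEY,
    user_id TEXT NOT NULL,
    amount_cents INTEGER NOT NULL
);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
"""


def record_deposit(conn: sqlite3.Connection, user_id: str, amount_cents: int) -> int:
    """Command side: write the deposit and its event in one transaction."""
    with conn:  # both rows commit together or not at all
        cur = conn.execute(
            "INSERT INTO deposits (user_id, amount_cents) VALUES (?, ?)",
            (user_id, amount_cents),
        )
        deposit_id = cur.lastrowid
        event = {"type": "deposit_recorded", "deposit_id": deposit_id,
                 "user_id": user_id, "amount_cents": amount_cents}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))
    return deposit_id


def relay_outbox(conn: sqlite3.Connection, publish) -> int:
    """Deliver unpublished events in order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for event_id, payload in rows:
        publish(json.loads(payload))  # if this raises, the row stays unpublished
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    return len(rows)
```

Because an event is marked as published only after `publish` returns, a crash between the two steps leads to a repeated delivery instead of a lost event. Consumers that build the Read Model therefore need to tolerate duplicates.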
Failover Without Data Loss
As a reminder from earlier materials: Failover is the process of automatically transferring workload to backup hardware or communication channels in the event of a primary component failure. This mechanism is a foundational element of High Availability system design. The main goal of failover is to minimize — or eliminate entirely — service downtime for users. In practice, this is not simply “switch slave to master.” In FinTech and Gambling, an incorrect failover can mean lost transactions or financial discrepancies. The process must not only be safe and fast, but also keep the project owner’s funds intact. Let’s walk through the problematic scenarios and outline the correct principles.
What Can Go Wrong?
- The primary goes down, but some transactions have not yet been replicated;
- The replica is lagging by a few seconds — or even minutes;
- The application continues writing to the old primary;
- A split-brain occurs — a state in High Availability clusters where, due to a network failure, a cluster splits into two parts that cannot see each other;
- Two nodes simultaneously consider themselves master;
- Some services write only to one DB, others to another;
- After the failover, payment callbacks or balance updates simply vanish.
There are plenty more such scenarios, but the goal is the same in every case: prevent data loss.
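The first two scenarios come down to promoting a replica that has not received everything the primary wrote. For PostgreSQL, WAL positions (LSNs) are printed as two hexadecimal numbers separated by a slash, so they can be compared numerically before a promotion. The sketch below is a hedged illustration: the function names are hypothetical, and it assumes the last known primary position was recorded somewhere that survives the primary's failure.

```python
from typing import Dict, Optional


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '16/B374D848' into an integer byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def safe_to_promote(last_primary_lsn: str, replica_replay_lsn: str,
                    max_missing_bytes: int = 0) -> bool:
    """True if the replica is missing no more than the allowed amount of WAL."""
    missing = parse_lsn(last_primary_lsn) - parse_lsn(replica_replay_lsn)
    return missing <= max_missing_bytes


def pick_candidate(last_primary_lsn: str, replicas: Dict[str, str]) -> Optional[str]:
    """Return the most advanced replica if it is safe to promote, else None."""
    if not replicas:
        return None
    name = max(replicas, key=lambda n: parse_lsn(replicas[n]))
    return name if safe_to_promote(last_primary_lsn, replicas[name]) else None
```

With `max_missing_bytes` left at zero, a failover is refused unless some replica has replayed everything the primary was known to have written. For financial data that is usually the safer default, at the cost of longer downtime.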
Useful Tools
For properly managing failover, the BeFund team has validated the following tools and approaches:
- ProxySQL (https://proxysql.com/);
- HAProxy (https://www.haproxy.org/);
- Orchestrator (https://github.com/openark/orchestrator) for MySQL;
- Patroni (https://github.com/patroni/patroni) for PostgreSQL;
- PgBouncer (https://www.pgbouncer.org/);
- Semi-sync replication;
- GTID;
- WAL;
- Binlog;
- Point-in-time recovery.
Idempotency: Protection Against Repeated Execution
Throughout this article series we keep coming back to data duplication as one of the fundamental problems for IT in general and FinTech and Gambling projects in particular. As you will recall, data = money. Duplication is not some rare phenomenon; it is a standard situation with perfectly understandable causes. The logic is simple: every system in IT “plays it safe” and sends a retry request if the first one yields no result. The retry itself is not the problem. A retry that gets processed a second time and produces a duplicate is the problem that needs to be solved.
Idempotency is the property of an operation or method whereby executing it multiple times produces the same result as executing it once. In other words, repeating an action does not change the state beyond the first execution or create any additional side effects. In FinTech and Gambling, idempotency is critical for deposits, withdrawals, payment callbacks, webhooks, bonus accruals, settlements, refunds, bets, balance updates, KYC events, affiliate events, and a range of lower-priority operations.
As an example, consider the case of a duplicated webhook from a payment provider. This is a common scenario if your project cannot accept a request in time: the provider will resend it, and the system will need to handle it somehow. If the system is not idempotent, the balance will be topped up twice. The client will be thrilled — you will not. If, however, the system is idempotent:
- The first webhook creates the transaction;
- The second webhook is identified as a duplicate;
- The balance is not updated a second time.
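The three steps above can be sketched with a table of processed events that is written in the same transaction as the balance change. This is a minimal illustration rather than a complete handler: the `accounts` and `processed_events` tables and the `handle_deposit_webhook` function are hypothetical, and `sqlite3` stands in for the primary database.

```python
import sqlite3

SCHEMA = """
CREATE TABLE accounts (
    user_id TEXT PRIMARY KEY,
    balance_cents INTEGER NOT NULL
);
CREATE TABLE processed_events (
    event_id TEXT PRIMARY KEY
);
"""


def handle_deposit_webhook(conn: sqlite3.Connection, event_id: str,
                           user_id: str, amount_cents: int) -> str:
    """Credit a deposit once, no matter how many times the webhook arrives."""
    try:
        with conn:  # the marker row and the balance change commit together
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)",
                (event_id,),
            )
            conn.execute(
                "UPDATE accounts SET balance_cents = balance_cents + ? "
                "WHERE user_id = ?",
                (amount_cents, user_id),
            )
    except sqlite3.IntegrityError:
        # The primary key rejected a repeated event_id: nothing was changed.
        return "duplicate"
    return "credited"
```

The deduplication relies on the database constraint, not on a prior SELECT, so two copies of the same webhook arriving at the same moment cannot both pass the check.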
Practical mechanisms for achieving idempotency, from BeFund’s experience:
| idempotency_key | unique external_transaction_id | unique provider_payment_id | database unique constraints |
| transactional outbox | status machine | distributed lock | atomic balance update |
| processed_events table | deduplication layer | correlation_id | request_id |
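Of the mechanisms listed, the atomic balance update is perhaps the easiest to show in isolation: the sufficiency check and the deduction happen in one statement, so a bet cannot be placed with funds that are no longer there. The sketch below assumes a hypothetical `accounts` table with balances stored in integer cents, and again uses `sqlite3` as a stand-in for the primary database.

```python
import sqlite3

SCHEMA = """
CREATE TABLE accounts (
    user_id TEXT PRIMARY KEY,
    balance_cents INTEGER NOT NULL CHECK (balance_cents >= 0)
);
"""


def place_bet(conn: sqlite3.Connection, user_id: str, stake_cents: int) -> bool:
    """Deduct the stake only if the current balance covers it."""
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET balance_cents = balance_cents - ? "
            "WHERE user_id = ? AND balance_cents >= ?",
            (stake_cents, user_id, stake_cents),
        )
    # Zero rows updated means insufficient funds (or an unknown user).
    return cur.rowcount == 1
```

A read-then-write version of the same logic (SELECT the balance, compare it in application code, then UPDATE) leaves a window in which two concurrent bets can both pass the check.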
Protection Against Transaction Duplication
Data duplication is a challenge, not a sentence, and experienced engineers know how to overcome it. Protection against repeated processing is critically important, as it simultaneously defends both your finances and your FinTech or Gambling project’s reputation. For example, to guard against duplicates, a payments table in an online casino should at minimum have the following constraints:
- unique(provider, provider_transaction_id);
- unique(idempotency_key).
The golden rule to follow when fighting duplicates: it is better to return a successful response to the provider on a duplicate webhook than to credit the money a second time — and lose it.
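The two constraints and the golden rule can be put together in a short sketch. Everything beyond the two unique constraints is an assumption made for illustration: the remaining columns of the `payments` table, the `record_payment` function, and the choice of HTTP 200 as the "successful response" are hypothetical, and `sqlite3` stands in for the production database.

```python
import sqlite3

SCHEMA = """
CREATE TABLE payments (
    id INTEGER PRIMARY KEY,
    provider TEXT NOT NULL,
    provider_transaction_id TEXT NOT NULL,
    idempotency_key TEXT NOT NULL,
    user_id TEXT NOT NULL,
    amount_cents INTEGER NOT NULL,
    UNIQUE (provider, provider_transaction_id),
    UNIQUE (idempotency_key)
);
"""


def record_payment(conn: sqlite3.Connection, provider: str,
                   provider_transaction_id: str, idempotency_key: str,
                   user_id: str, amount_cents: int) -> tuple:
    """Return (http_status, created) for a provider webhook."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO payments (provider, provider_transaction_id, "
                "idempotency_key, user_id, amount_cents) VALUES (?, ?, ?, ?, ?)",
                (provider, provider_transaction_id, idempotency_key,
                 user_id, amount_cents),
            )
    except sqlite3.IntegrityError:
        # Either unique constraint fired: acknowledge, but credit nothing.
        return 200, False
    return 200, True
```

Returning a success status for the duplicate tells the provider to stop retrying, while `created` tells the rest of the application whether any money should actually move.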
That wraps up Part 1 of the article series on behind-the-scenes insights into FinTech and Gambling development. In the continuation (coming soon) we will explore the question of identifying the true source of financial information, transaction processing challenges, queue management, recovering lost data, and much more. Stay with BeFund, and feel free to reach out to our specialists for advice and consultations whenever you are looking to develop your own FinTech or Gambling project!