Is a message broker always needed?

6.8.2024


We are looking into whether a service bus is possible without a broker and when this approach is better for IT architecture.

Various sources claim that a message broker must be part of an ESB→ (enterprise service bus) landscape. Some even believe that the broker is the main component of an ESB, or that the two concepts are one and the same.


I reread the logs, thought a lot

Let's turn, as we have more than once before, to a simple and familiar example: an online store.

Imagine a large company that sells automotive components through an online store. We can roughly model which systems make up its landscape:

  • a PIM system that stores product information;
  • a WMS system for warehouse management;
  • the online store itself;
  • a CRM system that stores customer and discount data;
  • an OMS for order management.

These systems exchange information through the ESB layer at some frequency. For example, the online store sends orders with their statuses once a minute or once a second.

Does the OMS need order information from the online store? Of course it does. So the broker sends the entire message queue from the online store to the OMS.

But our company is large and sells throughout Russia, not only retail but also small wholesale. So its landscape includes several OMS instances to manage orders transparently across regions: the Far East and the Kaliningrad Region, for example, as well as wholesale and retail orders separately.

Has the scheme become more complicated? So has the logic. One OMS needs only orders from Central Russia, another only small wholesale orders from Siberia, a third only orders in a certain status. A fourth needs only tire orders whose previous status was "paid", with an extra flag that the ETL connector must set during processing...

And once a minute, the broker still sends the entire queue of new, cancelled, and unconfirmed orders from the online store to every OMS. That can be 3 records or 10,000.

What happens?

Each OMS has to receive every message about every order, read it, and work out whether it needs this particular order or whether it has the wrong status or the wrong region. That is extra logic that has to be written into the system's code, making the system heavier and slower.

How an ESB circuit looks with a Kafka message broker | KT.Team
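To make the cost concrete, here is a minimal sketch of the filtering each OMS would have to implement on its own side when the broker delivers the full queue. The field names and values are hypothetical, not taken from any real system:

```python
# Sketch of the consumer-side filtering every OMS has to duplicate when the
# broker fans out the full order queue. Field names are illustrative.

def relevant_for_oms(order, region, channel):
    """Discard orders meant for other OMS instances."""
    return order["region"] == region and order["channel"] == channel

# The full queue arrives at every OMS, relevant or not.
queue = [
    {"id": 1, "region": "siberia", "channel": "wholesale", "status": "paid"},
    {"id": 2, "region": "central", "channel": "retail", "status": "new"},
    {"id": 3, "region": "siberia", "channel": "retail", "status": "paid"},
]

# The "small wholesale in Siberia" OMS reads all three messages to keep one.
siberia_wholesale = [o for o in queue if relevant_for_oms(o, "siberia", "wholesale")]
```

With 10,000 messages a minute, every OMS performs the same parsing and comparison work only to throw most of it away.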

What happens if you use a database

In this case, a single ETL connector takes all order statuses from the online store and writes them to a database→. All the logic, however complex the conditions, lives in the connector layer: the connectors sort the records and pass on only those that "their" OMS systems need. Excess information doesn't overload the receiving systems — in fact, it never even leaves the connector layer.

How an ESB circuit looks with a DWH | KT.Team
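The connector-layer routing described above might be sketched like this; the routing rules and system names are invented for illustration:

```python
# Sketch of connector-layer routing: one ETL pass sorts records so each
# OMS receives only "its" orders. Routing rules are illustrative.

ROUTES = {
    "oms_central_retail":    lambda o: o["region"] == "central" and o["channel"] == "retail",
    "oms_siberia_wholesale": lambda o: o["region"] == "siberia" and o["channel"] == "wholesale",
}

def route(orders):
    """Return {target_system: matching orders}; unmatched records never leave the connector."""
    out = {name: [] for name in ROUTES}
    for order in orders:
        for name, rule in ROUTES.items():
            if rule(order):
                out[name].append(order)
    return out

batches = route([
    {"id": 1, "region": "siberia", "channel": "wholesale"},
    {"id": 2, "region": "central", "channel": "retail"},
    {"id": 3, "region": "far_east", "channel": "retail"},  # no OMS subscribes to this one
])
```

The receiving systems see only their own batch; the Far East record stops at the connector.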

Got it the first time

Another case — a real one, from the experience of one of our clients→.

The company sells household chemicals, food supplements, and health products online and offline. Some orders are placed through an online store, others through sellers or sales representatives in branches.

The bonus system for offline customers is keyed to a combination of full name and contact details. But situations like this happened: at the moment of pressing "confirm", the customer would remember that their phone number had changed since the last visit. The seller had to place the order again, with the same product mix and the same buyer name, but a new phone number. As a result, the broker sent both orders to the CRM system, and the duplicates had to be cleaned out of every system manually.

Duplicate orders sent through the Kafka message broker end up in every system | KT.Team

The case we've described is not the only one we know of. When a broker is used, duplicate messages reach the end systems quite often. To get rid of them, you either clean the duplicates manually or write extra logic into the recipient system: find duplicates by key parameters, compare the send times, drop the earlier records, and keep only the newer messages.
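As a rough illustration, the per-recipient deduplication just described (matching on key parameters, then keeping the newest message by send time) could look like this; the field names are hypothetical:

```python
# Sketch of the dedup logic each recipient system would have to carry:
# group messages by key fields and keep only the newest one per group.

def dedupe(messages, key_fields=("customer", "items")):
    latest = {}
    for msg in messages:
        key = tuple(msg[f] for f in key_fields)
        if key not in latest or msg["sent_at"] > latest[key]["sent_at"]:
            latest[key] = msg
    return list(latest.values())

orders = [
    {"customer": "Ivanov", "items": "vitamins", "phone": "111", "sent_at": 1},
    {"customer": "Ivanov", "items": "vitamins", "phone": "222", "sent_at": 2},  # re-placed with a new phone
]
clean = dedupe(orders)
```

Every recipient system in the circuit has to carry its own copy of this logic.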

What happens if you use a database

If your ESB circuit uses a database + ETL layer, you can move the message reconciliation logic there. The recipient system then receives "clean" information, and you don't have to spend extra system resources on processing duplicates.
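In the database + ETL variant, the same reconciliation can be a single query in the storage layer. A minimal sketch with SQLite standing in for the bus database (schema and data are illustrative; SQLite's documented "bare column" behavior guarantees that columns next to MAX() come from the matching row):

```python
import sqlite3

# Sketch of pushing deduplication into the database layer of the bus,
# so recipient systems receive clean data. Schema and data are illustrative.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, items TEXT, phone TEXT, sent_at INT)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    ("Ivanov", "vitamins", "111", 1),
    ("Ivanov", "vitamins", "222", 2),  # duplicate, re-placed with a new phone
])

# One query keeps only the newest record per (customer, items) pair.
clean = con.execute("""
    SELECT customer, items, phone, MAX(sent_at)
    FROM orders
    GROUP BY customer, items
""").fetchall()
```

The recipient systems never see the duplicate at all; it is collapsed before delivery.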

From scratch

Now imagine the situation: you have decided to connect a new system to the circuit. For example, a BI system to track metrics and test hypotheses.

What does this look like in a circuit with a message broker?

You have another system that starts "listening" to the message queues for product cards, views, orders, and so on. That is enough to get fresh data, but not enough for meaningful conclusions: you also need historical data. Which means you will have to ask the source systems to re-upload their data.

It would seem you could get this upload through the broker just as quickly?

Yes and no. First, if the source system produces a lot of duplicate messages, this will definitely slow down the upload and its processing. Second, remember the asynchronous nature of the exchange: delays of 10-15 seconds are considered perfectly acceptable even under high load and in very large circuits. In such conditions, it is impossible to predict how many records will fail to upload.

We wrote more about this in one of our previous articles, in which we compared integration approaches→.

What happens if you use a database

If your bus uses databases instead of a message broker, you already have a ready-made, duplicate-free upload of data from all the systems you need. There are orders, product cards that can be correlated with orders, delivery statuses — everything you might need for business intelligence (as in our example) or for any other new system or process. You don't have to re-assemble anything, reload anything from the source systems, remove duplicates, and so on.

Where is the logic?

But sometimes you need to make a complex selection for external consumers, and here a broker is almost useless.

Let's look at one more case. Your PIM system stores product cards: both the items you have in stock right now and those that are temporarily not on sale. They aren't on sale because you are still purchasing them, or they are seasonal goods, or deliveries haven't started yet — there are many possibilities.

But the marketplace needs to receive from you the items that are in stock, marked "in stock". Separately, the items that will arrive in the near future, marked accordingly. And the items that will run out soon and that you don't plan to restock should get the "expires soon" status. To build such data streams, you have to read several sources, compare them, combine them, and only then send the result.

To implement this logic with a broker, you would have to code the marketplace upload restrictions into the PIM system, the WMS, the procurement management system (and possibly a couple more sales-related systems). With a large assortment and a complex catalog, you would either have to schedule the upload for hours when the systems are idle or put up with slowdowns that hit key parts of your IT architecture.

With a database + ETL bundle, all of this logic moves to the connector layer without slowing down any of the source systems.
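A minimal sketch of the kind of combination logic that moves into the connector layer: stock levels, incoming deliveries, and purchasing plans merged into a single marketplace status. The source names, statuses, and thresholds here are all assumptions for illustration:

```python
# Sketch of connector-layer status computation combining several sources
# (stock, incoming deliveries, restock plans). All names and thresholds
# are illustrative, not from any real marketplace API.

LOW_STOCK = 5  # assumed threshold for "running out"

def marketplace_status(sku, stock, incoming, restock_planned):
    qty = stock.get(sku, 0)
    if 0 < qty <= LOW_STOCK and not restock_planned.get(sku, True):
        return "expires soon"   # low stock, no plan to buy more
    if qty > 0:
        return "in stock"
    if sku in incoming:
        return "arriving soon"  # not in stock yet, but a delivery is due
    return "unavailable"

stock = {"tire-r16": 100, "wiper": 2}
incoming = {"battery-60ah"}
restock_planned = {"wiper": False}

statuses = {s: marketplace_status(s, stock, incoming, restock_planned)
            for s in ["tire-r16", "wiper", "battery-60ah"]}
```

Because this runs against the bus database, none of the source systems (PIM, WMS, procurement) executes any of it.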

It would seem that a broker is suitable for processes where there is no complicated logic and maximum transfer speed matters. But even in these cases the broker is not always the best solution. On one of our projects, for example, speed was the key metric: there weren't many records, but their statuses changed constantly. We kept the statuses updated with the help of a broker, but the receiving systems simply couldn't keep up with the flow of incoming update messages.

Paradoxically, it was the updates (the things that changed most often) that the consumer systems did not need. They were only interested in the records themselves. In the end, batch processing from a relational database proved faster: it did not overload the recipient systems and delivered all the necessary data to them.
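The batch idea from that project can be sketched in a few lines: instead of forwarding every status update, periodically deliver only the latest state per record. The data here is illustrative:

```python
# Sketch of batch delivery: collapse a stream of status updates into one
# latest-state-per-record snapshot before handing it to the consumers.

updates = [
    ("order-1", "new"), ("order-1", "paid"), ("order-2", "new"),
    ("order-1", "shipped"), ("order-2", "cancelled"),
]

def latest_states(stream):
    state = {}
    for record_id, status in stream:  # later updates overwrite earlier ones
        state[record_id] = status
    return state

batch = latest_states(updates)
```

Five update messages collapse into two records, which is all the consumer systems actually wanted.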

KT.team's portfolio includes dozens of integration cases with different business logic. Across our projects we have seen all kinds of logic, requests, and data-processing problems, and we know when it is better to offer a broker and when a database. If you are in doubt, sign up for a consultation: our experts will help you choose a tool and build your integrations correctly.
