The problem we faced
Like many fast-growing companies, we hit a point where our systems came under pressure. We saw this in part of our fraud prevention service, the engine behind real-time quote validation on our site and through aggregators.
Without revealing too much (fraud prevention likes its secrets), here's the short version: it was built on a six-year-old Neptune graph database. Traffic had grown a lot, and our latency was increasing rapidly.
To keep things ticking, we added a 2.5-second cap on queries. Not a full timeout, just a “return-what-you-can” approach. It worked, but meant we weren’t using all our data. It kept us running, but wasn’t up to Marshmallow standards. We like to move fast, but not at the cost of experience or reliability. So we rolled up our sleeves and got to work.
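The cap itself is easy to picture. Here's a minimal sketch of what a "return-what-you-can" query can look like at the application level, using gremlinpython; the endpoint, labels, and traversal are placeholders, not our real fraud model:

```python
import time

from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

# Hypothetical endpoint and graph model; the real traversal is fraud-specific.
connection = DriverRemoteConnection("wss://your-neptune-endpoint:8182/gremlin", "g")
g = traversal().withRemote(connection)

def bounded_lookup(quote_id: str, budget_seconds: float = 2.5) -> list:
    """Stream results and stop at the deadline, keeping whatever arrived."""
    deadline = time.monotonic() + budget_seconds
    partial = []
    # gremlinpython traversals are lazy iterators, so we can stop mid-stream
    # and return partial results instead of failing the whole request.
    for vertex in g.V().has("quote", "quoteId", quote_id).both("sharesDetailWith"):
        partial.append(vertex)
        if time.monotonic() > deadline:
            break
    return partial
```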
What we tried (that didn't quite land)
As with many engineering challenges, our early attempts didn’t quite solve the problem. We tried scaling horizontally, which improved performance, but drove costs up to nearly $10K per month.
We monitored performance closely using p99 latency, which steadily approached the 2.5-second cap. Eventually, even p95 latency began to hit the limit. Our data team estimated we had about a year before accuracy would be significantly impacted. Further scaling wasn't viable; it was simply too costly.
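To make the metric concrete (this is an illustration, not our monitoring stack), a percentile latency is just a rank in the sorted window of request timings:

```python
import math

# Illustrative only: nearest-rank percentiles over a window of request
# timings. In reality these came from our metrics pipeline.
def percentile(samples_ms: list[float], p: float) -> float:
    ordered = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(ordered))  # nearest-rank definition
    return ordered[rank - 1]

window = [120.0, 95.0, 310.0, 180.0, 2400.0, 2500.0, 2500.0, 250.0]
print(f"p95={percentile(window, 95):.0f}ms  p99={percentile(window, 99):.0f}ms")
# Samples pinned at 2500ms are queries that hit the cap. Once p95 sits at
# the cap, a meaningful share of requests is returning truncated data.
```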
The breakthrough moment
Fortunately, our upcoming hackathon provided the perfect opportunity to explore alternatives. The problem drew enough interest that two teams independently built prototypes, both targeting a solution that was fast to query, cost-efficient, and tolerant of slower writes.
We evaluated two main options:
- Amazon OpenSearch Service
  - A familiar tool already in use elsewhere, but its cost was a concern.
- Pre-processed data with RDS
  - Reproducing Neptune's behaviour was relatively straightforward in our case, though this isn't typical. The sketch below shows the core idea.
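The trick behind the second option is moving the graph work from read time to write time: relationships the graph used to discover with a traversal are expanded during ingestion into an indexed table, so a lookup becomes a single indexed read. A heavily simplified sketch, with an invented schema:

```python
# Heavily simplified sketch of the pre-processed model; table and column
# names are illustrative, not our real schema.
import psycopg2

SCHEMA = """
CREATE TABLE IF NOT EXISTS quote_links (
    quote_id     TEXT NOT NULL,
    linked_quote TEXT NOT NULL,
    link_type    TEXT NOT NULL,  -- computed during ingestion, not at query time
    PRIMARY KEY (quote_id, linked_quote, link_type)  -- also indexes quote_id
);
"""

# What used to be a multi-hop graph traversal becomes one indexed read.
LOOKUP = "SELECT linked_quote, link_type FROM quote_links WHERE quote_id = %s"

def related_quotes(conn, quote_id: str) -> list[tuple[str, str]]:
    with conn.cursor() as cur:
        cur.execute(LOOKUP, (quote_id,))
        return cur.fetchall()

# conn = psycopg2.connect("postgresql://user:password@rds-host/frauddb")
```

The trade-off is exactly the one the prototypes targeted: writes get more expensive because relationships are expanded up front, while reads become cheap indexed lookups.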
The result? RDS stood out across the board, with lower latency, reduced costs, and greater internal familiarity. That said, since these were still early tests, we approached next steps carefully.
How we rolled it out
Minimising disruption was our top priority. The first step was to rebuild the service behind the same interface. While the architecture was straightforward (a message queue for processing and a REST endpoint for queries), the scale was significant, with close to a billion records.
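Keeping the interface identical is what made the swap low-risk. As a rough sketch of the shape (not our actual API), the query side is just a thin REST layer over the new store:

```python
# Rough sketch of the query side only; the route, payload, and lookup are
# illustrative, not our real API.
from flask import Flask, jsonify

app = Flask(__name__)

def lookup_links(quote_id: str) -> list:
    """Placeholder for the indexed RDS lookup sketched earlier."""
    return []

@app.get("/v1/quote-links/<quote_id>")
def quote_links(quote_id: str):
    # Same request/response contract as the Neptune-backed endpoint it
    # replaced, so existing callers didn't need to change.
    return jsonify({"quoteId": quote_id, "links": lookup_links(quote_id)})
```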
We launched the new service within two weeks. The data migration, however, proved more complex. Extracting data from Neptune was challenging, but our data analysts delivered all the necessary records to an S3 bucket. From there, a Lambda function fed records into a message queue for bulk processing. Though direct migration into the RDS database was possible, we chose to route everything through the service itself due to its complex processing requirements, which proved to be the safer approach.

Thanks to parallelisation and careful scaling, the full migration completed in around six hours. During this window, we upgraded the database instance and deployed roughly 20 replicas of the new service to maximise throughput.
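For a sense of the plumbing (bucket, queue, and record format are placeholders; the real function also handled retries and dead-lettering), a minimal version of that feeder might look like this:

```python
# Minimal sketch of the migration feeder: read exported records from S3 and
# enqueue them for the new service to process. Names are placeholders.
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/migration-queue"

def handler(event, context):
    # Triggered for each exported file landing in the migration bucket.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"]
        lines = body.read().decode("utf-8").splitlines()
        # SQS batch sends are capped at 10 messages per call.
        for start in range(0, len(lines), 10):
            entries = [
                {"Id": str(i), "MessageBody": line}
                for i, line in enumerate(lines[start:start + 10])
            ]
            sqs.send_message_batch(QueueUrl=QUEUE_URL, Entries=entries)
```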
Post-migration, we introduced the new service into the quoting flow in shadow mode, executing it in parallel but keeping the old system’s results live in production. This allowed us to compare both outputs in real time, with all data sent to Snowflake for analysis. After two months of monitoring and validation, we made the switch with confidence.
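Shadow mode is simple to express. Conceptually (the real comparison ran asynchronously, with both outputs landing in Snowflake) it amounts to this:

```python
# Conceptual shadow mode: serve the old result, compare the new one off the
# hot path. Both service clients here are placeholders.
import logging
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger("shadow")
executor = ThreadPoolExecutor(max_workers=4)

def old_service_check(quote_id: str) -> dict:
    """Placeholder for the Neptune-backed client."""
    raise NotImplementedError

def new_service_check(quote_id: str) -> dict:
    """Placeholder for the RDS-backed client."""
    raise NotImplementedError

def check_quote(quote_id: str) -> dict:
    live = old_service_check(quote_id)          # still the source of truth
    executor.submit(compare_in_shadow, quote_id, live)
    return live                                 # callers only ever see this

def compare_in_shadow(quote_id: str, live: dict) -> None:
    try:
        shadow = new_service_check(quote_id)
        # In production both results were shipped to Snowflake for offline
        # analysis; here we just flag disagreements.
        if shadow != live:
            log.info("shadow mismatch for quote %s", quote_id)
    except Exception:
        log.exception("shadow call failed for quote %s", quote_id)
```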
What changed
The difference was stark: we went from multi-second queries to sub-100ms responses, with significantly better data coverage and a massive cost reduction.
What we learned
- Shadow mode is gold. It gave us confidence and flexibility to fine-tune before going live.
- Custom solutions can shine, but only when the problem is well-understood. We had six years of context to guide our choices.
- Small, focused changes can yield outsized results. Rethinking how we stored and queried data unlocked major performance wins.
Join us at Marshmallow
Looking for your next exciting challenge in software engineering? We’re hiring! Take a look at our open Software Engineering roles.