Choosing the Right Data Storage Solution: A Critical Decision

Whenever you build an application, you eventually reach a point where you need to store data outside the application's own process. That storage doesn't have to be a traditional database: it could be an in-memory store or simply files on disk. The goal is the same either way: store data reliably so that it can be retrieved when needed.

Choosing that storage is one of the most interesting challenges backend engineers face. Today's landscape offers many database technologies that differ along several dimensions:

  • Storage medium: memory, disk, or hybrid approaches
  • Data structure: relational schema or schemaless design
  • Architecture: primary-replica or consensus-based systems
  • Query interfaces: SQL, HTTP APIs, or proprietary query languages
  • Management model: fully managed services or self-hosted solutions

Key Questions to Consider

When selecting a data storage solution, you should ask yourself several critical questions:

  • What is the shape of your data? Schema-based or schemaless?
  • What are your data access patterns? Read-heavy or write-heavy?
  • What types of queries will you run? Transactional or analytical?
  • Where should the data reside? Disk, memory, or a combination?
  • Which protocols must it support? Compatibility with existing database protocols or new ones?
  • Who should manage it? Should a service provider handle operations, or should your team?

This list isn't exhaustive, but it highlights that choosing data storage is both essential and complex. Making the wrong choice for your application can lead to costly migrations later. You'll invest resources in the migration process and need to repeat the research exercise entirely.
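
The access-pattern question above is one you can often answer with data you already have. The following is a minimal sketch, assuming you can export a list of operation names from query logs or application metrics. The operation sets, the function name, and the 0.7 threshold are illustrative assumptions, not a standard.

```python
from collections import Counter

# Hypothetical operation names; adjust to whatever your logs actually contain.
READ_OPS = {"SELECT", "GET"}
WRITE_OPS = {"INSERT", "UPDATE", "DELETE", "SET"}


def classify_access_pattern(operations, threshold=0.7):
    """Classify a list of operation names by its share of reads.

    Returns 'read-heavy', 'write-heavy', 'mixed', or 'unknown' (when no
    recognized operation is present). `threshold` is an arbitrary cutoff
    chosen for illustration.
    """
    counts = Counter()
    for op in operations:
        name = op.upper()
        if name in READ_OPS:
            counts["read"] += 1
        elif name in WRITE_OPS:
            counts["write"] += 1
        # Anything else (DDL, admin commands) is ignored in this sketch.

    total = counts["read"] + counts["write"]
    if total == 0:
        return "unknown"

    read_share = counts["read"] / total
    if read_share >= threshold:
        return "read-heavy"
    if read_share <= 1 - threshold:
        return "write-heavy"
    return "mixed"


sample_log = ["SELECT"] * 9 + ["INSERT"]
print(classify_access_pattern(sample_log))
```

A result like this is only a starting point: it says nothing about query complexity or data size, which the other questions in the list cover.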

My advice? If you lack sufficient data or facts to support a specific choice, opt for a boring technology. Yes, boring. Here's why:

Why Choose "Boring" Technology?

1. Compatibility

Newer solutions often build in compatibility with mature, established technologies. For example, both Vitess and TiDB support MySQL's protocol. Vitess achieves this by running MySQL underneath, while TiDB recognized the opportunity to support a widely used database technology, offering both protocol compatibility and straightforward data migration paths.
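
One practical consequence of protocol compatibility is that moving between such systems can start as a configuration change rather than a client rewrite. The sketch below only builds connection URLs and does not connect to anything. The host names, database name, user, and the `MySQLEndpoint` class are hypothetical, and the ports are illustrative values that depend on your deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MySQLEndpoint:
    """Connection settings for any server that speaks the MySQL protocol."""

    host: str
    port: int
    database: str
    user: str

    def url(self) -> str:
        # The same "mysql://" scheme is used for every backend, because the
        # client talks the same wire protocol to all of them.
        return f"mysql://{self.user}@{self.host}:{self.port}/{self.database}"


# Hypothetical endpoints; only host and port differ between backends.
ENDPOINTS = {
    "mysql": MySQLEndpoint("mysql.internal.example", 3306, "appdb", "app"),
    "vitess": MySQLEndpoint("vtgate.internal.example", 15306, "appdb", "app"),
    "tidb": MySQLEndpoint("tidb.internal.example", 4000, "appdb", "app"),
}

for name, endpoint in ENDPOINTS.items():
    print(name, endpoint.url())
```

Protocol compatibility does not guarantee identical behavior, so a real switch would still need testing against the features and SQL your application relies on.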

2. Strong Community Support

Mature technologies develop robust communities around them. Projects like Python, Django, MySQL, and PostgreSQL benefit from:

  • Active local meetups and global conferences
  • Corporate sponsorship and backing
  • Comprehensive support structures
  • Extensive documentation and learning resources

3. Battle-Tested Reliability

"Boring" technologies have undergone numerous release cycles where features were added and bugs fixed. Their large user bases have uncovered edge cases, identified missing features, and reported bugs that wouldn't have been discovered during initial development. This maturity translates to reliability when it matters most.

Real-World Example: Miro's Migration Challenge

At WeAreDevelopers 2023, an engineer from Miro shared their experience migrating user data storage. Initially, they used Redis because it came with their gaming engine. However, as their platform grew, they discovered Redis couldn't efficiently accommodate their evolving data model. User data grew beyond what could reasonably be loaded into memory.

The engineer detailed the substantial costs they incurred performing online migrations, that is, transferring data between storage systems without downtime. Their use case had the following characteristic:

Data Access Pattern

Their application was very read-heavy, with relatively few writes, a pattern that wasn't optimal for their initial storage choice.
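
To give a sense of why online migrations are expensive, here is a minimal sketch of one common approach: dual writes plus a backfill, followed by a verified cutover. This is not Miro's actual method, which the talk summary above does not describe. The `DualWriteStore` class is hypothetical, and plain dictionaries stand in for the old and new storage systems.

```python
class DualWriteStore:
    """Writes go to both stores; reads come from the old store until cutover."""

    def __init__(self, old, new):
        self.old = old  # existing store (dict-like)
        self.new = new  # migration target (dict-like)
        self.read_from_new = False

    def put(self, key, value):
        # Step 1: every new write lands in both stores.
        self.old[key] = value
        self.new[key] = value

    def get(self, key):
        source = self.new if self.read_from_new else self.old
        return source.get(key)

    def backfill(self):
        """Step 2: copy records that exist only in the old store.

        Keys already in the new store are skipped so that fresher
        dual-written values are not overwritten.
        """
        copied = 0
        for key, value in list(self.old.items()):
            if key not in self.new:
                self.new[key] = value
                copied += 1
        return copied

    def verify(self):
        """Step 3: check that every old record matches in the new store."""
        return all(self.new.get(k) == v for k, v in self.old.items())

    def cut_over(self):
        """Step 4: switch reads to the new store once the data matches."""
        if not self.verify():
            raise RuntimeError("stores differ; run backfill before cutover")
        self.read_from_new = True


old_store = {"user:1": "alice"}
new_store = {}
store = DualWriteStore(old_store, new_store)
store.put("user:2", "bob")   # written to both stores
store.backfill()             # copies user:1 into the new store
store.cut_over()             # reads now come from the new store
print(store.get("user:1"))
```

Even this toy version needs four distinct steps. A production migration also has to deal with deletes, concurrent writers, partial failures, and rollback, which is where much of the cost comes from.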

Consequences of Using Immature Technology

Risk appetite differs from company to company, and the decision to adopt immature technology should be made with caution. You can adopt it on an experimental basis, provided you clearly understand the potential risks and benefits. In practice, that means keeping the "boring" technology in production and critical systems while experimenting with new technologies in a separate environment. This way, you can evaluate a technology's potential without risking your production systems.

There are pros and cons to using immature technology or being an early adopter.

  • Pros:
    • You can be the first to market with a new technology, potentially gaining a competitive edge.
    • You can influence the technology's development and direction.
    • You may have access to cutting-edge features and capabilities that are not yet available in more mature technologies.
    • You can build a strong relationship with the technology's creators and community, which can lead to better support and collaboration opportunities.
  • Cons:
    • Lack of community support
    • Limited documentation
    • Unstable APIs
    • Frequent breaking changes
    • Incomplete features

Conclusion

Choosing the right data storage solution is a critical decision that can significantly impact your application's performance, scalability, and reliability. By asking the appropriate questions and considering the characteristics of your data and access patterns, you can make an informed choice that aligns with your application's needs.

While "boring" technologies may not be the most exciting option, they often provide the stability, community support, and reliability necessary for successful data storage. Prioritizing these factors helps keep your application robust and adaptable as it evolves.
