Updating An Antiquated System - An Architectural Discussion

I recently saw this post on LinkedIn and wanted to reply.  My comment exceeded the maximum character limit, so I moved it to a repost reply.  That exceeded the limit too, so here's my reply as a blog post.

This is a fun exercise.  Instead of listing all my assumptions up front, you'll see them mixed in among the steps below.

  1. This is why microservices were created. Ideally you'd want to start on a small subset of the system and replace one feature at a time, ensuring parity with the existing system before flipping the switch.
  2. This is a great use-case for TDD. You can write tests that validate the existing system's functionality in your new code-base before you write the code. Over time, those tests turn green as you accurately implement the existing features.
  3. The next big caveat is good logs.  With something as mission critical as a mainframe, you cannot afford to lose anything to bad logging.  Moreover, it's SO easy to not log the important things when bad things happen.  I would ensure that logging was a core part of the system, that every step was logged, every failure, every success.  And these can't be simple logs like "Completed X".  The logs would have to contain enough information to resume the process they were running at the time in the event of failure.  We can afford to overload the logs these days with tools like Datadog and Kibana to filter through for the important stuff.  Storage is cheap.  Business losses are not.
  4. Presumably your existing team is already used to working on the existing system. But even if you somehow have an entirely fresh team, they'd still need to be comfortable searching and interpreting the old code-base, so they'd have to be quite capable in the language of the existing mainframe system. For this reason, you'll probably want to build the new system in the same language (which raises the question: why replace it at all instead of refactoring?).
  5. So, to justify the scenario, there has to be an important reason to migrate away from that language (perhaps it's written in VB6 or something). If so--and since it's a mainframe--I'd focus on close-to-the-metal languages for speed and accuracy (every layer is a point of potential failure and security risk). I'd probably use Rust. That will get you a good team of modern-oriented Devs, and you'd have decent memory protections built in. You could always go with C or even C++ for this (and there are a lot of good Devs there btw), but Rust would be my first choice.
  6. Now we come to the "tech stack" itself, which is the core question. 100 teams of 100 people collaborate using this mainframe application. That's 10,000 people--and I assume we're talking real-time, in-the-office, day-to-day use. That's no Twitter, but this thing would need to be fast and stable. We're probably talking about shared documents and inter-office messaging and the like.
    1. The mainframe app would have to convert into an API. No more direct access to green-screens.
    2. We could cache unchanging assets, like company logo stuff, etc.
    3. We probably have a lot of nightly jobs to migrate over to the new tech stack. I'd like to eliminate those if possible, but we would probably start with them as our initial microservices, with the intent of eliminating them over time.
    4. We would need an internal interface. This would be the biggest hurdle. Users don't like change, and they're going to have to migrate to a new application. Our choices are: desktop or web. Here's a trick to that decision--using a desktop app just puts the deployment hassle onto our DevOps team. For that reason, I'd go web, but strictly internal. I would not expose this site externally. If users want to access it outside the office, that's what a VPN is for.
  7. Since we likely have internal infrastructure capable of all this, I doubt I would even bother with the cloud. This app seems small enough to be supported with one on-site server per site (and I assume 10,000 people is more than one site). You would need to ensure the sites are networked together, but if they were sharing one mainframe this whole time, they likely already are. This is one of those suggestions that is quite flexible, but I would probably avoid the cloud if possible. A 10,000-employee company doesn't become a 1,000,000-person company overnight, so I don't see the need for the instantaneous scaling that the cloud would support.
  8. The final bit here is what sort of infrastructure to run on. The industry standard these days is Kubernetes. If the client is using that already (even if it's cloud) we could just jump on that. But even if we're just running on some random VMs, I would support containerization heavily here. It's a boon in a lot of ways, not the least of which is the ease of setting up developer environments. If you want to "Move Fast" in the way Meta/Facebook does, you need to make it as easy as possible to onboard new developers.
  9. I will note it, but I think it's a given at this point.  We'd be working in some sort of Agile-esque setup as a team, allowing for requirements to change (though it's my job to push back on wasted time).  We'd have a standardized tool for code reviews, likely built into our source control.  And we'd be using CI/CD deployments on whatever their existing tooling was.  Unless we're living in an SVN world, I don't think that there will be any pushback from Devs about upgrading to new infrastructure.
  10. Then we start looking down the road. In the future, the company will probably want to retire the mainframe server entirely. This is where all this planning pays off. The microservices we created will allow us to scale each individual feature of the original mainframe to office-appropriate levels. We see 1000 documents shared daily? That microservice can have lots of instances. We only see 2 messages sent back and forth each day because people prefer email? Awesome, no need to scale that feature up. The company would like to move to Kubernetes? Awesome, we're already in containers. Just write some Helm configs and we're good to go.  The funny part is, these are the meetings I'd be in for most of my day while I trust my Developers to write good code.
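
To make the logging point from step 3 a bit more concrete, here is a minimal sketch in Rust (the language suggested in step 5) of what a "resumable" log record might look like. Everything in it is an assumption for illustration: the `StepLog` and `StepStatus` types, their field names, the `resume_point` helper, and the sample job and batch names are all hypothetical, and a real system would ship these records to whatever logging pipeline is already in place rather than printing them.

```rust
use std::fmt;

/// Outcome of one step of a long-running job (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum StepStatus {
    Started,
    Succeeded,
    Failed(String), // carries the error message
}

/// One structured log record. All field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct StepLog {
    job_id: String,    // which job run this record belongs to
    step_index: usize, // position of the step within the job
    step_name: String, // human-readable step name
    input_ref: String, // pointer to the input the step was working on
    status: StepStatus,
}

impl StepLog {
    fn new(job_id: &str, step_index: usize, step_name: &str, input_ref: &str, status: StepStatus) -> Self {
        StepLog {
            job_id: job_id.to_string(),
            step_index,
            step_name: step_name.to_string(),
            input_ref: input_ref.to_string(),
            status,
        }
    }
}

/// Render the record as a single key=value line that log tools can filter on.
impl fmt::Display for StepLog {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "job={} step={} name={} input={} status={:?}",
            self.job_id, self.step_index, self.step_name, self.input_ref, self.status
        )
    }
}

/// Index of the step a job should resume from: one past the highest
/// step that logged a success, or 0 if no step succeeded.
/// Assumes the steps of a job run strictly in order.
fn resume_point(logs: &[StepLog], job_id: &str) -> usize {
    logs.iter()
        .filter(|l| l.job_id == job_id && l.status == StepStatus::Succeeded)
        .map(|l| l.step_index + 1)
        .max()
        .unwrap_or(0)
}

fn main() {
    let logs = vec![
        StepLog::new("nightly-42", 0, "extract", "batch-0001", StepStatus::Started),
        StepLog::new("nightly-42", 0, "extract", "batch-0001", StepStatus::Succeeded),
        StepLog::new("nightly-42", 1, "transform", "batch-0001", StepStatus::Started),
        StepLog::new("nightly-42", 1, "transform", "batch-0001", StepStatus::Failed("timeout".to_string())),
    ];
    for record in &logs {
        println!("{}", record);
    }
    println!("resume nightly-42 at step {}", resume_point(&logs, "nightly-42"));
}
```

The idea is that a record like "Completed X" tells you nothing about where to pick up, while a record carrying the job, the step, and a reference to its input is enough for a restart routine to skip the work that already finished.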
