What I Learned From a 7-Year Rewrite

Sept 12th, 2012, the new Simulink and Stateflow editors are available to the public: www.mathworks.com/downloads

Different people account for the timeline differently, but to an approximation the rewrite took about 7 years and occupied a number of developers ranging from a low of 4 to about 18 at maximum intensity. The resulting software unifies and replaces the entire front-ends of two separate diagram editing platforms: Simulink (22 years old, millions of lines and testpoints, 100K+ customers), and Stateflow (16 years old, many hundreds of thousands of lines, 50K+ customers). So for almost one third of Simulink’s and half of Stateflow’s history, their front ends have been under rewrite. It was a massive project internally known as the “Unified Editors.”

We expect them to be a major success with users. They certainly represent a total overhaul both architecturally and interactively. They’re a nice piece of technology. And I don’t say that because I helped lead it—it’s rare for me to be able to see anything but flaws in projects I’m involved in. They have been in the hands of pre-release customers for some time, and are receiving very favorable feedback.

It is unusual for projects of this magnitude to succeed. Experienced project managers have told me they have never seen it happen. MathWorks management deserves a great deal of credit for allowing the project to converge instead of losing faith and cutting it off midway. That said, I will never in my career undertake another project in the same way I did this one. The Unified Editors have shaped me as much as I’ve shaped them.

Here is a laundry list of what a 7-year rewrite has taught me. I expect to write more on a bunch of these items in follow-up pieces. But for now, here is a pure dump of what I know now that I didn’t know when I started (none of which younger me would have taken on faith). And neither should you. It’s worth noting that other people involved in the project have different views and took different lessons. I was a chief initiator, the technical lead, an individual contributor, and one of three development managers for the project. And, of course, 7 years ago we promised the work in 2 years. So go ahead with your 2-year rewrite and we’ll compare notes in 2019.


  • You can’t estimate anything longer than 6 months
  • You can’t estimate anything that isn’t broken into 1 week tasks that are specific
  • Tasks that say “Implement X” are not understood
  • Putting large-scale items on a long timeline is fun but useless
  • Team members have a better assessment of readiness than direct management
  • Outside observers may have a better assessment of readiness than team members

Big bang vs. staged delivery

  • Big bang looks good because of early underestimation
  • Things will end up taking as long as incremental staged delivery anyway
  • Big bang happens because of doubts about sustained institutional investment in long programs
  • Staged delivery can be cut off at any point when more important pressures arise, leaving a program half-complete
  • It’s easy to know you’re converging when you are converging
  • It’s impossible to tell if you will converge if you are not yet converging

Rewrites & backwards compatibility

  • A large system has more behavior than anyone thinks it does (maybe 10-100X)
  • Everything is the way it is for a reason
  • All the absurdities are that way because they needed to be that way at some point
  • People write what they can get away with
  • All quirks are baked in as assumptions to other existing systems
  • Avoid replacing successful legacy systems
  • Develop something else instead
  • Think creatively about how not to do a rewrite
  • A new product is 10-100X easier than a replacement for a large legacy system
  • Write something new that can gradually come to eclipse the feature set of the old
  • Backwards compatibility is a drag on developers, products, and quality (but may be necessary for customer/business reasons) (Look at what Apple gets away with. Nobody loves them for their lack of commitment to backwards compatibility. People love Apple for the products and technology that ditching old standards permits them to produce.)
  • In a system designed as a new framework and port of a legacy system to that framework, production of the framework is 10-100X easier than the port
  • It’s hard not to consider them 50/50 in planning, but they’re not


Sitting on top of legacy systems instead of cleaning them up has several characteristics


  • You don’t disturb anybody working in that codebase
  • You don’t regress existing functionality
  • You decouple shipping schedules
  • You rely on no other teams for deliverables and they don’t rely on you


  • You are subject to all the vagaries of the existing codebase
  • Rather than smooth out rough patches, you make new code rough to conform to them
  • At the end, you still have all the cleanup to do
  • You do not have to communicate with other teams, so you have to force yourself to (we didn’t)


  • The team has a more realistic assessment of readiness than management
  • Unrealistic targets are really demoralizing and demotivating
  • Missing targets, realistic or not, is demoralizing
  • Protracted stabilization is soul-crushing
  • Customer exposure is a big morale boost


  • Scope should be aggressively minimized
  • Features that management believes in more than developers do are demoralizing
  • Cutting features is great, the more the better
  • Minimum viable product considerations are very hard to evaluate when replacing an existing system


  • Modernizing an old codebase will require more memory
  • Dedicated performance engineers really help
  • Performance, especially of interactions, is very hard to lock down


  • Test coverage of the existing system will not be good enough
  • Passing the old tests is essential, but doesn’t indicate anything about the quality of the new work
  • The failures of the new system will be very different and the existing tests cover mostly the old failures


  • It really makes a difference
  • People don’t want to work in a dirty environment
  • The team knows what isn’t working and needs support to be allowed to fix it properly
  • Done properly it does make remaining work go faster (can make the difference between converging and diverging)

Full-stack Iterations

  • Must eventually stop rejecting and throwing out iterations and settle on one to prepare for shipment
  • Key to utility of iterations is quantity and speed
  • Anything that impedes speed or increases cost of production or throwing away is getting in the way
  • Never ship features based on an iteration that may not be the final one

Semi-related features delivered on the way

  • They are a distraction
  • They are never excellent features because they aren’t what the team really means to do
  • The work required to bring them to shipping state and maintaining them during the main effort is a huge distraction
  • There is a little bit learned about existing systems and what bringing them to production quality entails
  • They remain a huge drag on attention and resources even after the primary shipment because they need to be ported
  • Requires organizational support not to demand them from a long program


  • Having clients too early is deadly
  • Mismatched schedules and requirements will warp growing systems
  • The integrity of the framework is compromised as shortcuts are taken to satisfy immediate needs of clients out of the appropriate construction sequence
  • You will never feel ready for clients even when you are
  • At the point the work is ready, turning away clients is destructive
  • It takes three clients to sufficiently drive generalization of a framework


  • You have to turn it on before it’s ready in order to get it ready
  • The issues involved in really running a new system in production cannot be simulated
  • You should not plan to turn on for the first time and ship in the same release


  • Shifting requirements are a reality
  • But in-flight design changes must be minimized
  • Choices made off-the-cuff need to be considered for their expense over leaving things the way they are
  • Complex systems need a design document for developers, testers, doc, and usability to work off jointly
  • These must be done at a fairly low level; a high-level one doesn’t specify anything sufficiently

Prototypes, walkabouts, demo nights

  • Never ship features based on early iterations (did I already say that?)
  • Prototypes always appear closer to ship-readiness than they are in reality
  • Prototype code must be kept out of the production stream
  • That requires development procedures that enable it
  • Prototype code that leaks into production will cause problems for a very long time
  • Customer exposure is a big morale boost and mitigates risk
  • Walkabouts have to be well-managed and infrequent so as not to leave people waiting and uncertain
  • Some people’s work shows more easily than others’
  • Some people like this kind of exposure more than others
  • It’s good for upper management to meet the team and talk to them
  • There is a danger of pressure to change design on-the-fly at these events
  • Pressure to show work for a deadline leads to shortcuts
  • That’s OK in prototype code, not in production
  • The deadline for something to be shown in demo/walkabout must be decoupled from the deadline for submission to a production stream (they may have very little to do with one another)
  • But it’s nearly impossible for a viewer of the work to understand that it is nowhere near complete

That’s all I can think of at the moment. There’s a lot I want to write more about. I can’t wait to apply what I’ve learned to making more software, better, faster.

4 thoughts on “What I Learned From a 7-Year Rewrite”

  1. Hi Simon! Congratulations! You’ve got some great nuggets above. I look forward to reading more as you elaborate on your favorites.

    A couple personal favorites are:

    “It takes three clients to sufficiently drive generalization of a framework”
    I suppose it needs to be the right three, but this seems like a great way to avoid analysis paralysis: focus on the exact needs of a small set, and run with it.

    “The issues involved in really running a new system in production cannot be simulated”
    I totally feel this. Do you anticipate this with tools for handling the unexpected in production environments: support tools, copious logging, beer?

    Lastly, I love all of the first three bullets in the estimation section. True, true, true. What I want to know is how to deal with this. Do you have advice on arriving at a set of specific meaningful tasks? Does the whole team get involved? Is the task design itself a task to be completed first?

    • Hey, hi Tim. Thanks. So the way we dealt with the impossibility of anticipating all the production issues was to turn it on in-house about a year before it was ready, wait three painful months, and turn it back off. I’d be lying if I told you we planned it that way. We didn’t have a really good sense at that point of how fast we were going to be able to bring things to production quality, and we thought we might be able to turn it on, and push it hard up over the hump and coast into release. That’s not what happened. As soon as we turned it on, we found way more second-floor doors that opened onto empty space than we ever knew we had. But we had to turn it on to find them. There were hundreds of other developers affected, and we didn’t make a lot of friends that way. And we turned it back off in time to qualify the old stuff for the regular release cycle. And then we didn’t ship for two more releases after that. But that experience of turning it on before it was ready was invaluable. No matter how much more time we took with it we wouldn’t have found out what we needed to. Now, we have a six-month release cycle, and how this fits into a shorter web-like release cycle, I’m not sure. There’s probably a lot more flexibility the shorter the release cycle, and the quicker you can revert. We’re strictly versioned, shipped six-month iterations.

  2. Hey Simon! Congratulations on launching such an extensive project. This is an amazingly thorough and informative list. And seven years weathering changing markets, technologies, and personnel no doubt adds another layer of complication to everything here. I’m also very much looking forward to further elaborations on what you’ve outlined.

    I really like the outline of code evolution you relate under various sublists: legacy, prototype, demo, and production, and how they are produced, perceived, and valued differently. I’ve often prototyped features for demos in a couple hours that would have taken weeks to finish. It can indeed give a false sense of potential development speed to outside observers, but it sure is fun.

    I’m curious about the strategies you’ve employed for managing expectations, particularly when you’re about to “turn it on before it’s ready in order to get it ready?” How do you adjust for the need for clients at particular times, management’s urgent requests for demos of new features, and the fact that viewers can’t tell “that it is nowhere near complete?”

    And I’d still love to have a chat about big data sometime!

    • Thanks William! You’re right, managing expectations is really hard. We did it pretty wrong a lot of the time. I wonder if there are cultural norms of estimation that differ across organizations. I think we were fooling ourselves early on in the project, and I have no idea why others with more experience couldn’t tell. Maybe they could and didn’t say anything. I think it’s hard to argue with an unrealistically good projection. Who wants to shoot it down? We heard a lot about “stretch” goals, but the difference between a stretch goal and a tear-something goal is hard to discern. I’ve become a strong believer in scoping things down until they fit instead of trying to cram things in under the wire.

      And yes. Let’s talk. I’ll be in touch.
