{"id":5363,"date":"2017-05-08T21:12:07","date_gmt":"2017-05-08T21:12:07","guid":{"rendered":"http:\/\/www.hemispheregames.com\/?p=5363"},"modified":"2017-05-09T22:10:36","modified_gmt":"2017-05-09T22:10:36","slug":"osmos-updates-and-floating-point-determinism","status":"publish","type":"post","link":"https:\/\/www.hemispheregames.com\/new_blog\/2017\/05\/08\/osmos-updates-and-floating-point-determinism\/","title":{"rendered":"Osmos, Updates, and Floating-Point Determinism"},"content":{"rendered":"<p>We just released Osmos 2.4.0 on iOS \u2013 our first release in almost 4 years. Why the long hiatus? Was it because 2.3.1, our previous release, was perfect? No, though it was solid and stable, and we were happy with it. Rather, it was due to our reliance on \u201cfloating-point determinism\u201d for multiplayer. And for years afterwards I thought it might be the last version we ever released on iOS.<\/p>\n<h2 style=\"margin-top: 30px;\">Desynchronization: A Little Osmos Multiplayer History<\/h2>\n<p>It\u2019s 2013. Miley Cyrus is wrecking stuff in her underwear. I was about to submit version 2.3.1 to Apple for approval. This was a minor update from 2.3.0 \u2013 just a few fixes. We did a little testing, and all seemed solid. Then, just before submitting, we got a \u201cblobiverse desync\u201d error during a multiplayer test.<\/p>\n<p><img decoding=\"async\" style=\"width: 100%;\" src=\"\/images\/blobiverse_desync.jpg\"\/><\/p>\n<p>What, you may ask, is a \u201cblobiverse desync\u201d? Well at the time it was something I never expected &#8211; nor wanted &#8211; to see again.<\/p>\n<p>It\u2019s 2011. Everyone is getting into Minecraft. Aaron and I were working hard on creating a multiplayer mode for Osmos \u2013 a surprisingly ambitious and tricky project. One technical question we faced was how to synchronize game state between devices over a network. There are a lot of moving blobs in Osmos: It takes roughly 7 kb to represent the game\u2019s state at any given moment. 
Multiply that by a decent framerate, and the network bandwidth required to stream Osmos state across a network gets into video streaming territory. We wanted to avoid putting that kind of load on people\u2019s networks and devices. Instead, if we could simulate the game locally and identically on each device, transmitting only player input (mass-firing, mainly) across the network, the bandwidth requirements would be minimal. We decided to go that route.<\/p>\n<p>We encountered and solved so many technical challenges in our development of Osmos multiplayer, many of which I found super interesting, but I want to focus on just one here: simulation determinism. It\u2019s actually a huge subject, so I\u2019ll just touch on a couple bits and zoom in on an even narrower subject: floating-point determinism.<\/p>\n<p>For starters, a titch about simulating physics on a computer. If you\u2019re familiar with the term <a href=\"http:\/\/gafferongames.com\/networked-physics\/deterministic-lockstep\/\" target=\"_blank\">\u201clockstep simulation\u201d<\/a> &#8211; one of the things we implemented for Osmos multiplayer &#8211; feel free to skip the rest of this paragraph. Generally, physics is simulated a frame at a time: calculate, step; calculate, step; and so on. If you do it right, the smaller your time-step is (the time from frame to frame) the more precise your results will be. Precision aside, it\u2019s important here to note that a different time-step will give you different results. For example, simulating one second in 60 steps (1\/60th of a second per step) will give you a different result than simulating that same second in 30 steps (at 1\/30th of a second per step). But so long as the framerate is decent and things are stable, these differences don\u2019t get noticed by most players. In single-player Osmos we simply run as many frames as we can per second. If your device can handle it, this means the game will run at 60 fps, bound by the refresh rate of your display. 
But for various reasons some frames end up taking longer than others, sometimes due to the complexity of what\u2019s happening in game, and sometimes due to what else the device might be doing in the background. So one frame may take 1\/30th of a second, another 1\/47th, another 1\/60th, etc. And that\u2019s fine. The problem arises when you want two different simulations to give identical results. In that case, the time-step used for calculations must be identical for both simulations. Physics programmers call this a \u201clockstep\u201d simulation, which we implemented for Osmos multiplayer. I won\u2019t dive any deeper into this subject here, but a great resource on all this, including code examples, can be found on <a href=\"http:\/\/gafferongames.com\/game-physics\/fix-your-timestep\/\" target=\"_blank\">Glenn Fiedler\u2019s website<\/a>.<\/p>\n<p>You may ask: is this necessary? Won&#8217;t these calculation differences be minor? Who will notice? Well, Osmos falls into the category of sensitive systems that can produce drastically different results from slightly different initial conditions \u2013 aka the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Butterfly_effect\" target=\"_blank\">Butterfly Effect<\/a>. For example: A tiny difference in mass-transfer between motes on different devices will cause them to exert slightly more or less gravity from that moment onwards, causing trajectories to vary more and more over time, until a near-miss on one device is a collision on the other \u2013 a \u201ccatastrophic\u201d divergence. A player could die on one device but not the other. Not good.<\/p>\n<p>Moving on, once you&#8217;ve implemented a lockstep simulation over a network, how do you know if it&#8217;s working correctly? Well, you could run a simulation on two devices and watch\u2026 and watch\u2026 and watch\u2026 until you maybe notice something looks different on the two displays. 
We did this for a little while, saw differences accumulate over time, and decided to get more rigorous. We computed a checksum \/ hash of the summed masses, x &#038; y positions, and x &#038; y velocities of all motes on the level (5 floats total), and started to send that across the network for every time-step. The devices would compare their local hash with the remote device\u2019s hash, and if they differed we\u2019d throw an alert up on the screen &#8211; blobiverse desync! &#8211; and dump a bunch of relevant numbers to a log file to analyze. One by one we found the divergences, and resolved them in various ways. And once we were done we were kind of amazed: devices from different generations, each running different versions of iOS &#8211; even the Xcode simulator &#8211; they all gave the same results! We had achieved simulation determinism. Huzzah!<\/p>\n<p>I remember chatting with Jonathan Blow at GDC 2012 about what we were up to. We got into some technical details, and I mentioned we were using a lockstep simulation as our multiplayer solution. He made a face I can only describe as &#8220;Ugh&#8221; with a dash of &#8220;Really?&#8221; I remember saying &#8220;I know. I know. But it&#8217;s working well! We got this.&#8221;<\/p>\n<p>We happily released Osmos multiplayer and never saw that error message again&#8230;<\/p>\n<p>&#8230; until that final 2.3.1 build in 2013. What happened with that build? I had actually tested it a lot, and had never seen the desync. But one day right before release Dave and I were testing and &#8211; blammo &#8211; there it was. After some investigation we realized the desync only happened when playing 2.3.1 against the 2.3.0 store build. But what had changed? It turns out I had recently updated my version of Xcode. I tried reverting back to the previous version, and the problem went away. (I tested this to death.) Something in how the new Xcode was compiling our code gave different results from previous versions. Spooky. 
So, while Apple was still accepting builds from the older version of Xcode, we slipped it in. All went well with the release. But soon after, Apple stopped accepting builds from that version. We could no longer submit new versions of Osmos without breaking multiplayer. And for several years, we didn&#8217;t feel we needed to.<\/p>\n<h1 style=\"margin-top: 40px;\">The 32-Bit App-ocalypse<\/h1>\n<p>Over the past half-year, Apple has gotten more aggressive about culling \u201cunsupported\u201d apps from the App Store, with 32-bit-only apps slowly approaching the chopping block. It&#8217;s pretty clear at this point that Apple <a href=\"https:\/\/9to5mac.com\/2017\/04\/09\/32-bit-apps-ios\/\" target=\"_blank\">will cut support for these with iOS 11<\/a>. And so this January I rolled up my sleeves and started work on an update. I figured &#8211; worst case scenario &#8211; I might have to cut multiplayer entirely. I\u2019m not actually sure what percentage of players play multiplayer. In any case I figure single-player Osmos is better than no Osmos.<\/p>\n<p>Modernizing the Osmos project so it would build and run using the latest Xcode and frameworks took some work, but less than I expected. Ditto for 64-bit support. There were only a few glitches related to memory alignment in our rendering pipeline and network protocol. For example, here are a couple before-and-after videos I put together demonstrating the kind of alignment bugs that required squashing.<\/p>\n<p><video style=\"width: 100%;\" controls autoplay loop><source src=\"\/images\/64bit_orbit_glitch.mp4\" type=\"video\/mp4\">Your browser does not support the video tag.<\/video><\/p>\n<p><video style=\"width: 100%;\" controls autoplay loop><source src=\"\/images\/64bit_biophobe_glitch.mp4\" type=\"video\/mp4\">Your browser does not support the video tag.<\/video><\/p>\n<p>Pretty smooth work for the most part.<\/p>\n<p>Desync issues aside, bigger changes were required for multiplayer. 
Apple\u2019s networking and game frameworks have changed a lot over 4 years. And while I was at it I took a cue from Apple and removed all Game Center UI from the game, adding support for \u201cbackground matchmaking\u201d so players can practice\/play single-player levels while waiting for a match. (There aren&#8217;t as many people playing Osmos multiplayer these days, so it can be a while before someone else comes along looking for a match.)<\/p>\n<h1 style=\"margin-top: 40px;\">Hunting for Synchronicity<\/h1>\n<p>Of course, synchronization was the big question \/ risk in this update. I didn\u2019t expect the new version of Osmos to be compatible with the 2.3.1 store build, and sure enough it wasn\u2019t. So we\u2019d lose backwards compatibility at least. But I wondered if this new version would be compatible with itself across different devices and OS versions, like it was in \u201cthe good old days\u201d \u2013 specifically, 32 vs 64 bit devices. It wasn\u2019t. Ugh. And so began the deep dive into floating-point determinism.<\/p>\n<p>I tried many things. I spent a good chunk of time tinkering with compiler flags \/ settings. Turning off optimizations solved most of the desync issues, but I wanted to avoid that if possible. I tried <a href=\"https:\/\/gcc.gnu.org\/wiki\/FloatingPointMath\" target=\"_blank\">flags like mfpmath and sse2<\/a>, but they didn\u2019t seem to get me anywhere, and documentation on the web with respect to those and clang is pretty thin. I revisited my understanding of <a href=\"https:\/\/docs.oracle.com\/cd\/E19957-01\/806-3568\/ncg_goldberg.html\" target=\"_blank\">floating-point math<\/a>. I stared at waterfalls of numbers with many decimal places, trying to figure out where, why, and how things were diverging. I reduced the Osmos physics code to the point where nothing moved and no collisions occurred \u2013 at least that stayed in sync! 
I isolated the problem to the point where I had a single line of code that gave different results on 32 vs 64 bit devices for <em>some<\/em> (but not all!) input values. Simplified, it looked something like this:<\/p>\n<pre>mote.x += mote.vx * dt;<\/pre>\n<p>Simply update a mote&#8217;s x-position by its x-velocity times the time-step. For example, with<\/p>\n<pre>mote.x = 0.00668302644\r\nmote.vx = 2.32162547\r\ndt = 33\/1000.0   \/\/ time-step in seconds (33 ms)<\/pre>\n<p>The mote&#8217;s new x-position on the iPhone 6 would be 0.0832966641 whereas it would be 0.0832966715 on the iPod 5. (A small difference, but still important.)<\/p>\n<p>Were <a href=\"https:\/\/en.wikipedia.org\/wiki\/IEEE_floating_point\" target=\"_blank\">IEEE standards<\/a> being ignored? No. This difference only occurred in Final Release builds, with optimizations enabled. Eventually I convinced myself it was due to compiler optimizations causing some intermediate results to be temporarily stored in double \/ 64-bit registers on 64-bit devices, leading the final float \/ 32-bit result to be somewhat different. So I tried \u201cunrolling\u201d some simple calculations. For example, I expanded the single line above to<\/p>\n<pre>float dx = mote.vx * dt;\r\nmote.x += dx;<\/pre>\n<p>This kind of change helped in some sections of code, but not everywhere. In some places the compiler was still optimizing \/ merging instructions. So, how to tell the compiler not to daisy-chain floating-point calculations? Well, as someone who is absolutely not a compiler expert, I came across <a href=\"http:\/\/cottonvibes.blogspot.ca\/2010\/09\/using-volatile-keyword-to-prevent-float.html\" target=\"_blank\">a neat trick<\/a>: the somewhat esoteric <b>volatile<\/b> keyword. 
Rewriting the above code as<\/p>\n<pre>volatile float dx = mote.vx * dt;\r\nmote.x += dx;<\/pre>\n<p>tells the compiler to write the result (as a float) to the dx variable as soon as it&#8217;s calculated, and not to use any intermediate \/ higher-precision registers. It\u2019s a nice, code-local solution to the problem that can be applied in a very precise way where needed. I ended up having to do this to about 30 different blocks of code here and there in Osmos. It lengthens those sections of code (in some places from 10 lines to 40 lines of code), giving it more of an assembly-language style, but it works.<\/p>\n<p>Unfortunately that wasn\u2019t the one magic bullet that solved everything. It took me a while to track down the last couple sources of divergence, and they turned out to be sqrt() and some trigonometry functions. (Osmos is all about circles after all.) When compiler optimizations are enabled, these give slightly different results for some inputs. For example, acos(0.830012262) returns 0.591666639 on my iPhone 6 and 0.591666698 on my iPod 5. Volatile doesn&#8217;t help with this, so I tried rounding the results to the nearest degree, throwing away a bunch of precision, but giving results that are indistinguishable in play \u2013 totally fine so long as results match across devices. That worked. 99.999% of the time. Turns out every once in a while &#8211; hours of play on average &#8211; the results would end up on different integer boundaries after rounding. Ouch. Rounding can be a <a href=\"http:\/\/blog.frama-c.com\/index.php?post\/2013\/05\/02\/nearbyintf1\" target=\"_blank\">more complex operation than you might think<\/a>, but it\u2019s a solvable problem when there\u2019s a ground truth you\u2019re looking for, like the nearest integer to a given value. But when inputs are different, neither device in isolation has enough information to always come to the same result as the other. 
I lost days to that one, with much fun staring at streams and streams of tiny decimals. The solution? I went back to some basic circular geometry and came up with some new equations that would give a decent approximation of mote-mote area-overlap without the use of any trigonometry functions. The new approximation always underestimates the overlap, but that\u2019s ok since the mass transfer generally gets spread across multiple time-steps anyways. I didn\u2019t notice the difference, and I doubt anyone else would either.<\/p>\n<p>With that, and after tons of testing, I think Osmos 2.4.0 is solid on this front. All seems good after a few days in &#8220;the wild&#8221; as well. Can I guarantee there aren\u2019t any super-rare divergences remaining? Nope. Hopefully people will let me know if they ever see it occur.<\/p>\n<h1 style=\"margin-top: 40px;\">TL;DR<\/h1>\n<p>Overall I spent nearly 4 months working on this update. Most of that was on multiplayer, with one month of that spent in the rabbit hole of floating-point determinism. I hope this blog post helps others avoid some of that pain.<\/p>\n<p>To summarize: Lockstep synchronicity got you down?<\/p>\n<ul>\n<li>Try unrolling your calculations, assembly-style, and using the volatile keyword.<\/li>\n<li>Watch out for trig and other math functions. Sometimes they give the same results; sometimes they don\u2019t.<\/li>\n<li>Don\u2019t try rounding to solve your problem. It\u2019s futile and just makes the problem rarer and harder to track down.<\/li>\n<li>Make sure you test with optimizations on.<\/li>\n<\/ul>\n<p>Moving forward I\u2019m curious if a future version of Xcode will again break our synchronization, or if we\u2019re now more-or-less future proof. Time will tell.<\/p>\n<p>ps. I could go on a lot longer on this and many other subjects related to Osmos multiplayer. If you find this blog post useful and\/or interesting, please let me know. It\u2019ll motivate me to blog more than once per year! 
;-)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We just released Osmos 2.4.0 on iOS \u2013 our first release in almost 4 years. Why the long hiatus? Was it because 2.3.1, our previous release, was perfect? No, though it was solid and stable, and we were happy with it. Rather, it was due to our reliance on \u201cfloating-point determinism\u201d for multiplayer. And for [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":5436,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":true,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[3,6,9,11],"tags":[],"class_list":["post-5363","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dev","category-iphoneipad","category-multiplayer","category-osmos"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/www.hemispheregames.com\/new_blog\/wp-content\/uploads\/2017\/05\/blobiverse_desync.jpg?fit=568%2C320&ssl=1","jetpack_shortlink":"https:\/\/wp.me\/p5C9wi-1ov","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.hemispheregames.com\/new_blog\/wp-json\/wp\/v2\/posts\/5363","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.hemispheregames.com\/new_blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hemispheregames.com\/new_blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hemispheregames.com\/new_blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hemispheregames.com\/new_blog\/wp-json\/wp\/v2\/comments?post=5363"}],"version-history":[{"count":5,"href":"https:\/\/www.hemispheregames.com\/new_blog\/wp-json\/wp\/v2\/posts\/5363\/revisions"}],"predecessor-version":[{"id":5442
,"href":"https:\/\/www.hemispheregames.com\/new_blog\/wp-json\/wp\/v2\/posts\/5363\/revisions\/5442"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.hemispheregames.com\/new_blog\/wp-json\/wp\/v2\/media\/5436"}],"wp:attachment":[{"href":"https:\/\/www.hemispheregames.com\/new_blog\/wp-json\/wp\/v2\/media?parent=5363"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hemispheregames.com\/new_blog\/wp-json\/wp\/v2\/categories?post=5363"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hemispheregames.com\/new_blog\/wp-json\/wp\/v2\/tags?post=5363"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}