On February 12th, the key node of our server architecture suddenly went down. The cause was a simultaneous hardware failure of both disks in the node's RAID array. We contacted our service provider and had the disks replaced in less than an hour.
Unfortunately, the RAID rebuild procedure estimated an unexpectedly long completion time.
The good news was that over the last month, due to our growing number of customers, we had been building and testing a completely new production environment that is safer and more scalable. We decided to move up our schedule and migrate Timeneye to the new environment right away.
Everything went smoothly, and we completed the transfer in six hours. The transfer required changing our DNS records, so it probably took a bit longer for the change to propagate around the world.
We are very sorry for the inconvenience this outage caused our customers, and we are granting everyone a free one-week extension as compensation.
Outages happen to everyone from time to time, but we think that the most important thing is to learn from them and use them to build better systems.
This is what we did. In the coming weeks, we'll keep working on our architecture to improve its performance.
Thank you for your patience and for the support so many of you have shown us.