{"id":17,"date":"2012-02-15T01:20:00","date_gmt":"2012-02-15T01:20:00","guid":{"rendered":"http:\/\/killernat.com\/?p=17"},"modified":"2012-12-23T16:22:28","modified_gmt":"2012-12-23T21:22:28","slug":"problomes-on-beefgirs-server","status":"publish","type":"post","link":"http:\/\/killernat.com\/?p=17","title":{"rendered":"problems on beefgirs server"},"content":{"rendered":"<p>It's currently up, but it might go down many times in the next few hours\/days.<\/p>\n<p>Emergency maintenance: Panelbox s101 \u2013 102<br \/>Published on February 13, 2012 at 5:03 pm by admin in: status<\/p>\n<p>Update Feb 14 7:30pm EST<\/p>\n<p>Urgent maintenance is required on servers s101 and s102. This maintenance will interrupt the web service for up to 12 hours, starting tonight at 8pm (February 14th). We will attempt to keep the email services accessible during this period. The maintenance must be performed urgently in order to restore server replication and ensure proper service and server performance.<\/p>\n<p>Here are more details on the maintenance:<\/p>\n<p>Context<br \/>We are not immune to equipment failure, even though our servers have 3 levels of replication\/backup:<\/p>\n<p>    RAID \u2013 Disks on the server replicate each other<br \/>    DRBD \u2013 The content of each server is replicated to another server, and vice versa, which explains the link between s101 and s102.<br \/>    R1Soft \u2013 Backup system for the disks\u2019 contents<\/p>\n<p>History of the current problem and solutions<br \/>Yesterday, the s101 server encountered problems that led us to diagnose a fault in the first level of replication: a defective RAID card. We then used our second level of replication by redirecting all sites from the s101 server to its replica on the s102 server to avoid a long service interruption. 
This intervention partially reduces performance on s102, because it now hosts twice as many sites.<\/p>\n<p>We then started the RAID card replacement on the s101 server. This operation usually takes several hours and is transparent to our customers, who unknowingly use the server replica (DRBD) during this period. In this case, data corruption on one of the server\u2019s disks in turn corrupted the RAID rebuild. Consequently, instead of rebuilding the RAID in a few hours, we will need to resynchronize the DRBD replica back from s102 to s101, which means copying many gigabytes of data. This intervention can take up to 12 hours and uses almost all of the server\u2019s resources, making the server nearly unusable due to major slowdowns; if we kept the server in service during the resynchronization, it would take several days.<\/p>\n<p>We will take this opportunity to upgrade the RAID card to a better one and add memory to the server in order to improve its performance. That is not something we can normally do easily while the servers are up and running.<\/p>\n<p>The intervention will start at 8pm tonight, February 14th. We are preparing the hardware for the upcoming work. We expect it to be finished by 8am Wednesday morning.<\/p>\n<p>==========<\/p>\n<p>Update Feb 14 (14:50 EST)<\/p>\n<p>The servers are stable again, but are experiencing some latency. s101 has continuous errors that might require an OS reinstall. Most of the resources are being handled by s102, which is causing the slowdowns you might be experiencing at this time. We are looking into different solutions.<\/p>\n<p>=====<\/p>\n<p>Update Feb 14 (13:40 EST)<\/p>\n<p>The s101 server will have to be rebooted after a synchronization problem with s102. This is causing intermittent latency issues on both servers. 
We are working on resolving these issues as quickly as possible.<\/p>\n<p>=====<\/p>\n<p>Update Feb 13 (17:45 EST)<\/p>\n<p>The server is back online. It will remain under observation for a couple of hours.<\/p>\n<p>=====<\/p>\n<p>Start time: immediately (Monday February 13th 2012 17:00 EST)<br \/>Resolution time: Estimated at 1 hour<br \/>Situation: Panelbox server s101 unreachable<br \/>Impact: The services provided by the Panelbox server s101 will be unavailable during the maintenance.<\/p>\n<p>We are currently proceeding with the replacement of the RAID card in s101.<\/p>\n<p>Thank you for your understanding.<\/p>\n<p>Should you have any inquiries, feel free to contact our support team: http:\/\/funio.com\/contacts.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It's currently up, but it might go down many times in the next few hours\/days. Emergency maintenance: Panelbox s101 \u2013 102 Published on February 13, 2012 at 5:03 pm by admin in: status Update Feb 14 7:30pm EST Urgent maintenance is required on servers s101 and s102. This maintenance will interrupt the web service for up to 12 hours, starting tonight at 8pm (February 14th). We will attempt to keep the email services accessible during this period. The maintenance must be performed urgently in order to restore server replication and ensure proper service and server performance. Here are more details on the maintenance: Context We are not immune to equipment failure, even though our servers have 3 levels of replication\/backup: RAID \u2013 Disks on the server replicate each other DRBD \u2013 The content of each server is replicated to another server, and vice versa, which explains the link between s101 and s102. R1Soft \u2013 Backup system for the disks\u2019 contents History of the current problem and solutions Yesterday, the s101 server encountered problems that led us to diagnose a fault in the first level of replication: a defective RAID card. 
We then used our second level of replication by redirecting all sites from the s101 server to its replica on the s102 server to avoid a long service interruption. This intervention partially reduces performance on s102, because it now hosts twice as many sites. We then started the RAID card replacement on the s101 server. This operation usually takes several hours and is transparent to our customers, who unknowingly use the server replica (DRBD) during this period. In this case, data corruption on one of the server\u2019s disks in turn corrupted the RAID rebuild. Consequently, instead of rebuilding the RAID in a few hours, we will need to resynchronize the DRBD replica back from s102 to s101, which means copying many gigabytes of data. This intervention can take up to 12 hours and uses almost all of the server\u2019s resources, making the server nearly unusable due to major slowdowns; if we kept the server in service during the resynchronization, it would take several days. We will take this opportunity to upgrade the RAID card to a better one and add memory to the server in order to improve its performance. 
That is not something we can normally [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[14,13],"tags":[],"_links":{"self":[{"href":"http:\/\/killernat.com\/index.php?rest_route=\/wp\/v2\/posts\/17"}],"collection":[{"href":"http:\/\/killernat.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/killernat.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/killernat.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/killernat.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=17"}],"version-history":[{"count":2,"href":"http:\/\/killernat.com\/index.php?rest_route=\/wp\/v2\/posts\/17\/revisions"}],"predecessor-version":[{"id":98,"href":"http:\/\/killernat.com\/index.php?rest_route=\/wp\/v2\/posts\/17\/revisions\/98"}],"wp:attachment":[{"href":"http:\/\/killernat.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=17"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/killernat.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=17"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/killernat.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=17"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}