• Category Archives: News
  • Everything is 100% now

    From funio:

    Update Feb 16 06:30 EST

    Services have been back for about an hour and a half now. s101 was completely replicated from the DRBD mirror on s102. This is the process that prompted us to bring both s101 and s102 offline, because it was crucial that this replication be done as quickly as possible. More detailed explanations were posted previously in the “Feb 14 19:30” update.

    s101 is now replicating s102 as a low-priority background task, so the system will use some disk I/O for perhaps a couple of days. Current estimates vary between 16 and 58 hours, depending on the load on s101 and s102. During this time everything should work normally, and the servers will automatically gain some speed once the process is complete.

    The new RAID card installed in s101 should perform better than the previous one because it has roughly twice the processing power and four times the memory. All of the hard drives were also replaced. While we were at it, we added 50% more RAM to s101. This extra memory will give us more room for performance enhancements in the future.

    We are truly sorry for the inconvenience all this might have caused. This is one of the reasons that prompted us to upgrade the hardware instead of just replacing the bad parts.

    The status has been set to yellow because we will actively monitor these two servers throughout the day to make sure everything works as expected.

    Thank you again for your patience and understanding.

    ==========


  • Problems on beefgirs server

    It's currently up, but it might go down several times over the next few hours or days.

    Emergency maintenance: Panelbox s101 – 102
    Published on February 13, 2012 at 5:03 pm by admin in: status

    Update Feb 14 7:30pm EST

    Urgent maintenance is required on servers s101 and s102. It will interrupt web service for up to 12 hours, starting tonight (February 14th) at 8pm. We will attempt to keep email services accessible during this period. The maintenance must be performed urgently in order to re-establish replication between the servers and ensure proper service and server performance.

    Here are more details on the maintenance

    Context
    We are not immune to equipment failure, even though our servers have three levels of replication and backup:

    RAID – The disks within each server replicate to one another.
    DRBD – The content of each server is replicated to another server, and vice versa, which explains the link between s101 and s102.
    R1Soft – Backup system for the disks’ contents.

    History of the current problem and solutions
    Yesterday, the s101 server ran into problems that led us to diagnose a fault at the first level of replication: a defective RAID card. We then used our second level of replication, redirecting all sites from s101 to its replica on s102 to avoid a long service interruption. This intervention partially reduces server performance because s102 now hosts twice as many sites.

    We then started the RAID card replacement on s101. This operation usually takes several hours and is transparent to our customers, who use the server replica (DRBD) during this period without knowing it. In this case, data corruption on one of the server’s disks in turn corrupted the RAID reconstruction. As a result, instead of reconstructing the RAID in a few hours, we need to resynchronize the DRBD replication from s102 back to s101, which means transferring many gigabytes of data. This intervention can take up to 12 hours and uses almost all of the server’s resources, which would make the servers nearly unusable because of major slowdowns. If we kept the services online, the replication would take several days.
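
    To illustrate why the resynchronization is so much faster with the services offline, here is a minimal sketch of the arithmetic. The volume size and both transfer rates are hypothetical values chosen for illustration; the actual figures for s101 and s102 are not stated in this post.

```python
def resync_hours(data_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to copy data_gb gigabytes at a sustained rate in MB/s."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = (data_gb * 1000) / rate_mb_per_s  # 1 GB = 1000 MB
    return seconds / 3600


# Hypothetical volume size: the real size is not given in the post.
DATA_GB = 2000

# Hypothetical sustained rates: full speed with services offline versus
# a throttled background sync that shares disk I/O with live sites.
SCENARIOS = [("offline, full speed", 50.0), ("online, throttled", 8.0)]

for label, rate in SCENARIOS:
    print(f"{label}: {resync_hours(DATA_GB, rate):.1f} h")
```

    With these assumed numbers, the offline copy fits within the 12-hour window while the throttled copy runs for several days, which matches the trade-off described above.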

    We will take this opportunity to upgrade the RAID card to a better one and add memory to the server in order to improve its performance. This is not something we can easily do while the servers are up and running.

    The intervention will start at 8pm tonight, February 14th. We are preparing the hardware for the upcoming work. We expect this to be finished by 8am Wednesday morning.

    ==========

    Update Feb 14 (14:50 EST)

    The servers are stable again but are experiencing some latency. s101 has continuous errors that might require an OS reinstall. Most of the load is being handled by s102, which causes the slowdowns you might be experiencing. We are looking into different solutions.

    =====

    Update Feb 14 (13:40 EST)

    The s101 server will have to be rebooted after a synchronization problem with s102. This is causing intermittent latency issues on both servers. We are working on resolving these issues as quickly as possible.

    =====

    Update Feb 13 (17:45 EST)

    The server is back online. It will remain under observation for a couple of hours.

    =====

    Start time: immediately (Monday February 13th 2012 17:00 EST)
    Resolution time: Estimated at 1 hour
    Situation: Panelbox server s101 unreachable
    Impact: The services provided by the Panelbox server s101 will be unavailable during the maintenance.

    We are currently replacing the RAID card in s101.

    Thank you for your understanding.

    Should you have any inquiries, feel free to contact our support team: http://funio.com/contacts.