MySQL Scaling Help Needed ASAP (Work Remotely)

I am looking for a highly experienced MySQL freelancer. My web application is having major load issues, especially during peak times.

I am looking for someone in the United States who can help me with these problems remotely (ASAP, preferably today before 7:00 Central Time, as that is a major usage peak). I will provide the details for you to log in to the servers via SSH.

I am willing to pay $100/hour for the right candidate and can pay a $500 bonus if you can help me get things running smoothly through the peak from 7:00 to 10:00 Central Time today.

Please email me at yeldarb [at] barbdwyer (dot) com detailing your experience in MySQL scaling.


Here is a brief description of the problem:

We are now serving about 12,000,000 pages per day to over 80,000 users. Our peak was earlier this year, when we did 22,000,000 pageviews across about 200,000 unique users. The major problems only started recently, when our load was lower than it was during that earlier peak.

There are 4 of us in the company, but I’m the only one with any programming knowledge (the other 3 handle user support, finances, etc.).

Our setup is as follows:

1x MySQL Server
16 GiB RAM, 4 CPU cores guaranteed, Burstable to 8 in peak times

4x Application Tier Servers behind a load balancer
2 GiB RAM, 1/2 core guaranteed, Burstable to 8 in peak times

1x Cron Server
1 GiB RAM, 1/4 core guaranteed, Burstable to 8 in peak times

1x Memcached Server
1 GiB RAM, 1/4 core guaranteed, Burstable to 8 in peak times

Each of the non-MySQL servers also runs a memcached instance, and a lot of the queries are cached to reduce database load.
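
To give a sense of how the caching works, here is a minimal sketch of the cache-aside pattern in Python (assuming pymemcache; the hostname, key format, table name, and TTL are illustrative, not our actual code, which follows the same shape):

    import json
    from pymemcache.client.base import Client

    cache = Client(("memcached-host", 11211))  # hypothetical hostname
    CACHE_TTL = 60  # illustrative TTL in seconds

    def get_basket(user_id, db):
        """Cache-aside: try memcached first, fall back to MySQL on a miss."""
        key = "basket:%d" % user_id
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)  # cache hit: no database round trip
        with db.cursor() as cur:
            # Primary-key lookup; normally ~0.01s, but it stalls during the incidents.
            cur.execute("SELECT * FROM baskets WHERE user_id = %s", (user_id,))
            row = cur.fetchone()
        cache.set(key, json.dumps(row, default=str), expire=CACHE_TTL)
        return row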

The problem we are having is very intermittent. “Basket” pages (basically user profiles within the application) usually load in 0.01 to 0.02 seconds when things are going well, but at seemingly arbitrary intervals they slow way down and take longer than a second (in some cases longer than 2 seconds) to generate. Unfortunately, these intermittent slowdowns happen several times per hour.

During this time requests back up on the Apache servers until the queue gets long enough that pages stop loading in a reasonable time and Facebook kills the traffic. About 2 minutes later Facebook lets traffic back through, the servers run fine for a bit, and the cycle repeats.

You can see that MySQL is probably the culprit by running SHOW PROCESSLIST: seemingly simple queries (e.g. selecting a row from a table by its primary key) take an obscenely long time compared to when things are going well.
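
If it helps, this is the kind of thing I have been checking by hand; a small watcher along these lines (a sketch in Python, assuming pymysql; the host and credentials are placeholders) could log the slow queries automatically during the peak:

    import time
    import pymysql

    conn = pymysql.connect(host="db-host", user="watcher", password="secret",
                           cursorclass=pymysql.cursors.DictCursor)

    while True:
        with conn.cursor() as cur:
            cur.execute("SHOW FULL PROCESSLIST")
            for p in cur.fetchall():
                # Log any query that has been running for a second or more.
                if p["Command"] == "Query" and p["Time"] >= 1 and p["Info"]:
                    print(p["Time"], p["State"], p["Info"])
        time.sleep(5)  # poll every 5 seconds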

I’m not very good with the MySQL server-settings side of things, but I have tweaked everything else I can think of: indexes, storage engines, queries, etc. The server settings were tweaked a few times by a guy at the hosting company, but honestly I don’t know how much that helped.
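
For reference, these are the kinds of settings I would expect a candidate to review first (a sketch, same pymysql assumptions as above; the variable names are standard MySQL ones, not values I have verified on our box):

    import pymysql

    conn = pymysql.connect(host="db-host", user="watcher", password="secret",
                           cursorclass=pymysql.cursors.DictCursor)

    SETTINGS = [
        "innodb_buffer_pool_size",  # should be big enough to hold the hot working set
        "key_buffer_size",          # matters if any tables are still MyISAM
        "max_connections",
        "table_open_cache",
        "tmp_table_size",
        "query_cache_size",
    ]

    with conn.cursor() as cur:
        for name in SETTINGS:
            cur.execute("SHOW VARIABLES LIKE %s", (name,))
            row = cur.fetchone()
            if row:
                print(row["Variable_name"], "=", row["Value"])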


Thanks for your time.