40/sec to 500/sec
Introduction

Surprised by the title? Well, this is a tour of how we cracked the scalability jinx, going from handling a meagre 40 records per second to 500 records per second. Beware: most of the problems we faced were straightforward, so experienced people might find this superfluous.

* 1.0 Where were we?
  * 1.1 Memory hits the sky
* 2.0 Road to Nirvana
  * 2.1 Controlling memory!
* 3.0 Bottom line

Where were we?

Initially we had a system which could scale only up to 40 records/sec. I can even recollect the discussion about "what should be the ideal rate of records?". Finally we decided that 40/sec was the ideal rate for a single firewall. So when we had to go out, we at least needed to support 3 firewalls, hence we decided that 120/sec would be the ideal rate. Based on the data from our competitor(s) we came to the conclusion that they could support around 240/sec. We thought that was OK, as it was our first release, and because all the competitors talked about the number of firewalls they supported, not about the rate.

Memory hits the sky

Our memory usage was always hitting the sky, even at 512MB (OutOfMemory exceptions). We blamed cewolf's in-memory caching of the generated images, but we could not escape for long. Whether a client was connected or not, we used to hit the sky within a couple of days, 3-4 days flat at most. Interestingly, this was reproducible when we sent data at (what were then) very high rates of around 50/sec. You guessed it right: an unlimited buffer which grows until it hits the roof.

Low processing rate

We were processing records at the rate of 40/sec. We were using bulk updates of dataobject(s), but it did not give the expected speed. Because of this we started to hoard data in memory, which in turn resulted in hoarding memory!

Data Loss :-(

At very high speeds we used to miss many packets. At first we seemed to have little data loss, but avoiding it came at the cost of a memory hog. After some tweaking to limit the buffer size, we started having a steady data loss of about 20% at very high rates.

Mysql pulls us down

We were facing a tough time when we imported a log file of about 140MB. Mysql started to hog, the machine started crawling and sometimes it even stopped responding. Above all, we started getting deadlock(s) and transaction timeout(s), which eventually reduced the responsiveness of the system.

Slow Web Client

Here again we blamed the number of graphs we showed on a page as the bottleneck, ignoring the fact that there were many other factors pulling the system down. The pages used to take 30 seconds to load, for a page with 6-8 graphs and tables, after 4 days at the Internet Data Center.

Road To Nirvana

Controlling Memory!

We tried to put a limit of 10,000 on the buffer size, but it did not last for long. The major flaw in the design was that we assumed a buffer of around 10,000 would suffice, i.e. that the records would be processed before the buffer of 10,000 was ever reached. In line with the principle "If something can go wrong, it will go wrong!", it went wrong. We started losing data. Subsequently we decided to go with flat file based caching, wherein the data was dumped into a flat file and then loaded into the database using "load data infile". This was many times faster than a bulk insert via the database driver. You might also want to check out some possible optimizations with load data infile. This fixed our problem of the ever-growing buffer of raw records.
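To make the flat-file approach concrete, here is a minimal sketch in Java, assuming a plain JDBC connection to mysql and a tab-separated dump file; the FlatFileLoader class, the raw_records table and the column layout are made up for illustration and are not our actual code:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.Statement;
import java.util.List;

public class FlatFileLoader {

    // Dump buffered records to a tab-separated flat file instead of
    // holding them in an ever-growing in-memory buffer.
    public static void dumpToFlatFile(List<String[]> records, String path) throws Exception {
        BufferedWriter out = new BufferedWriter(new FileWriter(path));
        try {
            for (String[] record : records) {
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < record.length; i++) {
                    if (i > 0) line.append('\t');
                    line.append(record[i]);
                }
                out.write(line.toString());
                out.newLine();
            }
        } finally {
            out.close();
        }
    }

    // Bulk-load the flat file into mysql in one shot. This sketch assumes
    // the application and mysql run on the same machine, so the server can
    // read the dump file directly from the local disk.
    public static void loadIntoDatabase(Connection conn, String path) throws Exception {
        Statement stmt = conn.createStatement();
        try {
            stmt.execute("LOAD DATA INFILE '" + path + "' INTO TABLE raw_records"
                       + " FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'");
        } finally {
            stmt.close();
        }
    }
}
```

The point is that the buffered records are written out and loaded in one bulk operation, instead of one insert per record through the driver.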
The second problem we faced was cewolf's in-memory caching mechanism. By default it uses "TransientSessionStorage", which caches the image objects in memory, and there seemed to be some problem cleaning up those objects even after the references were lost. So we wrote a small "FileStorage" implementation which stores the image objects in a local file, from where they are served as and when a request comes in. Moreover, we also implemented a cleanup mechanism to remove stale images (images older than 10 minutes).

Another interesting aspect we found here was that the garbage collector had the lowest priority, so the objects created for each record were hardly ever cleaned up. Here is a little math to explain the magnitude of the problem. Whenever we received a log record we created ~20 objects (hashmaps, tokenized strings etc.), so at the rate of 500/sec that was 10,000 objects every second (20*500). Due to the heavy processing, the garbage collector never had a chance to clean up these objects. So all we had to do was a minor tweak: we assigned "null" to the object references as soon as we were done with them. Voila! The garbage collector was never tortured, I guess ;-)

Streamlining processing rate

The processing rate was at a meagre 40/sec, which meant that we could hardly withstand even a small outburst of log records! The memory control gave us some solace, but the actual problem was with the application of the alert filters over the records. We had around 20 properties for each record, and we used to search across all of them; we even did that matching when no alert criteria were configured at all. We changed the implementation to match only those properties for which we had criteria. Moreover, we also had a memory leak in the alert filter processing: we maintained a queue which grew forever. So we had to maintain a flat file object dump to avoid re-parsing records to form objects.
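As a rough illustration of that filter change, here is a sketch assuming each record is a map of ~20 string properties and that alert criteria are configured per property; the AlertFilter class and its method names are hypothetical, not our actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class AlertFilter {

    // Only the properties that actually have alert criteria configured,
    // e.g. "protocol" -> "tcp" or "severity" -> "high".
    private final Map<String, String> criteria = new HashMap<String, String>();

    public void addCriterion(String property, String expectedValue) {
        criteria.put(property, expectedValue);
    }

    // Old behaviour: scan all ~20 properties of every record.
    // New behaviour: if nothing is configured, skip matching entirely;
    // otherwise look only at the configured properties.
    public boolean matches(Map<String, String> record) {
        if (criteria.isEmpty()) {
            return false; // no alert criteria configured, nothing to do
        }
        for (Map.Entry<String, String> entry : criteria.entrySet()) {
            String value = record.get(entry.getKey());
            if (value == null || !value.equals(entry.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```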
What data loss, uh-uh?

Once we fixed the memory issues in receiving data, i.e. by dumping into a flat file, we never lost data! In addition to that, we had to remove a couple of unwanted indexes on the raw table to avoid the overhead while dumping data. We had indexes on columns which could only ever hold 3 possible values, which actually made the inserts slower and was not useful.

Tuning SQL Queries

Your queries are your keys to performance. Once you start nailing the issues, you will see that you might even have to de-normalize the tables. We did it! Here are some of the key learnings:

* Use "EXPLAIN" to identify how mysql executes a query. This will give you insight into why the query is slow, i.e. whether it is using the correct indexes, whether it is doing a table-level scan etc.
* Never delete rows when you deal with huge data, in the order of 50,000 records in a single table. Always try to do a "drop table" as much as possible. If that is not possible, redesign your schema; that is your only way out!
* Avoid unwanted join(s); don't be afraid to de-normalize (i.e. duplicate the column values). Avoid join(s) as much as possible, as they tend to pull your query down. One hidden advantage is that avoiding them imposes simplicity on your queries.
* If you are dealing with bulk data, always use "load data infile". There are two options here, local and remote: use local if mysql and the application are on the same machine, otherwise use remote.
* Try to split your complex queries into two or three simpler queries. The advantage of this approach is that the mysql resources are not hogged up for the entire process. Tend to use temporary tables instead of a single query which spans across 5-6 tables.
* When you deal with a huge amount of data, i.e. you want to process say 50,000 records or more in a single query, try using LIMIT to batch-process the records. This will help you scale the system to new heights.
* Always use smaller transaction(s) instead of large ones spanning across "n" tables. Large transactions lock up the mysql resources, which might cause slowness of the system even for simple queries.
* Use join(s) on columns with indexes or foreign keys.
* Ensure that the queries from the user interface have criteria or a limit.
* Also ensure that the criteria columns are indexed.
* Do not put numeric values in sql criteria within quotes, because mysql does a type cast.
* Use temporary tables as much as possible, and drop them when done.
* An INSERT ... SELECT (or a DELETE driven by a SELECT) locks both tables involved... be aware...
* Take care that you do not pound the mysql database with too frequent updates. We had a typical case: we used to dump to the database after every 300 records. When we started testing for 500/sec we saw that mysql was literally dragging us down, and that is when we realized that at 500/sec a "load data infile" request was hitting the mysql database roughly every second. So we changed to dumping the records every 3 minutes rather than every 300 records (see the sketch below).
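Here is a rough sketch of that switch from count-based to time-based dumping, using a scheduled task that flushes the buffer every 3 minutes; the RecordDumper class is hypothetical, and the commented-out calls refer to the FlatFileLoader sketch shown earlier:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RecordDumper {

    private final List<String[]> buffer = new ArrayList<String[]>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Records arrive here at up to 500/sec; they are only buffered,
    // never written to the database one by one.
    public synchronized void add(String[] record) {
        buffer.add(record);
    }

    // Instead of issuing a "load data infile" after every 300 records
    // (roughly every second at 500/sec), flush on a 3-minute timer.
    public void start() {
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                flush();
            }
        }, 3, 3, TimeUnit.MINUTES);
    }

    private synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<String[]> snapshot = new ArrayList<String[]>(buffer);
        buffer.clear();
        // Placeholder: in the real flow the snapshot would be handed to the
        // flat-file dump and "load data infile" path, e.g. something like
        //   FlatFileLoader.dumpToFlatFile(snapshot, "/tmp/raw.dump");
        //   FlatFileLoader.loadIntoDatabase(conn, "/tmp/raw.dump");
        System.out.println("flushing " + snapshot.size() + " buffered records");
    }
}
```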
Tuning database schema

When you deal with huge amounts of data, always ensure that you partition your data. That is your road to scalability. A single table with, say, 10 lakh (one million) rows can never scale when you intend to execute queries for reports. Always have two levels of tables: raw tables for the actual data and another set of report tables (the tables which the user interfaces query on). Always ensure that the data in your report tables never grows beyond a limit. In case you are planning to use Oracle, you can try out partitioning based on criteria, but unfortunately mysql does not support that, so we had to do it ourselves: maintain a meta table which holds the header information, i.e. which table to look in for a given set of criteria, normally time.

* We had to walk through our database schema; we added some indexes, deleted some, and even duplicated column(s) to remove costly join(s).
* Going forward we realized that having the raw tables as InnoDB was actually an overhead to the system, so we changed them to MyISAM.
* We also went to the extent of reducing the number of rows in static tables involved in joins.
* NULL in database tables seems to cause some performance hit, so avoid NULLs.
* Don't have indexes on columns which allow only 2-3 values.
* Cross-check the need for each index in your table; they are costly. If the tables are InnoDB then double-check their need, because InnoDB tables seem to take around 10-15 times the size of the MyISAM tables.
* Use MyISAM whenever the workload is dominated by either selects or inserts (one of the two). If both inserts and selects are going to be heavy, it is better to use InnoDB.

Mysql helps us forge ahead!

Tune your mysql server ONLY after you fine-tune your queries/schemas and your code. Only then will you see a perceivable improvement in performance. Here are some of the parameters that come in handy:

* Set a buffer size which will enable your queries to execute faster: --innodb_buffer_pool_size=64M for InnoDB and --key_buffer_size=32M for MyISAM.
* Even simple queries started taking more time than expected. We were actually puzzled! We realized that mysql seems to load the index of any table it starts inserting into. So what typically happened was that any simple query against a table with 5-10 rows took around 1-2 secs. On further analysis we found that just before the simple query, a "load data infile" had happened. This disappeared when we changed the raw tables to MyISAM, because the buffer sizes for InnoDB and MyISAM are two different configurations.

For more configurable parameters, see the references in the See Also section below.

Tip: start mysql with the option --log-error to enable error logging.

Faster... faster Web Client

The user interface is the key to any product, and the perceived speed of the page is especially important! Here is a list of solutions and learnings that might come in handy:

* If your data is not going to change for, say, 3-5 minutes, it is better to cache your client side pages.
* Tend to use iframe(s) for inner graphs etc.; they give a perceived fastness to your pages. Better still, use a javascript based content loading mechanism. This is something you might want to do when you have, say, 3+ graphs on the same page.
* Internet Explorer displays the whole page only when all the contents are received from the server, so it is advisable to use iframes or javascript for content loading.
* Never use multiple/duplicate entries of the CSS file in the html page. Internet Explorer tends to load each CSS file as a separate entry and applies it to the complete page!

Bottom line

Your queries and schema make the system slower! Fix them first and then blame the database!

See Also

* High Performance Mysql
* Query Performance
* Explain Query
* Optimizing Queries
* InnoDB Tuning
* Tuning Mysql

This page was last modified 18:00, 31 August 2005. -Ramesh-