Backups: Network planner’s gotcha
It’s the era of HD video, streaming audio, and all that megapixel hype. You can’t fight it. In fact, chances are good that you’re part of the problem, though perhaps unintentionally. Whom does the network planner have to thank for this circumstance? Thank PowerPoint deck designers, Photoshop-savvy photographers, data warehouse warriors, the virtual machine vanguard, Autotuning artists, Web 2.0 marketing mavens, and fervent film fans. The appetite of these specialists, not to mention the steady move toward digital TV, ensures ongoing upgrades of network switches for years. But switch upgrades aren’t the only network planning chore to be considered. Unglamorous, uninteresting, often-overlooked backup tasks could place significant demands on network resources, or even place an enterprise at risk.
For those who perform capacity planning, gauging nominal network usage means measuring network speeds, traffic types, and transfer volumes under typical usage profiles. Such profiles take normal usage to mean the service levels that applications must deliver when users need them, such as during prime shifts, with at least some attention also paid to peak demand capacity.
The backup problem and its complement, data and service restoration, have been with computing from the beginning. When machines were less reliable, it was an ever-present concern; work was performed in small chunks with frequent restarts. Today greater reliability is taken for granted, but the stakes are higher: systems are larger and increasingly interdependent, process more voluminous information, and touch networks in complex ways that challenge the simple backup schemes trusted by many smaller organizations. Not only that, but we haven’t made it any easier for users. In fact, we’ve removed features that were once in place to allow users to manage at least some of their backup and restore needs.
Those old enough to remember DEC’s VMS, now called OpenVMS, experienced file-level versioning. It was very convenient for users, who had control over when to purge versions and when to bring them back into service. It had a user-specifiable granularity that backup schemes usually neglect to implement. Neither Linux nor Windows supports it in 2010, so a spate of add-on hardware and software solutions has been bolted onto the thirty-five-year-old backup solutions most SMBs employ today.
Several “fixes” have been proposed as backup techniques evolved. For well-financed enterprises prepared to absorb the overhead required for disaster recovery (DR) and backup, perhaps newer technology has proven satisfactory, though perhaps not convenient. For home and desktop users, the current silver bullet seems to be the ubiquitous USB drive. For modestly more complex topologies, network-enabled disk-to-disk schemes seem to have carried the day.
None of these is elegant, or friendly to users and system administrators.
Volume Shadow Copy Service (VSS)
The Windows VSS service operates at the block level. Because of this, VSS allows a read-only snapshot copy to be created, which avoids file-locking problems. That is essential to creating a backup that is at least consistent with itself and that permits users to keep working while backups are running. (Restores are another matter.) Better backup software makes use of VSS, though it is worth watching the Windows event log for possible side effects.
Bright Idea Department: Windows Home Server
While no reader of this column may be willing to admit it, Windows Home Server offers a flexible backup solution at a very low price point. Though it won’t work for most enterprise networks, it illustrates what a clever design can make possible. In fact, Home Server provides several hints for how a backup scheme should operate. It can grow flexibly, can back up onto commodity drives of different sizes, and can deal with bare metal restores. (I have some reason to believe that it may not work as well in mixed virtual machine and non-VM environments.)
Windows provides a means for recording the current state of the operating system. Periodic system state saves should be part of regular backups, though coordinating these with other bare metal restore tools can prove nontrivial. Also, snapshots of system state can sometimes slow down or even temporarily freeze up some tasks while the snapshots are being taken.
Apple’s Time Machine offers smaller environments a convenient and user-friendly way to access previous or deleted versions of files. Time Machine was a vast improvement over previous Apple offerings, though its use in larger enterprise settings is difficult to assess.
If the budget is available to support one, consider using a SAN, such as Compellent or Dell EqualLogic. Then create a separate network subsystem to carry the backup traffic. This way, the main network resources used to support nominal traffic remain unaffected by backup and restore operations. Such processing can become somewhat involved. For example, consider IBM Tivoli Storage Manager recommendations for one such method. Large enterprises can also consider solutions such as Online Data Vault, InMage, and others.
Backup and restore workflows can be easily tripped up by seemingly minor obstacles, such as VM images, dumps, logs, database files, and Windows Update work areas. Take just one of these: dumps. A dump of system memory on a machine with 8GB of RAM is likely to be bigger than a dump on a Windows 2000 workstation with 512MB.
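To put rough numbers on the dump example, here is a minimal back-of-the-envelope sketch in Python. It assumes a full memory dump is about the size of installed RAM, and the 100 Mbit/s link and the 70% usable-throughput figure are illustrative assumptions of mine, not measurements. The function name `transfer_seconds` is likewise made up for this sketch.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def transfer_seconds(size_bytes, link_mbps, efficiency=0.7):
    """Estimate the seconds needed to move size_bytes across a network link.

    link_mbps is the nominal link rate in megabits per second.
    efficiency is the assumed fraction of that rate actually available
    to the backup job (protocol overhead, competing traffic, and so on).
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return size_bytes * 8 / usable_bits_per_second


if __name__ == "__main__":
    # Assumption: a full dump is roughly as large as the machine's RAM.
    for label, size in (("512MB dump", 512 * MIB), ("8GB dump", 8 * GIB)):
        minutes = transfer_seconds(size, link_mbps=100) / 60
        print(f"{label}: roughly {minutes:.1f} minutes at 100 Mbit/s")
```

The point is less the exact figures than the scaling: sixteen times the RAM means sixteen times the time on the wire, and that is for a single file on a single machine.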
Cisco is betting the farm on a continued explosion of Internet speeds and volume. The company’s latest forecast is for 767 exabytes by 2014, driven, Cisco believes, by video demand. Add VoIP and collaboration technologies to the mix and a network planner may be left rummaging through his toolbox for a better way.
One method worth mentioning, though it requires more cost and effort than some may find worthwhile, is simulation. Firms such as Opnet or Scalable offer ways to describe a network topology and the systems and services that need to be supported, and then to simulate network performance under various scenarios. See this Opnet-based student exercise to get a flavor of such an undertaking. Figure A is a screenshot from that exercise.
1. Files deleted “a while ago”: Apple’s Time Machine saves hourly backups for 24 hours, daily backups for a month, and weekly backups for everything older than a month.
2. RAID restore: The bigger the physical drive, the longer it will take to rebuild the RAID from backup.
3. Restore events spanning work shifts: Backups can usually operate unmonitored, but restores may not have the luxury. Those knowledgeable about applications being restored may be needed to help put restored data back into service, and in global or 24/7 operations, they may not be on a convenient work shift.
4. Oblivious applications: There’s a lot of talk about smarter apps, but many are still oblivious to backup and restore processing. Some require taking down entire user communities, and others try to pass the buck to the database admin.
5. How “big” is a “big file”? Outlook’s quaint classification of file sizes is a legacy due in part to the PST file system it sometimes uses, but when it refers to files “> 5MB” as “Enormous” it hints at how applications lag behind users’ need to process larger files. The problem of “big files” can cascade rapidly through an organization. User disks become full. Network traffic increases as files are copied across network shares to local machines, sent via email, or combined with other files for aggregation.
6. Cloud backups (“Where’s the phone # for my ISP?”): As users of some consumer online backup services such as Carbonite have learned, one needs serious upload speeds to make offsite backup and restore feasible. Prepare to pay more to push backup packets to the cloud.
7. Virtual machine backups: Virtual machine backup, restore, and propagation have created a new class of requirements.
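The tiered retention schedule mentioned in item 1 is easy to reason about once it is written down. The following is a minimal Python sketch of that kind of thinning policy, not Apple’s actual implementation: the function name `backups_to_keep` is hypothetical, “a month” is approximated as 30 days, and keeping the newest backup in each day or ISO week is my own assumption.

```python
from datetime import datetime, timedelta


def backups_to_keep(timestamps, now):
    """Return the backup timestamps a tiered retention policy would keep.

    Tiers (assumed for illustration):
      - up to 24 hours old: keep every backup
      - up to 30 days old:  keep the newest backup of each calendar day
      - older than that:    keep the newest backup of each ISO week
    """
    keep = []
    seen_days = set()
    seen_weeks = set()
    for ts in sorted(timestamps, reverse=True):  # newest first
        age = now - ts
        if age <= timedelta(hours=24):
            keep.append(ts)
        elif age <= timedelta(days=30):
            if ts.date() not in seen_days:
                seen_days.add(ts.date())
                keep.append(ts)
        else:
            week = tuple(ts.isocalendar()[:2])  # (ISO year, ISO week)
            if week not in seen_weeks:
                seen_weeks.add(week)
                keep.append(ts)
    return sorted(keep)


if __name__ == "__main__":
    now = datetime(2010, 6, 30, 12, 0)
    # Sixty days of hourly backups, purely synthetic data.
    hourly = [now - timedelta(hours=h) for h in range(24 * 60)]
    kept = backups_to_keep(hourly, now)
    print(f"{len(hourly)} backups taken, {len(kept)} retained")
```

A scheme like this keeps storage growth modest, but it also explains the gotcha: a file that existed for only a few hours several weeks ago may simply not be in any retained backup.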
You’re British Petroleum. Everyone in the world wants to tap your undersea camera’s video stream. Bet you didn’t plan for that.