
Saturday, November 19, 2011

DAOS Backup and Restore


This article describes how to refine your backup and restore procedures when using the popular Domino Attachment and Object Service (DAOS) feature, also known as "attachment consolidation."

Backup considerations for DAOS

In a standard Notes® database (NSF), the attachments are stored inside of the NSF file itself, and the database is self-contained. In order to back up a standard Notes database, only the NSF file itself needs to be backed up. After you introduce DAOS, the NSFs that participate in DAOS contain only references to the NLO files where the attachment content is stored. As a result, backing up the NSF alone is no longer enough. The NLO data needs to be backed up as well.

When performing standard NSF backups, there are two main approaches. The individual NSF being backed up can be taken offline (or the entire server can be stopped) so that no changes occur to the NSF over the duration of the backup operation. The other method allows the NSF and server to remain active, but requires using a backup utility program that interacts with the Domino® backup/restore API. Using a utility ensures that a consistent copy of the NSF contents is recorded, despite any changes that occur to the NSF over the duration of the backup operation. If you are using a backup utility that does not use the backup/restore API, you must stop the Domino server and all Domino-related applications for the duration of the backup.

None of the processing for NSF backups needs to change for DAOS. The change needed to accommodate DAOS is simply a procedural addition: in addition to backing up the NSF data, you must also back up the NLO data.

Backing up the NLO files in the DAOS repository can be done either while the Domino server is down, or when it is up and running. The backup does not require the use of any Domino API-based utilities. Once NLO files are written initially, Domino never modifies their contents, so the backup mechanism does not have to work around file-write activity. NLO files can be backed up like any other generic file on the file system. Only the NLO files that are complete and not in the process of being written to or renamed need to be backed up. Any that are busy can be skipped until the next backup. Most backup applications will automatically skip files that they cannot read because of other activity.

Order matters

If you shut down the Domino server during the backup process, the NSF and NLO files can be backed up in any order. If you must keep the Domino server up and running during the backup process, it is important to back up all the NSF data before backing up the NLO files. The reason has to do with the addition of references to new NLO files in an active system, described in this section.

When you back up an NSF that participates in DAOS, there are some number of NLO references contained in that NSF at the time of the backup. Since there is some duration to the backup operation for all NSFs, the number of references to NLO files may be increasing over that duration in a system that is operating during the backup process. If there were (for example) 10,000 NLO files referenced collectively by all the NSFs at the beginning of the NSF backup process, there could be 10,100 by the time the last NSF is backed up.

Likewise, the backup of the NLO data has a duration as well, so while there might have been 10,100 NLO files at the beginning of the NLO backup process, there could be 10,200 by the time the last NLO is backed up.

In this scenario, the backed up version of the NSFs could reference at most only 10,100 NLO files. Because the NLO backup was done after the NSF backup process, the NLO backup included at least that many, but may have as many as 10,200 NLO files. Worst case, there are more NLO files backed up than strictly necessary to satisfy the NSF references. Since all accesses to the NLO files are done through the NSFs, and the NSFs were done first, all of the referenced NLO files are guaranteed to exist in the set of NLO files that were backed up. If there is an error accessing an NLO file in order to back it up because it's in use, that can safely be ignored. If the file is being written, the activity must have occurred after the NSF was backed up; therefore, this NLO file does not need to be in the corresponding set of NLO files, and will be backed up as part of the next cycle.
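The ordering rule above can be sketched in a few lines of Python. This is only an illustration, not a replacement for a real backup product: the function names, the `*.nlo` file pattern, and the `backup_nsfs` callback (standing in for whatever backup/restore API utility handles the NSFs) are all assumptions made for the example.

```python
import shutil
from pathlib import Path


def copy_nlo_files(repo, dest):
    """Copy every *.nlo file under repo into dest, keeping subdirectories.

    Returns (copied, skipped). A file that cannot be read is skipped rather
    than treated as a failure; the next backup cycle will pick it up.
    """
    copied, skipped = [], []
    for src in sorted(Path(repo).rglob("*.nlo")):
        target = Path(dest) / src.relative_to(repo)
        target.parent.mkdir(parents=True, exist_ok=True)
        try:
            shutil.copy2(src, target)
            copied.append(src)
        except OSError:
            # Busy or vanished file: it was written after the NSF pass,
            # so no backed-up NSF can reference it yet.
            skipped.append(src)
    return copied, skipped


def backup_in_order(backup_nsfs, repo, dest):
    """Run the NSF backup first, then copy the NLO files."""
    backup_nsfs()  # hypothetical callback wrapping the API-based NSF backup
    return copy_nlo_files(repo, dest)
```

Because the NLO pass only starts after `backup_nsfs()` returns, every NLO referenced by a backed-up NSF already exists on disk when the copy begins, which is the guarantee described above.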

The deferred deletion interval should be set to a period longer than your chosen backup cycle. In this way, the NLOs are not pruned (physically deleted) prior to the next backup. Instead, the actual deletion is deferred until they've aged accordingly.

If you set a shorter deletion interval, or disable the feature by setting it to zero in the DAOS tab of the server document, you open a window of time during which a deleted attachment is non-recoverable, because the NLO file has been physically deleted before the backup has occurred.
Avoid pruning NLO files from the repository (by issuing a prune command at the Domino console) before they have had a chance to be backed up; you will prevent them from being recoverable. When an attachment is deleted, and the associated NLO file's reference count goes to zero, it becomes a candidate for deletion. The deferred deletion interval determines when the deletion actually occurs. If the deferred deletion interval is set (as recommended) to be longer than the backup cycle, all NLOs will be in existence for at least one backup cycle, and therefore any NLO can be recovered later.
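The rule of thumb for the deferred deletion interval can be written as a simple check. The function below is a hypothetical helper, not a Domino setting or API; both values are assumed to be expressed in days.

```python
def deletion_interval_is_safe(deferred_deletion_days, backup_cycle_days):
    """True if every NLO survives long enough to be seen by one backup.

    An interval of zero means deferred deletion is disabled, so an
    unreferenced NLO can disappear before any backup captures it.
    """
    if deferred_deletion_days <= 0:
        return False
    return deferred_deletion_days > backup_cycle_days


# A 30-day interval with a weekly backup cycle is safe;
# a 1-day interval with the same cycle is not.
print(deletion_interval_is_safe(30, 7))
print(deletion_interval_is_safe(1, 7))
```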

After the initial full backup of the NLO files in the DAOS repository, you can perform incremental backups, which save only the data that has changed since the last backup. NLO files are ideal candidates for incremental backup because there are no changes to them after their initial creation.

One NLO file is created for each unique attachment, so it is possible to have a very large number of NLO files in large deployments. The maximum number of files per numbered DAOS subdirectory is 40,000, and there can be 1,000 subdirectories, for a maximum total of 40 million NLO files. Check your backup utility's specifications to see if there is a limit on the total number of files it will manage, and monitor the growth of the DAOS directory file population accordingly.
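One way to monitor the file population is a small script that counts the NLO files in each numbered subdirectory. The sketch below assumes the files carry an `.nlo` extension and sit directly inside the subdirectories; the function names are made up for the example.

```python
from pathlib import Path

MAX_FILES_PER_SUBDIR = 40_000  # limit quoted above
MAX_SUBDIRS = 1_000            # limit quoted above


def nlo_counts(repo):
    """Return {subdirectory name: number of .nlo files} for a DAOS repository."""
    counts = {}
    for sub in sorted(p for p in Path(repo).iterdir() if p.is_dir()):
        counts[sub.name] = sum(
            1 for f in sub.iterdir()
            if f.is_file() and f.suffix.lower() == ".nlo"
        )
    return counts


def total_nlo_files(repo):
    """Total number of NLO files across all subdirectories."""
    return sum(nlo_counts(repo).values())


# Theoretical ceiling: 40,000 files x 1,000 subdirectories = 40 million.
print(MAX_FILES_PER_SUBDIR * MAX_SUBDIRS)
```

Comparing `total_nlo_files()` over time against any file-count limit documented for your backup utility gives early warning before that limit is reached.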

DAOS index files

The daos.cfg and daoscat.nsf files should not be backed up. (Note that this is a change from earlier recommendations.) These two index files can be re-created from the DAOS repository and the NSFs participating in DAOS if necessary. If these files become corrupted, they can be safely deleted while Domino is not running. They will be re-created automatically on startup.

The daos.cfg file helps manage the files in the DAOS repository. The NLO files are stored in subdirectories (0001, 0002, and so forth) underneath the base DAOS directory. For several reasons (including performance), DAOS limits the number of NLO files in each subdirectory. The daos.cfg file keeps track of how many files are currently in each subdirectory so that DAOS puts new files in subdirectories where the count of files is below that limit. As NLOs are deleted, the corresponding file count is decremented, allowing backfilling of older subdirectories. The daos.cfg file is expendable, and will be re-created at Domino startup time if it is missing.
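The bookkeeping described above can be illustrated with a toy model. This is not the DAOS implementation and says nothing about the real format of daos.cfg; the class, its method names, and the "lowest-numbered subdirectory with room" policy are assumptions chosen only to show how decrementing a count allows backfilling.

```python
MAX_FILES_PER_SUBDIR = 40_000  # per-subdirectory limit quoted in this article


class SubdirCounts:
    """Hypothetical stand-in for the per-subdirectory counts kept in daos.cfg."""

    def __init__(self, limit=MAX_FILES_PER_SUBDIR):
        self.limit = limit
        self.counts = {}  # e.g. {"0001": 40000, "0002": 17}

    def place_new_file(self):
        """Pick a subdirectory that is below the limit and bump its count."""
        for name in sorted(self.counts):
            if self.counts[name] < self.limit:
                self.counts[name] += 1
                return name
        # Every existing subdirectory is full: open the next numbered one.
        name = f"{len(self.counts) + 1:04d}"
        self.counts[name] = 1
        return name

    def file_deleted(self, name):
        """Decrement the count so the subdirectory can be backfilled later."""
        self.counts[name] -= 1
```

With a limit of 2, three placements land in 0001, 0001, and 0002; after one deletion from 0001, the next placement goes back to 0001.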

The daoscat.nsf file contains two indexes. One is a list of all NSFs that are holding NLO references (the DAOS ID Table, or DIT). The second is a list of all NLOs that exist, and the DAOS repository subdirectory they exist in (the DAOS Object Index, or DOI). There are no externally visible parts to this NSF, and there are no privileges that apply to change that. The DIT is modified when an NSF acquires its first NLO reference. The DOI is modified when a new (unique) NLO is created. The daoscat.nsf file is expendable, and will be re-created at Domino startup time if it is missing. Since a full resync can take a significant amount of time, only empty indexes are created by this process at startup. A resync operation should be done as soon as it is convenient, however.

In some cases it may be necessary to fully restart the server before daoscat.nsf and daos.cfg are re-created.

A DAOS resync operation (“tell daosmgr resync”) fully re-populates these two indexes from scratch. You can also run the command “[n]daosmgr resync” if you want to perform a resync operation with the Domino server shut down.

Transaction logging

Because all NSFs that participate in DAOS have to also participate in transaction logging, the contents of all their attachments will be included in the log. Any NLO files that are created as a result of activity to the NSF will be re-created if the log is replayed.

Command examples

Using the Tivoli Storage Manager (TSM), the command to back up the DAOS repository would be:

dsmc incremental c:\lotus\domino\data\daos

where the path specified is the full path to the DAOS repository.

Since the NLO files are being backed up incrementally, the initial backup will be quite large, but subsequent ones will be much smaller. The total footprint of the DAOS directory will be written out during the first backup.

DAOS enable and disable considerations

Once a Domino server has DAOS enabled, and NSFs are selected to participate in DAOS, their attachments are stored in the DAOS repository. If DAOS is subsequently disabled, the attachments that were in DAOS remain in DAOS until they are re-integrated into the NSF. Any DAOS references that remain in the NSF will continue to be serviced by DAOS, even if it is disabled. An NSF that contains DAOS references is not self-contained, and must continue to be treated as an active DAOS participant as long as it has DAOS references. To re-integrate the DAOS attachments into an NSF and remove the DAOS references, you can process the NSF with the “compact -c -daos off” command. Once that is done, the NSF will be self-contained again, and can be treated as a normal NSF.

Furthermore, to ensure that the DAOS enablement change takes effect completely, the Domino server as well as all processes that use the Domino API (compact, resync, backup, etc.) must be stopped. This allows the API to terminate completely, so the status change can be picked up at the next startup.

Space and time savings

The disk footprint savings from DAOS carry over into backup processing as well. The NLO files represent the static data that used to reside in the NSF, and was backed up every cycle even though it hadn't changed. In a typical mail environment, a large reduction in the NSF footprint plus a very small amount of new NLO data means the reduction in the NSF footprint translates almost directly into a reduction in the backup footprint. Not only is the duplicate data being eliminated, the mail file data is being separated into static and dynamic components. By applying an incremental backup regimen to the static (NLO) data, only the NLO files created since the last backup cycle need to be processed. That typically represents a very small amount of data compared with the entire set of NLO files.

Using DAOS enables backup software to back up the NLO files optimally. This is because once an NLO is written to disk, it never changes. Therefore, the file needs to be backed up only once in its lifetime. Based on the example shown in the DAOS Estimator document [link], the space saving per full backup would be 38.8 GB, roughly equal to the number of shared NLOs times the average NLO size.

In the incremental backup case, duplicate NLOs will not be backed up again. Thus, the space savings from DAOS is directly proportional to the number of duplicate NLOs seen in the environment, and the backup time saved is the space saved divided by the backup throughput.
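As a rough worked example, backup time is the amount of data divided by the throughput of the backup system, so the time saved follows from the space saved. The 38.8 GB figure comes from the estimator example mentioned above; the throughput of 100 GB per hour is purely an assumed value for illustration, as is the function name.

```python
def backup_time_saved_hours(space_saved_gb, throughput_gb_per_hour):
    """Time saved = space saved / throughput (both in consistent units)."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return space_saved_gb / throughput_gb_per_hour


# 38.8 GB saved per full backup, at an assumed 100 GB/hour throughput.
saved = backup_time_saved_hours(38.8, 100.0)
print(f"{saved * 60:.0f} minutes saved per full backup")
```

Substitute the measured throughput of your own backup environment to get a meaningful estimate.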

Restoring DAOS objects

A well-preserved DAOS repository makes for fast and easy restoration. And, while not really a backup mechanism, the default deferred deletion interval allows accidentally deleted attachments to be saved from physical deletion for up to 30 days after a mishap. Simply pull the document out of the trash folder, if soft deletion is enabled. If it's too late for that, restore from backup, then resynchronize the DAOS catalog using the server console command "tell daosmgr resync force"; DAOS will then once again recognize that the NLO has references.

Although an NLO will survive for the period of time specified by the deferred deletion interval, if soft deletion is disabled or a backup of the referencing document has not been made, there is no way to get at the contents of the NLO, especially if encryption is enabled (the default).

Restoring documents or NSF files with DAOS attachments

To restore either a full NSF or a single document, the process starts off the same. You must first restore the database and then the missing NLOs. To do this using Tivoli Data Protection for Domino, you would issue the command:

domdsmc restore -into

To determine which missing NLO files to bring back, run the following from the Domino server console:

tell daosmgr listnlo -o missing.txt MISSING restoreddatabasename.nsf

The resulting missing.txt file is then fed into the restore command. With the Tivoli Storage Manager (TSM), the command would be:

dsmc restore -filelist=missing.txt -inact 

If you are restoring the entire NSF, you are done. Note that any restoration operation will put the DAOS catalog into the Needs Resync state, so a resync operation should be performed as soon as convenient.

If you need only one document, you can now copy and paste it to its intended destination.

For a complete recovery after a catastrophic failure, the NSF and NLO files can be restored, followed by replaying the archived transaction logs. This will result in the most up-to-date recovery situation.
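After the restore has run, it can be useful to confirm that every file named in the list is now present. The sketch below assumes the list holds one file path per line, which is inferred from the way missing.txt is passed to the -filelist option; the function name is hypothetical, and the actual layout of the listnlo output should be checked on your own server.

```python
from pathlib import Path


def still_missing(filelist):
    """Return the entries in a one-path-per-line file that do not exist on disk."""
    missing = []
    for line in Path(filelist).read_text().splitlines():
        entry = line.strip().strip('"')  # tolerate quoted paths and blank lines
        if entry and not Path(entry).exists():
            missing.append(entry)
    return missing
```

An empty result suggests the listed NLO files were all brought back. A resync is still needed afterwards, as noted above.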

Dealing with damaged files and clusters

If an NSF is damaged, and you have clustered servers or replicas of the NSF on another server, you have several options.
  1. Replicate each entire NSF – New replicas can be created from the existing NSFs on the clustermate(s). Each new replica should be marked as DAOS enabled. As the replication occurs, the associated attachments will be saved to DAOS.
  2. Copy NSFs, replicate missing attachments – All of the necessary NSFs are copied from the clustermate(s) to the server being repaired. This will create a copy of the document data without attachment data. Fixup -j -D is then run, deleting all documents that contain DAOS references to NLO files that do not exist. Subsequent replication will re-create those documents along with the associated attachments, which will be stored in DAOS.
  3. Copy NSFs, copy/restore NLOs – All of the necessary NSFs are copied from the clustermate(s) to the server being restored. The command 'tell daosmgr listnlo missing somefile.nsf' is then issued for each individual NSF to generate a list of the NLO files that do not exist in the DAOS repository. Those NLO files are then restored from backup, or copied from the clustermate(s). (Note that copying the NLO files from another Domino server will work only if DAOS encryption is turned off. DAOS encryption is on by default, and uses the server key to do the encryption; therefore encrypted NLO files are not portable between Domino servers.)
  4. Copy full NSFs and re-extract – If you have a replica on another server that is not DAOS-enabled, the NSF can be copied to the server being restored. The attachments will be inline in that copy of the NSF, and 'compact -c -daos on' can be issued to extract the inline attachments out to DAOS.
  5. Reintegrate NSFs and re-extract – If you have a replica on another server that is DAOS-enabled, but encryption prevents using the NLO files directly, you can run a 'compact -c -daos off' on the other server to re-integrate the attachments into the NSF. Once that is done, the NSF can be copied to the server being restored, and you can use 'compact -c -daos on' to extract the attachments to DAOS again.

Options for restoration

The need for restoring NLO files depends partly on the deferred deletion interval. If the restore is happening from a snapshot that's within the interval (for example, the interval is 30 days, and the NSF is being restored from last week's backup) it's not possible for any of the NLOs to have been deleted yet, so there shouldn't be a need to restore any NLOs. If the NSF being restored is older than the interval (for example, the interval is 30 days, but the NSF being restored is from 3 months ago), it's possible that some of the NLOs have been deleted, and would need to be restored.
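The reasoning above reduces to a date comparison. The helper below is a hypothetical sketch: it only says whether NLO deletion was possible since the backup was taken, not whether any NLO is actually missing, which is what the listnlo command reports.

```python
from datetime import date, timedelta


def nlo_restore_may_be_needed(backup_date, today, deferred_deletion_days):
    """True if NLOs referenced by the restored NSF could already be deleted."""
    if deferred_deletion_days <= 0:
        # Deferred deletion disabled: NLOs may vanish at any time.
        return True
    return (today - backup_date) > timedelta(days=deferred_deletion_days)


today = date(2011, 11, 19)
# Last week's backup with a 30-day interval: no NLO can have been deleted yet.
print(nlo_restore_may_be_needed(today - timedelta(days=7), today, 30))
# A backup from about three months ago: some NLOs may be gone.
print(nlo_restore_may_be_needed(today - timedelta(days=90), today, 30))
```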

Some of this also depends on what the reason for the restore is. If it's a catastrophic failure, you should restore the NSF(s) and run 'tell daosmgr listnlo missing filename.nsf' to get a list of all of the NLOs needed to make it whole again. That list should then be fed into the restore utility to restore those NLOs as well.

If it's just a matter of getting a single document with attachments back, you don't really need everything to be made whole to access just that one set of attachments. In the spirit of “Work smarter not harder,” you can restore the NSF, and then attempt to access the desired document (and attachments, if any) and finally deal with any missing NLOs that are mentioned during that operation. If there aren't any attachments on the document, there's no other work to be done. If there are attachments, the NLOs may still be there, so it's worth trying to access them before doing anything else. If any are missing, you'll get a console message that mentions the name of the NLO, at which point you can restore only what you need.

Offline archival

For offline archival purposes, it is recommended that the attachments be re-integrated into the NSF using a 'compact -c -daos off' operation prior to archiving. That eliminates the need to also archive all the individual NLO files referred to by the NSF. For example, if an employee is leaving the company, and their mail account is being closed and archived, this approach would be appropriate.
