Recently Duncan Epping and I were asked via Twitter how long it would take to deploy VMware Site Recovery Manager (SRM) to protect 300 VMs. I found myself struggling to fit an answer into 140 characters.
I’ve deployed SRM for many clients at this point, and each client has their own challenges that make some SRM deployments more difficult than others. I’ve had SRM installed with array-based replication, with test failovers and failbacks completed, by lunchtime on Monday of the deployment week. Other times it’s heads-down until late Friday to get it working, followed by a mad dash to the airport.
There is a pretty reliable pattern to how SRM deployments transpire, though. When I’m with a client, I usually draw a deployment challenges graph before we start building the environment that looks like this:
Notice how getting the Storage Replication Adapters (SRAs) playing nicely isn’t as big a technical peak as certificates. The majority of the time, the problem is a permissions issue on the storage array and not the SRA itself. However, I have seen a few SRAs simply not work for a multitude of reasons, with firmware incompatibility high on the list.
After being through a few deployments now, I think that getting certificates from either the organization’s internal certificate authority (CA) or a third-party CA installed and working correctly with SRM takes the prize for slowing down the deployment. Just getting the requirements right so that the certificates are requested in the proper format can be a challenge too.
Third-Party Certificates and SRM Installation
Whenever the client says they use their own internal CA or a third-party CA, I know going into the engagement that some extra troubleshooting is most likely going to be a part of my week. I think it must be because we all understand the theory behind certificates, but the steps to get them created and installed are somewhat akin to going to the dentist.
Installing third-party certificates in the VMware environment is something most people agree is a good idea, but very few people actually do. How many times have you clicked “Ignore” when you log in with your vSphere Client? It’s probably so often that you do it as if it’s a normal part of the login process. User name, password, Ignore, and you’re logged into vCenter.
Until recently, the documentation for exactly how the certificates should be requested, including the required fields and what goes into the commonName and subjectAltName fields, was really lacking. You had to go to three or four different source documents to cobble together what needed to be in the certificate. Unless you’ve been through a couple of installations with them, it presents more of a challenge than it should.
I did a project with Luke Huckaba @thephuck, and from that he did a great job helping clarify what needs to be in the certificate for the SRM server in this blog post. At a high level, just know that both vCenters and both SRM servers need to have certificates that are issued by the same CA, and the SRM server certificates must include the subjectAltName field. The only time you learn that they don’t match is after you get both SRM servers installed, try to do site pairing, and it fails. So make sure they are configured correctly, or you get to reinstall SRM once you get your certificates straightened out. There is a process for replacing certificates on existing SRM installations, but on a new deployment, let’s face it, reinstallation is easier.
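Because a mismatch only shows up at site pairing, a quick pre-flight check of the certificate fields before installing anything can save a reinstall. The following is a minimal Python sketch of that idea, not a VMware tool. It assumes each certificate has already been decoded into the dictionary shape that Python's ssl.SSLSocket.getpeercert() returns; the role names, host names, and CA names are hypothetical. Comparing issuer names is only an approximation of "issued by the same CA", and the check covers nothing beyond the two rules described above.

```python
from typing import Dict, List


def flatten_name(name) -> Dict[str, str]:
    """Flatten getpeercert()-style nested name tuples into a plain dict."""
    return {key: value for rdn in name for key, value in rdn}


def preflight(certs: Dict[str, dict], san_required: List[str]) -> List[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []

    # Rule 1: every certificate should carry the same issuer name.
    issuers = {
        tuple(sorted(flatten_name(cert.get("issuer", ())).items()))
        for cert in certs.values()
    }
    if len(issuers) > 1:
        problems.append("certificates were not all issued by the same CA")

    # Rule 2: the SRM server certificates need a subjectAltName entry.
    for role in san_required:
        if not certs[role].get("subjectAltName"):
            problems.append(f"{role}: missing subjectAltName")

    return problems


# Hypothetical sample data for illustration only.
CA = ((("organizationName", "Example Org"),), (("commonName", "Example Issuing CA"),))
OTHER_CA = ((("commonName", "Some Other CA"),),)


def make_cert(fqdn: str, issuer, with_san: bool = True) -> dict:
    """Build a decoded-certificate dict for the example."""
    cert = {"subject": ((("commonName", fqdn),),), "issuer": issuer}
    if with_san:
        cert["subjectAltName"] = (("DNS", fqdn),)
    return cert


certs = {
    "vcenter-a": make_cert("vc-a.example.com", CA),
    "vcenter-b": make_cert("vc-b.example.com", CA),
    "srm-a": make_cert("srm-a.example.com", CA),
    "srm-b": make_cert("srm-b.example.com", OTHER_CA, with_san=False),
}

for problem in preflight(certs, san_required=["srm-a", "srm-b"]):
    print(problem)
```

In practice the dictionaries would come from decoding the real certificate files with whatever tooling you already trust; the point is simply to compare all four certificates side by side before the installers run.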
Third-Party Certificates and vSphere Replication
If you enjoy learning shell commands and hacking virtual appliances, then using some third-party CAs or your internal CA with vSphere Replication is for you. I completed an engagement where the client used both array-based replication and vSphere Replication, and they had third-party certificates from an InCommon CA. I know that vSphere Replication is a 1.0 product, so it’s understandable that the documentation is a bit sparse and that the management interfaces of the vSphere Replication Management Server (VRMS) and the vSphere Replication Server (VRS) appliances do not match when it comes to certificate requirements: you upload a .pfx file to the VRMS, while the VRS wants a separate key and certificate.
But VMware, please keep in mind that this whole article is a feature request for SRM and vSphere Replication: make this whole process more straightforward.
After we successfully created and installed our certificates into the vCenters and SRM servers and had the SRM sites paired, we proceeded with the vSphere Replication installation.
Third-Party Certificates and vSphere Replication Management Server (VRMS)
We found a greenfield opportunity for documentation when it came to VRMS third-party certificate installation and troubleshooting. We finally found one KB article that said the certificate required the subjectAltName field, but it made no mention of what goes in the commonName field. The problem with that is the SRM server installation was very specific about what had to be in both the commonName and subjectAltName fields. Since the commonName and subjectAltName fields seem to be interchangeable in some circumstances, we created a certificate with no subjectAltName and used the DNS: fqdn of the appliance in the commonName field, and it worked.
The KB article also didn’t state what format the certificate needed to be generated in. Given that vCenter required .pfx and SRM required .p12, that would have been a reasonable thing to document more clearly. We did find a mention of .pfx next to the upload field in the web administration configuration screen for the vSphere Replication Management appliance. So you either know this information already, or you have to wait until you reach the upload step to find out what format it wants before you can request your certificate. We created a .pfx certificate and uploaded it successfully.
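For what it’s worth, .pfx and .p12 are both common file extensions for a PKCS#12 bundle, so the same key and certificate can usually be packaged for either product. Below is a minimal Python sketch that assembles the standard openssl pkcs12 -export command line. It assumes you already have the certificate, private key, and CA chain as PEM files; all of the file names are hypothetical, and the sketch only builds the argument list so you can review it before running it.

```python
import shlex
from typing import List, Optional


def build_pkcs12_command(cert_pem: str, key_pem: str, out_file: str,
                         chain_pem: Optional[str] = None) -> List[str]:
    """Build the argument list for exporting a PKCS#12 bundle with openssl."""
    cmd = ["openssl", "pkcs12", "-export", "-in", cert_pem, "-inkey", key_pem]
    if chain_pem:
        # Include the issuing CA chain in the bundle when one is supplied.
        cmd += ["-certfile", chain_pem]
    cmd += ["-out", out_file]
    return cmd


# Hypothetical file names; openssl prompts for the export password when run.
command = build_pkcs12_command("vrms.crt", "vrms.key", "vrms.pfx", "ca-chain.crt")
print(shlex.join(command))
```

Passing the list to subprocess.run(command, check=True) would execute it; building it separately keeps passwords off the command line and makes it easy to produce a .p12 for SRM by changing only the output name.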
But our fun was just beginning. When we tried to complete our installation of the VRMS in vCenter, we got an untrusted certificate error. We knew that we had generated the certificate using the same CA as the two vCenters and SRM servers, but we were still getting a certificate error.
After looking through log files, general troubleshooting, and Googling, we discovered that the Java keystore on the VRMS didn’t have the InCommon CA as a trusted authority. That was what was actually throwing the error in vCenter: the VRMS did not trust the certificates. Armed with that information, we hacked our way through the steps of adding a trusted third-party CA to the VRMS Java keystore. I am currently working with the client’s team on another blog post covering the exact steps and commands for doing this.
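Until the exact appliance steps are written up, here is a generic sketch of what importing a CA certificate into a Java keystore looks like with the standard keytool utility. It is not the procedure we followed on the VRMS: the keystore path, alias, and file name are placeholders, and the real location and password of the appliance’s keystore are not covered here. The Python below only builds the two command lines, one to import the CA and one to confirm the entry afterwards.

```python
import shlex
from typing import List


def build_keytool_import(ca_pem: str, keystore: str, alias: str) -> List[str]:
    """Build a keytool command that adds a CA certificate as a trusted entry."""
    return [
        "keytool", "-importcert", "-trustcacerts",
        "-alias", alias,
        "-file", ca_pem,
        "-keystore", keystore,
    ]


def build_keytool_list(keystore: str, alias: str) -> List[str]:
    """Build a keytool command that shows the entry stored under an alias."""
    return ["keytool", "-list", "-keystore", keystore, "-alias", alias]


# Placeholder values; substitute the real keystore path for your appliance.
KEYSTORE = "/path/to/keystore"
ALIAS = "incommon-ca"

print(shlex.join(build_keytool_import("incommon-ca.pem", KEYSTORE, ALIAS)))
print(shlex.join(build_keytool_list(KEYSTORE, ALIAS)))
```

keytool prompts for the keystore password when it runs, so nothing secret needs to appear in the command itself. Taking a copy of the keystore before importing anything is a sensible precaution on an appliance you cannot easily rebuild.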
Once we had the InCommon CA trusted by both VRMS appliances (one at each site) and the certificates uploaded, we were able to connect the two sites’ VRMS servers.
Third-Party Certificates and the vSphere Replication Server (VRS)
The VR servers had the same issue, but they don’t have a Java keystore, so we had to open a service request with VMware to get it working. The problem was the same as with the VRMS: the VRS didn’t trust the InCommon CA. The CA had to be added to the /etc/ssl/certs directory on the VRS appliance. We also generated the certificate with only the commonName field and not the subjectAltName, and it worked.
How Long Does it Take to Install SRM?
As you can see from just this example of one subset of technical challenges, it can take some time to get SRM or any technical solution up and running if any of the technical components aren’t configured or ready to be configured.
Once SRM is installed, you are only partially complete with your disaster recovery deployment.
All of the VMs that you want to protect have to be put into recovery plans, and the applications have to be put in the correct startup and dependency order. This can take a long time and cross multiple departments of the organization to figure out. In fact, I often see clients so happy to have any kind of disaster recovery capability for their virtual machines that they are satisfied to fail them all over and worry about startup dependencies in the future.
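Working out the startup order is essentially a dependency-sorting exercise, and it can help to sketch it on paper or in a few lines of code before touching the recovery plans. The Python example below is an illustration of the idea rather than anything SRM provides. It uses the standard library’s graphlib module (Python 3.9 or later), and the VM names and dependencies are made up. Each "wave" is a set of VMs whose dependencies have all started, so they could be powered on together.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def startup_waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group VMs into waves so that each VM starts after everything it depends on."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # VMs with no unstarted dependencies
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical application stack: each VM maps to the VMs it needs running first.
dependencies = {
    "dc01": set(),
    "db01": {"dc01"},
    "file01": {"dc01"},
    "app01": {"db01", "dc01"},
    "web01": {"app01"},
}

for number, wave in enumerate(startup_waves(dependencies), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

A circular dependency raises an error instead of producing an order, which is a useful signal in itself: it usually means two departments each believe the other’s application has to come up first.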
So, how long does it take to install SRM? Of course the answer is, “It depends”.