ai-blur-codes-577585 (1)

Creating a Private Hybrid Kubernetes Cluster Pt. 2

To pick up where I left off in the previous post, I had just finished attempting to create a cluster using Windows Server 1709 and 1607. In each of these I was able to get the cluster created, but there were still some issues that prevented the cluster from being usable with Windows workloads (e.g., Windows containers wouldn’t start, meaning that the cluster was no better than a Linux-only cluster).

Then, I acquired the most recent 1803 Windows Server. I went back and re-tried it using Rancher.

After a couple of minor hiccups (getting Docker installed and running was a bit more difficult than expected (Get Started: Prep Windows for containers, Build and Run Your First Docker Windows Server container), I saw a curious error when first attempting to join the cluster (the astute reader will recognize this from the first post).

True : The term 'True' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:254
+ ... 238b777fad5df203e2b1bf654c3b69479bdc48c9333df9fc388e; if(True){& c:/e ...
+                                                              ~~~~
    + CategoryInfo          : ObjectNotFound: (True:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

This looks very much like a bug in the generated deployment script:

 PowerShell -Sta -NoLogo -NonInteractive -Command "& {docker run --rm -v C:/:C:/host rancher/rancher-agent:v2.1.0-nanoserver-1803 -server https://[server-ip] -token [token] -caChecksum [checksum]; if($?){& c:/etc/rancher/run.ps1;}}"

My suspicion is that the highlighted portion above is being interpreted as a new command. This PowerShell syntax isn’t that clear to me, but it looks like the commands are strung together with “&” and “{ }“.

The error turned out to be a red herring. The real problem was that I was attempting to run the command in PowerShell rather than in a standard cmd.exe window. Once I executed the command in a standard cmd.exe window, it worked without error.

Then, I was able to query the nodes successfully:

$ kubectl get nodes        
 NAME              STATUS ROLES                  AGE VERSION
 lin-node-master   Ready controlplane,etcd,worker   21h v1.11.3
 vmw-lin-node      Ready worker                  21h v1.11.3
 win1803-node      Ready worker                  1h v1.11.3

On the Windows node I executed:

 c:> docker pull microsoft/windowsservercore:1803
 c:> docker tag microsoft/windowsservercore:1803 microsoft/windowsservercore:latest 

On a Kubernetes client I executed:

$ wget https://raw.githubusercontent.com/Microsoft/SDN/master/Kubernetes/WebServer.yaml -O win-webserver.yaml
 $ kubectl apply -f win-webserver.yaml
 $ watch kubectl get pods -o wide

Once the STATUS showed “Running” with an IP address, I was able to curl it. The IP in my case was 10.42.2.5 and so from one of the non-Windows nodes (this is a bug in Windows currently) I was able to curl it:

$ curl 10.42.2.5 
 <html><body><H1>Windows Container Web Server</H1><p>IP 10.42.2.5 callerCount 1 </body></html>

This shows that the Windows server is running as a Kubernetes worker node and run a pod.

SUCCESS

Some additional notes:

There are some places where my server names changed, which was due to a few times where I had server issues and had to rebuild nodes. Where possible, I tried to re-use nodes to reduce the time.

My installation of Windows Server 2016-1803 was a core-only installation able to run successfully with only 4GB allocated to it.