Playing with Scaleway Kosmos — Part #4

webofmars
5 min read · Nov 3, 2021
Photo by Ryan Hutton on Unsplash

NOTE: This article is part of a series dedicated to Scaleway Kosmos. If you haven’t read it from the beginning, you should start here:

NB: Sources can be grabbed from github.com/webofmars/labs-kosmos/

In the first episodes we created a multi-cloud Kosmos cluster on the Scaleway cloud provider, played with multi-cloud DaemonSets and LoadBalancers, and explored ways to run your stateful apps. It’s now time to head into networking land, aka the CNI.

As you might have anticipated, Scaleway deploys a special CNI in order to answer the multi-cloud challenges: performance, security, low footprint and universality, to name a few.

The CNI used here is named “kilo”: https://github.com/squat/kilo

Kilo’s main selling point is node-to-node encryption based on WireGuard, a battle-tested, lightweight VPN. In this approach, kilo can be compared to Rancher’s Submariner project.
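If you are curious about what kilo actually does on a node, you can look at its agents and at the WireGuard interface it manages. A minimal sketch, assuming kilo runs as a DaemonSet in kube-system with its usual upstream labels and the default `kilo0` interface (adjust to what Kosmos really deploys):

```bash
# List the kilo agents (one per node). Namespace and label selector are
# assumptions based on kilo's upstream manifests -- adjust to your cluster.
kubectl -n kube-system get pods -l app.kubernetes.io/name=kilo -o wide

# Dump the WireGuard peers of the mesh from one of the agents.
# "kilo0" is the interface name kilo creates by default; check with `ip link`.
KILO_POD=$(kubectl -n kube-system get pods -l app.kubernetes.io/name=kilo \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n kube-system exec "$KILO_POD" -- wg show kilo0
```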

The main questions I had in mind when testing Kosmos were: “Is it stable enough?” and “What does the performance look like?”

To investigate the CNI I used the excellent Kubernetes Network Benchmark tool, aka knb, published by Alexis Ducastel, and since we know each other I collaborated with him to interpret the results.

I can only suggest running his tool on your own clusters to get at least a rough idea of how your CNI behaves, since it can change a lot from one cluster to another (check our results below if you have any doubt about it).
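Running knb boils down to pointing it at two nodes of your cluster. A minimal sketch (the repository URL and flags follow the project README at the time of writing, and the node names are placeholders for one Scaleway node and one OVH node):

```bash
# knb is a plain bash script shipped in Alexis Ducastel's k8s-bench-suite.
git clone https://github.com/InfraBuilder/k8s-bench-suite.git
cd k8s-bench-suite

# Bench traffic between two nodes; pick one node per cloud provider to measure
# cross-cloud traffic, or two nodes of the same provider for "local" traffic.
./knb --verbose --client-node scw-node-1 --server-node ovh-node-1 \
      -o data -f scw-to-ovh.knbdata
```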

What kind of bandwidth can I get using kilo?

Running knb against all the nodes, we can fill in the following spreadsheet:

kilo bandwidth tested with knb

We can roughly see that the bandwidth reaches around 1200 Mbps in certain cases, but the encryption (as expected) does impact the maximum bandwidth we can reach.

I will leave aside the cases with a red border for now, since there is something special I want to emphasize later.

These results are not very useful if we don’t know what the reference is, so in parallel we ran some benchmarks that completely bypass Kubernetes and the CNI, using iperf directly on the nodes.
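For reference, a raw node-to-node measurement looks roughly like this (iperf3 shown here; SERVER_IP is a placeholder for the server node’s reachable IP):

```bash
# On the node acting as server (e.g. an OVH instance):
iperf3 -s

# On the node acting as client (e.g. a Scaleway instance), run for 30 seconds
# against the server node's IP:
iperf3 -c SERVER_IP -t 30
```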

Raw bandwidth in Mbps

As we can see the picture is quite different, and there are some suspicious values with red borders here too. Still, we can conclude that the kilo CNI works correctly for cross-cloud traffic, but is very limiting when the traffic stays inside a single cloud provider (i.e. CP1 node to CP1 node, and so on). In that case the provider applies network optimizations on its local network, and kilo is probably not able to keep up with that much traffic without being given more resources.

What is the resource consumption when I use kilo?

From the data that knb generated we can extract the CPU usage in percent and the RAM used in MB.

kilo CNI cpu usage in percent
kilo CNI ram usage in Megabytes

Looking at the resources used during the network bench shows us several things:

  • CPU and RAM usage can be quite high, as expected for a CNI that encrypts every packet leaving the node on the fly
  • In the meantime the nodes are not overloaded by it, so the CNI exploits the resources correctly and the bench results can be considered reliable (more on this later…)
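If you want to snapshot the same kind of numbers on your own cluster while a bench is running, metrics-server plus kubectl top is usually enough. A minimal sketch (the label selector is an assumption, adjust it to the kilo DaemonSet deployed by Kosmos):

```bash
# Requires metrics-server. The label selector is an assumption -- adjust it to
# match the kilo pods on your cluster.
kubectl -n kube-system top pods -l app.kubernetes.io/name=kilo
```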

So … WTF ?

Photo by Luku Muffin on Unsplash

We were quite surprised by some of the values we got running the tests (all of them were run multiple times and averaged to get something realistic), and at first we assumed it was a limitation on the OVH cloud provider side. But thinking about it, it’s not!

That clearly shows how hard it is to run network benchmarks of any kind on cloud providers. In fact there are at least two limitations:

  • Network bandwidth is often limited depending on the instance type you use (the cheaper the instance, the less bandwidth available). That was the case for OVH: the D2-8 nodes are officially limited to 100 Mbps (and we can conclude they don’t enforce it in real time)
  • CPU power is not 100% portable: 2 vCPUs on CP1 are not the same as 2 vCPUs on CP2 (chip maker / frequency / instruction set, etc.)

Once this was understood, we ran an additional test with I1-180 instances on OVH, officially limited to 8 Gbps, and got better results: around 1300 Mbps on OVH-to-OVH traffic and quite honorable scores on cross-cloud traffic.

So don’t run knb on cloud instances without keeping this in mind: “cloud” is not an RFC, it’s an implementation ;-)

Conclusion

The Scaleway Kosmos product uses kilo as its default CNI (and you can’t change it easily), which shows a good balance between performance, resource consumption and features.

Of course it can’t be compared to the results Alexis Ducastel got when running benchmarks on a 10 Gbps local network, but it’s also the only multi-cloud CNI (with the exception of Submariner, which I would also like to test).

Anyway, raw bandwidth between nodes is usually not what you’re after when you decide to run a multi-cloud application. Security and reliability, however, seem to be there, and that is what we expected.

To be continued

That’s it for the CNI. In the next and last article of the series we will conclude and take a step further to look beyond Kosmos.

Photo by Olesya Grichina on Unsplash
