
Slower than expected dedicated server start times. (~20 min / instance)


Nementh


Hi

TL;DR: not sure what's causing slow server instance startup times on a virtual machine.

I'm running a dedicated server for a small group of friends. It's on a VMware backend, and I've set it up with 8 procs (2.9 GHz) and 32 GB of memory. It runs on an iSCSI SAN (FreeNAS with 32 GB of memory), and the 36 SSDs are in a RAID 10. It happens on all ESXi hosts, and I see no hardware bottlenecks when a server instance is told to start. iSCSI traffic for all VMs at any given time while an ARK server instance is starting is < 25% of its throughput. Pending read/write times are < 1 ms. Processor utilization during server start and while running never exceeds 20%.

I use ASM, and I have 4 instances in a cluster on a Windows Server 2019 machine. Each instance has its own install directory; they only share the cluster folder files. If I kill all servers and start one at a time, the start time is between 18 and 22 min per server. If I start all 4 instances at the same time, the start time remains at 18-22 min for all 4. If I create a new instance with no mods, the start time is still 18-22 min for The Island. The server save size for each instance is < 50 MB.

I've not attempted to start an instance from the CLI, bypassing ASM, but I suspect I'd get the same results.

Any thoughts on how to improve start times?

Link to comment
Share on other sites

What's your physical CPU and RAM?

It's a long time since I ran from a HD so I can't give a comparison, but starting a populated Island map with no mods takes about 3 or 4 minutes from an M2 drive.
18-22 minutes sounds like you're running Primitive Plus - PP always took ages to start for me.

There may be an argument for using a VM in a commercial server-farm, but I avoid VMs for my Ark server.  They add a small overhead since you're running an OS on top of an OS, but it shouldn't be more than about 10% slower than a bare-metal start.

If you're not running any mods, you can try    -structurememopts    on your server.  The second and subsequent starts should be faster than the first start.

The following Engine.ini is more of a run-time benefit than a startup-time benefit, but maybe worth a try.

[/script/onlinesubsystemutils.ipnetdriver]
NetServerMaxTickRate=15


The only other thing I might suggest is to post your command line (removing any passwords) in case anyone can spot anything obvious.

Link to comment
Share on other sites

Thanks for the reply!

Procs on the host: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
memory on the host: DIMM DDR3     16gb     1333 MHz

Storage is on enterprise-level SSDs in a 36-disk RAID 10. With the RAID controller's cache and FreeNAS's 32 GB of RAM for frequently used data, perf is higher than the NVMe drive in my desktop.

Not running anything overly crazy as far as mods, even without mods, the start time for the instance is still 18-20 min.

Maybe BattlEye?


Setting breakpad minidump AppID = 346110
[2020.09.29-14.11.13:435][  0]LogMemory: Platform Memory Stats for WindowsServer
[2020.09.29-14.11.13:435][  0]LogMemory: Process Physical Memory: 61.94 MB used, 61.95 MB peak
[2020.09.29-14.11.13:436][  0]LogMemory: Process Virtual Memory: 51.25 MB used, 51.38 MB peak
[2020.09.29-14.11.13:437][  0]LogMemory: Physical Memory: 13684.69 MB used, 32767.42 MB total
[2020.09.29-14.11.13:437][  0]LogMemory: Virtual Memory: 4441.33 MB used, 134217728.00 MB total
[2020.09.29-14.11.15:524][  0]ARK Version: 313.52
[2020.09.29-14.11.15:525][  0]PID: 6564
[2020.09.29-14.23.48:846][  0]Primal Game Data Took 749.44 seconds
[2020.09.29-14.32.12:147][  0]SteamSocketsOpenSource: gethostname failed ()
[2020.09.29-14.32.12:147][  0]gethostbyname failed ()
[2020.09.29-14.34.11:087][  0]Server Initializing with BattlEye Anti-Cheat Protection. If you do not wish to use BattlEye, please launch with -NoBattlEye
[2020.09.29-14.34.11:175][  0]BattlEye successfully started.
[2020.09.29-14.34.11:179][  0]bEnableMeshBitingProtection is True
[2020.09.29-14.34.51:348][  0]Server: "<servername>" has successfully started!
[2020.09.29-14.35.06:926][  0]Commandline: TheIsland?listen?Port=7777?QueryPort=27015?MaxPlayers=70?ServerAutoForceRespawnWildDinosInterval=259200?AllowCrateSpawnsOnTopOfStructures=True -ForceAllowCaveFlyers -AutoDestroyStructures -EnableIdlePlayerKick -clusterid=ArkCluster01 -UseBattlEye -servergamelog -servergamelogincludetribelogs -ServerRCONOutputTribeLogs -useallavailablecores -usecache -nosteamclient -game -server -log
[2020.09.29-14.35.06:927][  0]Full Startup: 1442.49 seconds (BP compile: 0.00 seconds)
[2020.09.29-14.35.06:929][  0]Number of cores 10
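
The timestamps in a log like the one above already show where the startup time goes, so it can help to rank the gaps between consecutive lines. The following is a minimal Python sketch, not an official tool: the function names `parse_line` and `largest_gaps` are made up for illustration, and the timestamp layout is assumed from the lines pasted in this thread.

```python
import re
from datetime import datetime

# Leading stamp of a log line, e.g. [2020.09.29-14.11.13:435][  0]
# (layout assumed from the log lines pasted in this thread).
STAMP = re.compile(
    r"^\[(\d{4}\.\d{2}\.\d{2}-\d{2}\.\d{2}\.\d{2}):(\d{3})\]\[\s*\d+\](.*)$"
)

def parse_line(line):
    """Return (timestamp, message) for a stamped line, else None."""
    m = STAMP.match(line.strip())
    if m is None:
        return None
    when = datetime.strptime(m.group(1), "%Y.%m.%d-%H.%M.%S")
    when = when.replace(microsecond=int(m.group(2)) * 1000)
    return when, m.group(3)

def largest_gaps(lines, top=3):
    """Rank the pauses between consecutive stamped lines, longest first.

    Each result is (seconds, message of the line that ended the pause).
    """
    stamped = [p for p in map(parse_line, lines) if p is not None]
    gaps = []
    for (prev_t, _), (cur_t, cur_msg) in zip(stamped, stamped[1:]):
        gaps.append(((cur_t - prev_t).total_seconds(), cur_msg[:60]))
    return sorted(gaps, reverse=True)[:top]

# A few lines copied from the log above.
SAMPLE = [
    "[2020.09.29-14.11.15:525][  0]PID: 6564",
    "[2020.09.29-14.23.48:846][  0]Primal Game Data Took 749.44 seconds",
    "[2020.09.29-14.32.12:147][  0]SteamSocketsOpenSource: gethostname failed ()",
    "[2020.09.29-14.34.11:087][  0]Server Initializing with BattlEye Anti-Cheat Protection.",
    '[2020.09.29-14.34.51:348][  0]Server: "<servername>" has successfully started!',
]

for seconds, message in largest_gaps(SAMPLE):
    print(f"{seconds:8.1f} s before: {message}")
```

On these lines the two longest pauses are about 753 s ending at the "Primal Game Data" line and about 503 s ending at the "gethostname failed" line. BattlEye initialization only appears after both, which suggests it is not the main cost here.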

Link to comment
Share on other sites

I did remove BattlEye from my cluster, but not for performance reasons.
There was (is?) a bug for some people where BattlEye insists on installing itself and forcing Ark to restart. Then it repeats, so it's impossible for some people to get on to BattlEye servers.

Memory on your host could be an issue with 4 maps running. My busiest map hits 8 GB of RAM at times, and the quiet ones (with no users online) use about 4 GB.
I wonder if you're hitting virtual memory a lot - page faults would be expensive.
 

[Two images attached.]

 

Link to comment
Share on other sites

Interesting, my I/O reads and writes are much higher. Are you on local storage? I have no hard requirement for BattlEye; I only have 10-25 people on at any given time, and I know the majority of them in person. I might turn it off and see what the metrics look like otherwise.

Not getting close to the memory limit, afaik.

[Four images attached.]

Link to comment
Share on other sites

I reboot my server daily, so that might explain your higher i/o read/write counts.
Yes, my storage is local.  The Ark maps save to the C drive, which is a 1TB Toshiba M2.

In your OP, you say your VM has 32 Gig allocated.  But in another post you say your physical machine has 16GB RAM?  Or have I misunderstood?

My busiest map atm is Ragnarok, which takes about 5 mins to start up.

[Image attached.]

 

Link to comment
Share on other sites

I bounced the server last night and finished some Windows updates, but I don't regularly restart the guest VM. I might give regular restarts of the guest OS a go and see what happens.

Sorry for the RAM confusion: I started at 16 GB but added another 16 to the VM; the ESXi host has 264 GB total. Seeing your stats is actually really helpful. The location of the data shouldn't really change the I/O reads or writes, but I'd imagine mods or tasks the game instance is doing would. I also disabled BattlEye, with no effect.

Will check the I/O after a few more hours to see if it's more in line. I wouldn't expect the disk I/O to make a ton of difference, but I might kill the paging file and force it to live in RAM to see if there are any differences there; maybe it's balancing tasks in a weird way.

Link to comment
Share on other sites

I wouldn't recommend killing the paging file, and I didn't realise how much RAM you had 🙂

Some mods do use their own config files - Awesome Teleporters and Soul Traps are the ones I know of.  But their i/o is minimal.
One shop mod (can't remember which) communicates with a central server somewhere, so will hit the network.

It might be worth googling the mods you have to see if any are known to cause startup delays.

Another thought, but what happens if you start a brand new map without any mods?  Does it still require a long time to start?
And what happens if you start a map directly on your server (not in a VM)?  Is it still slow?

Another thought (assuming you're on Windows): open a DOS prompt as administrator and run     sfc /scannow
If it reports errors that it can't fix, then       Dism /Online /Cleanup-Image /RestoreHealth

Do that in your VM and directly on your server.

Link to comment
Share on other sites

I killed the paging file since I'm over-provisioning RAM and I really want the VM to use RAM over HDD space; my read/write I/O dropped by more than half. Load times are also inconsistent now, with some under 400 seconds. Several variables changed at once, however, so I can't pin it on one specific cause, but I'll tinker with it and see if I can isolate one. I loaded into Ragnarok and found some 400 ice wyverns on the map. A dino wipe got rid of them, but several were overspawning, so I've adjusted the spawn rates. I wonder if the number of entities in stasis was part of it.

Setting breakpad minidump AppID = 346110
[2020.10.03-18.10.09:829][  0]LogMemory: Platform Memory Stats for WindowsServer
[2020.10.03-18.10.09:830][  0]LogMemory: Process Physical Memory: 61.86 MB used, 61.87 MB peak
[2020.10.03-18.10.09:831][  0]LogMemory: Process Virtual Memory: 51.08 MB used, 51.21 MB peak
[2020.10.03-18.10.09:831][  0]LogMemory: Physical Memory: 9934.94 MB used, 32767.42 MB total
[2020.10.03-18.10.09:832][  0]LogMemory: Virtual Memory: 4441.29 MB used, 134217728.00 MB total
[2020.10.03-18.10.10:545][  0]ARK Version: 313.57
[2020.10.03-18.10.10:546][  0]PID: 2196
[2020.10.03-18.11.18:766][  0]Primal Game Data Took 64.93 seconds
[2020.10.03-18.14.09:471][  0]SteamSocketsOpenSource: gethostname failed ()
[2020.10.03-18.14.09:471][  0]gethostbyname failed ()
[2020.10.03-18.14.47:956][  0]bEnableMeshBitingProtection is True
[2020.10.03-18.15.23:282][  0]Server: "<server name>" has successfully started!
[2020.10.03-18.16.02:117][  0]Commandline: Ragnarok?listen?Port=7781?QueryPort=27017?MaxPlayers=70?ServerAutoForceRespawnWildDinosInterval=259200?AllowCrateSpawnsOnTopOfStructures=True -ForceAllowCaveFlyers -AutoDestroyStructures -EnableIdlePlayerKick -clusterid=ArkCluster01 -NoBattlEye -servergamelog -servergamelogincludetribelogs -ServerRCONOutputTribeLogs -useallavailablecores -usecache -nosteamclient -game -server -log
[2020.10.03-18.16.02:118][  0]Full Startup: 356.50 seconds (BP compile: 0.00 seconds)
[2020.10.03-18.16.02:120][  0]Number of cores 10
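
To compare a slow start with a fast one, the two durations the server reports about itself can be pulled out of each log and set side by side. This is a small Python sketch under the assumption that the "Primal Game Data Took" and "Full Startup" lines keep the wording shown in the logs in this thread; `startup_summary` is a hypothetical helper name.

```python
import re

# Wording assumed from the log lines pasted in this thread.
PRIMAL = re.compile(r"Primal Game Data Took ([\d.]+) seconds")
FULL = re.compile(r"Full Startup: ([\d.]+) seconds")

def startup_summary(log_text):
    """Split the reported full startup time into 'Primal Game Data' and the rest."""
    primal = PRIMAL.search(log_text)
    full = FULL.search(log_text)
    if primal is None or full is None:
        return None  # log is incomplete or uses different wording
    primal_s = float(primal.group(1))
    full_s = float(full.group(1))
    return {
        "primal_s": primal_s,
        "full_s": full_s,
        "other_s": full_s - primal_s,
        "primal_share": primal_s / full_s,
    }

# The relevant lines from the two logs posted in this thread.
SLOW_RUN = """
[2020.09.29-14.23.48:846][  0]Primal Game Data Took 749.44 seconds
[2020.09.29-14.35.06:927][  0]Full Startup: 1442.49 seconds (BP compile: 0.00 seconds)
"""
FAST_RUN = """
[2020.10.03-18.11.18:766][  0]Primal Game Data Took 64.93 seconds
[2020.10.03-18.16.02:118][  0]Full Startup: 356.50 seconds (BP compile: 0.00 seconds)
"""

for label, text in (("slow", SLOW_RUN), ("fast", FAST_RUN)):
    s = startup_summary(text)
    print(f"{label}: primal {s['primal_s']:.2f} s, other {s['other_s']:.2f} s, "
          f"primal share {s['primal_share']:.0%}")
```

For the two runs posted here, "Primal Game Data" accounts for roughly 52% of the slow start and roughly 18% of the fast one. The two logs are for different maps and game versions, so treat the comparison as a hint rather than a controlled measurement.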

Link to comment
Share on other sites

On 10/3/2020 at 2:18 PM, Larkfields said:

Classic flyers causes an over spawn of ice wyverns on Rag 😞
The author of the mod is being a bit "difficult" about it and blaming WildCard.

I tried editing the config to reduce the over spawn but couldn't get anything to work.

 

I removed them altogether. I also tried to reduce their spawning at first:
 

DinoSpawnWeightMultipliers=(DinoNameTag="Ice Wyvern",OverrideSpawnLimitPercentage=False,SpawnLimitPercentage=0.000000,SpawnWeightMultiplier=1.000000)
NPCReplacements=(FromClassName="Ragnarok_Wyvern_Override_Ice_C",ToClassName="")


I took a break from trying to fix the issue, and it seems like rebooting the guest OS may have been the thing that sped up loading. I'm back to 1200-1600 seconds to load any map; will reboot again soon to confirm.

Link to comment
Share on other sites

  • 2 weeks later...

I have been running a cluster of servers for the last year or so. When I first started the servers, my startup times averaged around 600-900 seconds. A few months ago they started creeping up to 1500-2100 seconds, and in the last couple of days they have increased to 4800-5200 seconds. I did recently add a couple of mods and a new API plugin, but that was about 3 weeks ago. I also have a couple of crossplay servers with no mods or plugins, and the startup times are the same on those maps, so it doesn't seem to be mod or plugin related. Along with this, I have been getting a threadwatcher hang crash message when starting the servers simultaneously. The servers aren't actually crashing, as I can still log in and play while the message is open, but as soon as I click OK it closes the server.

I have 10 maps total hosted on a VM running on a Dell R720 with dual E5-2689 processors and 128 GB of RAM. I am running Proxmox as the hypervisor and have 64 GB of RAM and 24 cores dedicated to the Ark server VM.

I have updated all of my VirtIO drivers and run the SFC command in Windows, which successfully fixed the errors it found. Processor and RAM usage are nowhere near max, nor is disk usage, so I am not sure what the issue is. I am at the point of creating a new VM and moving my save files over, but I would like to understand what happened so I can repair it or prevent it in the future. If anyone has any suggestions on things to check or try, I would appreciate any input.

Link to comment
Share on other sites

1 hour ago, starweaver said:

same hardware and everything else on both systems
10 mods ~1GB
default settings minus name of server and 2x rates
windows start time - 20-30 minutes with cache enabled
debian start time - 50-70 seconds
can some one explain why windows is so crap when it comes to start times?

Which mods ?

That's not a normal startup time.  My slowest modded server took about 6 minutes to start on Windoze.

Link to comment
Share on other sites

2 minutes ago, Larkfields said:

Which mods ?

That's not a normal startup time.  My slowest modded server took about 6 minutes to start on Windoze.

s+
castle keep forts legacy
utils +
stacks evolved
best eggs
custom dino levels - because the vanilla level spread just sucks
genesis grinder - only on genesis
armour stand
fixed perm titans
simple spawners

server is running on a 960 evo 250G

Link to comment
Share on other sites

Just posting a reference here. 

We're hosting on a Dell PowerEdge R710. It's got two X5680s and 64 GB of RAM. Servers run off a pair of Evo 860s in RAID 0. OS is Windows 10 Pro. We have 8 maps in a cluster, and each map has its own pair of CPU cores.

Average startup time for the servers is 319 seconds.

The mods we run are as follows.

Death Recovery Mod
Super Structures
Set Eye Height
Super Spyglass
Cross-Genesis 1
Kraken's Better Dinos
DinoTracker

I have also modified player and dino levels, but that wouldn't have much impact on load times if any.

 

Link to comment
Share on other sites

I haven't been hosting for nearly as long, but I'm having the same issue on an HP DL380 G8 host with VMware as my hypervisor, 2.26 GHz procs, and 256 GB of RAM. The guest is Windows Server 2019 with 10 cores and 64 GB of RAM, running 5 instances; with or without mods, they all take 1600+ seconds to load. I can migrate between 3 of my hosts, and the guest/server start time is the same on all nodes.

Storage is attached through multi-channel iSCSI on a SAN running FreeNAS for me. Is your storage local?

I haven't found a real fix since I started looking into the issue a few months ago, sorry it's not helpful other than maybe you're not alone.

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.
