Commit Graph

101 Commits (7ba3b88fb66de8c22ae1e067224b40214c23957c)

Author SHA1 Message Date
Justin Clark-Casey (justincc) 4e03d352c3 Extend drop command to "debug lludp drop <in|out>..." to allow drop of inbound packets.
For test/debug purposes.
2014-08-19 18:43:21 +01:00
Justin Clark-Casey (justincc) 298376d5c7 Add "debug lludp drop out <add|remove> <packet-name>" console command for debug/test purposes.
This drops all outbound packets that match a given packet name.
Can currently only be applied to all connections in a scene.
2014-08-19 18:34:17 +01:00
Justin Clark-Casey (justincc) 84cea46c10 Add experimental OutgoingQueueRefillEngine to handle queue refill processing on a controlled number of threads rather than the threadpool.
Disabled by default.  Currently can only be enabled with console "debug lludp oqre start" command, though this can be started and stopped whilst simulator is running.
When a connection requires packet queue refill processing (used to populate queues with entity updates, entity prop updates and image queue updates), this is done via Threadpool requests.
However, with a very high number of connections (e.g. 100 root + 300 child) a very large number of simultaneous requests may be causing performance issues.
This commit adds an experimental engine for processing these requests from a queue with a persistent thread instead.
Unlike inbound processing, there are no network requests in this processing that might hold the thread up for a long time.
Early implementation - currently only one thread which may (or may not) get overloaded with requests.  Added for testing purposes.
2014-08-19 00:17:12 +01:00
Justin Clark-Casey (justincc) b375f86f11 Make LLUDPServer.Scene publicly gettable/privately settable instead of protected so that other logging code in the clientstack can record more useful information
Adds some commented out logging for use again in the future.
No functional change.
2014-08-19 00:17:12 +01:00
Justin Clark-Casey (justincc) e0c6bfa81e If a user moves back in sight of a child region before the agent has been closed on teleport, don't unnecessarily resend all avatar and object data about that region. 2014-08-15 21:47:34 +01:00
Justin Clark-Casey (justincc) 91e1aaa5d4 On teleport to a region that already has a child agent established (e.g. a neighbour) don't resend all the initial avatar and object data again.
This is unnecessary since it has been received (and data continues to be received) in the existing child connection.
2014-08-15 21:47:34 +01:00
Justin Clark-Casey (justincc) 0db6f3a2bd Only set up the UnackedMethod for an outgoing message if that message is actually meant to get an ack (because it's reliable). 2014-08-13 22:57:14 +01:00
Oren Hurvitz 528704bc04 Added "debug packet --all" option, which changes the packet logging level for both current and future clients
The existing "--default" option only changes the logging level for future clients.
2014-07-21 08:31:20 +01:00
Oren Hurvitz a57b4b81b9 Fixed the logic that decides if a packet was queued (it was reversed) 2014-07-21 08:31:09 +01:00
Justin Clark-Casey (justincc) 8e1bf55e7b Add IncomingPacketsResentCount clientstack statistics
This records how many packets were indicated to be resends by clients
Not 100% reliable since clients can lie about resends, but usually would indicate if clients are not receiving UDP acks at all or in a manner they consider timely.
2013-11-06 01:02:20 +00:00
Justin Clark-Casey (justincc) 4c4a1cf715 Start counting resent packets in the places that I missed when the stat was first added a few commits ago 2013-10-31 23:59:22 +00:00
Justin Clark-Casey (justincc) 3d5a7e9b19 Add OutgoingPacketsResentCount clientstack stat.
This allows one to monitor the total number of messages resent to clients over time.
A constantly increasing stat may indicate a general server network or overloading issue if a fairly high proportion of packets sent
A smaller constantly increasing stat may indicate a problem with a particular client-server connection, would need to check "show queues" in this case.
2013-10-31 23:45:52 +00:00
Justin Clark-Casey (justincc) cccdfcb59e Comment out LLUDPServer.BroadcastPacket() to reduce code complexity. Appears to be a never used method. 2013-10-24 00:37:49 +01:00
Justin Clark-Casey (justincc) 5d61c4039d Only set the data present event if we actually queued an outoing packet (not if we sent immediately) 2013-10-24 00:33:14 +01:00
Justin Clark-Casey (justincc) 2cd95fac73 refactor: Rename Scene.AddNewClient() to AddNewAgent() to make it obvious in the code that this is symmetric with CloseAgent() 2013-09-27 22:27:39 +01:00
Justin Clark-Casey (justincc) b16bc7b01c refactor: rename Scene.IncomingCloseAgent() to CloseAgent() in order to make it clear that all non-clientstack callers should be using this rather than RemoveClient() in order to step through the ScenePresence state machine properly.
Adds IScene.CloseAgent() to replace RemoveClient()
2013-09-27 19:14:21 +01:00
Justin Clark-Casey (justincc) 32ddfc2740 Reinsert client.SceneAgent checks into LLUDPServer.HandleCompleteMovementIntoRegion() to fix race condition regression in commit 7dbc93c (Wed Sep 18 21:41:51 2013 +0100)
This check is necessary to close a race condition where the CompleteAgentMovement processing could proceed when the UseCircuitCode thread had added the client to the client manager but before the ScenePresence had registered to process the CompleteAgentMovement message.
This is most probably why the message appeared to get lost on a proportion of entity transfers.
A better long term solution may be to set the IClientAPI.SceneAgent property before the client is added to the manager.
2013-09-25 18:45:56 +01:00
Justin Clark-Casey (justincc) 732554be04 Reinsert 200ms sleep accidentally removed in commit 7dbc93c (Wed Sep 18 21:41:51 2013 +0100) 2013-09-25 18:29:14 +01:00
Justin Clark-Casey (justincc) f4d82a56f4 Double the time spent waiting for a UseCircuitCode packet in LLUDPServer.HandleCompleteMovementIntoRegion()
This is to deal with one aspect of http://opensimulator.org/mantis/view.php?id=6755
With the V2 teleport arrangements, viewers appear to send the single UseCircuitCode and CompleteAgentMovement packets immediately after each other
Possibly, on occasion a poor network might drop the initial UseCircuitCode packet and by the time it retries, the CompleteAgementMovement has timed out and the teleport fails.
There's no apparant harm in doubling the wait time (most times only one wait will be performed) so trying this.
2013-09-18 22:09:46 +01:00
Justin Clark-Casey (justincc) 7dbc93c62a Change logging to provide more information on LLUDPServer.HandleCompleteMovementIntoRegion()
Add more information on which endpoint sent the packet when we have to wait and if we end up dropping the packet
Only check if the client is active - other checks are redundant since they can only failed if IsActve = false
2013-09-18 21:41:51 +01:00
Justin Clark-Casey (justincc) 93dffe1777 Add stat clientstack.<scene>.IncomingPacketsOrphanedCount to record well-formed packets that were not initial connection packets and could not be associated with a connected viewer. 2013-08-14 22:33:12 +01:00
Justin Clark-Casey (justincc) 0d5680e971 Count any incoming packet that could not be recognized as an LLUDP packet as a malformed packet. Record this as stat clientstack.<scene>.IncomingPacketsMalformedCount
Used to detect if a simulator is receiving significant junk UDP
Decimates the number of packets between which a warning is logged and prints the IP source of the last malformed packet when logging
2013-08-14 22:08:28 +01:00
Justin Clark-Casey (justincc) b1c26a56b3 Fix an issue where under teleport v2 protocol, teleporting from regions in an line from A->B->C would not close region A when reaching C
The root cause was that v2 was only closing neighbour agents if the root connection also needed a close.
However, fixing this requires the neighbour regions also detect when they should not close due to re-teleports re-establishing the child connection.
This involves restructuring the code to introduce a scene presence state machine that can serialize the different add and remove client calls that are now possible with the late close of the
This commit appears to fix these issues and improve teleport, but still has holes on at least quick reteleporting (and possibly occasionally on ordinary teleports).
Also, has not been completely tested yet in scenarios where regions are running on different simulators
2013-08-08 23:29:30 +01:00
Justin Clark-Casey (justincc) 68b98a8003 minor: Add name to debug lludp packet level feedback on console 2013-08-01 23:16:41 +01:00
Justin Clark-Casey (justincc) 0c4c084bed Try a different approach to slow terrain update by always cycling the loop immediately if any data was sent, rather than waiting.
What I believe is happening is that on initial terrain send, this is done one packet at a time.
With WaitOne, the outbound loop has enough time to loop and wait again after the first packet before the second, leading to a slower send.
This approach instead does not wait if a packet was just sent but instead loops again, which appears to lead to a quicker send without losing the cpu benefit of not continually looping when there is no outbound data.
2013-08-01 18:12:28 +01:00
Justin Clark-Casey (justincc) 932c382737 Revert "Issue: painfully slow terrain loading. The cause is commit d9d995914c (r/23185) -- the WaitOne on the UDPServer. Putting it back to how it was done solves the issue. But this may impact CPU usage, so I'm pushing it to test if it does."
This reverts commit 59b461ac0e.
2013-08-01 18:11:50 +01:00
Diva Canto 59b461ac0e Issue: painfully slow terrain loading. The cause is commit d9d995914c (r/23185) -- the WaitOne on the UDPServer. Putting it back to how it was done solves the issue. But this may impact CPU usage, so I'm pushing it to test if it does. 2013-08-01 09:27:44 -07:00
Justin Clark-Casey (justincc) 1416c90932 minor: Add timeout secs to connection timeout message. Change message to reflect it is a timeout due to no data received rather than an ack issue. 2013-07-29 23:53:59 +01:00
Justin Clark-Casey (justincc) 8004e6f31c Fix issue just introduced in 8efe4bfc2e where I accidentally left in a test line to force very quick client unack 2013-07-29 23:38:54 +01:00
Justin Clark-Casey (justincc) 8efe4bfc2e Make "abnormal thread terminations" into "ClientLogoutsDueToNoReceives" and add this to the StatsManager
This reflects the actual use of this stat - it hasn't recorded general exceptions for some time.
Make the sim extra stats collector draw the data from the stats manager rather than maintaing this data itself.
2013-07-29 23:18:29 +01:00
Diva Canto cac37e298c Deleted all [ZZZ] debug messages. 2013-07-24 14:31:30 -07:00
Diva Canto e6a0f6e428 One more thing to test in order to let CompleteMovement go up the stack. 2013-07-24 14:29:51 -07:00
Diva Canto 14530b2607 Minor adjustment on timings of waits. 2013-07-24 14:29:37 -07:00
Diva Canto 3891a8946b New Teleport protocol (V2), still compatible with V1 and older. (version of the destination is being checked)
In this new protocol, and as committed before, the viewer is not sent EnableSimulator/EstablishChildCommunication for the destination. Instead, it is sent TeleportFinish directly. TeleportFinish, in turn, makes the viewer send a UserCircuitCode packet followed by CompleteMovementIntoRegion packet. These 2 packets tend to occur one after the other almost immediately to the point that when CMIR arrives the client is not even connected yet and that packet is ignored (there might have been some race conditions here before); then the viewer sends CMIR again within 5-8 secs. But the delay between them may be higher in busier regions, which may lead to race conditions.
This commit improves the process so there are are no race conditions at the destination. CompleteMovement (triggered by the viewer) waits until Update has been sent from the origin. Update, in turn, waits until there is a *root* scene presence -- so making sure CompleteMovement has run MakeRoot. In other words, there are two threadlets at the destination, one from the viewer and one from the origin region, waiting for each other to do the right thing. That makes it safe to close the agent at the origin upon return of the Update call without having to wait for callback, because we are absolutely sure that the viewer knows it is in th new region.
Note also that in the V1 protocol, the destination was getting UseCircuitCode from the viewer twice -- once on EstablishAgentCommunication and then again on TeleportFinish. The second UCC was being ignored, but it shows how we were not following the expected steps...
2013-07-24 14:27:58 -07:00
Justin Clark-Casey (justincc) a57a472ab8 Add proper method doc and comments to m_dataPresentEvent (from d9d9959) 2013-07-23 00:51:59 +01:00
Justin Clark-Casey (justincc) 9fb9da1b6c Add clientstack.InboxPacketsCount stat. This records the number of packets waiting to be processed at the second stage (after initial UDP processing)
If this consistently increases then this is a problem since it means the simulator is receiving more requests than it can distribute to other parts of the code.
2013-07-23 00:35:41 +01:00
Justin Clark-Casey (justincc) 60732c96ef Add clientstack.OutgoingUDPSendsCount stat to show number of outbound UDP packets sent by a region per second 2013-07-23 00:35:34 +01:00
Justin Clark-Casey (justincc) 8396f1bd42 Record raw number of UDP receives as clientstack.IncomingUDPReceivesCount 2013-07-23 00:35:23 +01:00
Justin Clark-Casey (justincc) bf517899a7 Add AverageUDPProcessTime stat to try and get a handle on how long we're taking on the initial processing of a UDP packet.
If we're not receiving packets with multiple threads (m_asyncPacketHandling) then this is critical since it will limit the number of incoming UDP requests that the region can handle and affects packet loss.
If m_asyncPacketHandling then this is less critical though a long process will increase the scope for threads to race.
This is an experimental stat which may be changed.
2013-07-23 00:35:09 +01:00
Robert Adams e6b6af62dd Added check for user movement specification before discarding an incoming
AgentUpdate packet. This fixes the problem with vehicles not moving forward
after the first up-arrow.
Code to fix a potential exception when using different IClientAPIs.
2013-07-22 15:41:14 -07:00
Diva Canto 174105ad02 Fixed the stats in show client stats. Also left some comments with observations about AgentUpdates. 2013-07-21 09:00:27 -07:00
Justin Clark-Casey (justincc) 61eda1f441 Make the check as to whether any particular inbound AgentUpdate packet is significant much earlier in UDP processing (i.e. before we pointlessly place such packets on internal queues, etc.)
Appears to have some impact on cpu but needs testing.
2013-07-21 08:58:55 -07:00
Justin Clark-Casey (justincc) 5a2d4d888c Hack in console command "debug lludp toggle agentupdate" to allow AgentUpdate in packets to be discarded at a very early stage.
Enabling this will stop anybody from moving on a sim, though all other updates should be unaffected.
Appears to make some cpu difference on very basic testing with a static standing avatar (though not all that much).
Need to see the results with much higher av numbers.
2013-07-21 08:58:21 -07:00
Justin Clark-Casey (justincc) 3a476bf60c Fix up a temporary debugging change from last commit which stopped "lludp stop out" from actually doing anything 2013-07-21 08:57:36 -07:00
Justin Clark-Casey (justincc) 63c42d6602 Do some simple queue empty checks in the main outgoing udp loop instead of always performing these on a separate fired thread.
This appears to improve cpu usage since launching a new thread is more expensive than performing a small amount of inline logic.
However, needs testing at scale.
2013-07-21 08:56:48 -07:00
Justin Clark-Casey (justincc) d9d995914c try Hacking in an AutoResetEvent to control the outgoing UDP loop instead of a continuous loop with sleeps.
Does appear to have a cpu impact but may need further tweaking
2013-07-18 12:28:02 -07:00
Diva Canto 864f15ce4d Revert the revert
Revert "Trying to hunt the CPU spikes recently experienced."

This reverts commit ac73e70293.
2013-07-15 11:52:26 -07:00
Diva Canto ac73e70293 Trying to hunt the CPU spikes recently experienced.
Revert "Comment out old inbound UDP throttling hack. This would cause the UDP"

This reverts commit 38e6da5522.
2013-07-15 11:27:49 -07:00
Diva Canto a412b1d682 Moved SendInitialDataToMe to earlier in CompleteMovement. Moved TriggerOnMakeRootAgent to the end of CompleteMovement.
Justin, if you read this, there's a long story here. Some time ago you placed SendInitialDataToMe at the very beginning of client creation (in LLUDPServer). That is problematic, as we discovered relatively recently: on TPs, as soon as the client starts getting data from child agents, it starts requesting resources back *from the simulator where its root agent is*. We found this to be the problem behind meshes missing on HG TPs (because the viewer was requesting the meshes of the receiving sim from the departing grid). But this affects much more than meshes and HG TPs. It may also explain cloud avatars after a local TP: baked textures are only stored in the simulator, so if a child agent receives a UUID of a baked texture in the destination sim and requests that texture from the departing sim where the root agent is, it will fail to get that texture.
Bottom line: we need to delay sending the new simulator data to the viewer until we are absolutely sure that the viewer knows that its main agent is in a new sim. Hence, moving it to CompleteMovement.
Now I am trying to tune the initial rez delay that we all experience in the CC. I think that when I fixed the issue described above, I may have moved SendInitialDataToMe to much later than it should be, so now I'm moving to earlier in CompleteMovement.
2013-07-13 09:46:58 -07:00
Robert Adams 38e6da5522 Comment out old inbound UDP throttling hack. This would cause the UDP
reception thread to sleep for 30ms if the number of available user worker
threads got low. It doesn't look like any of the UDP packet types are
marked async so this check is 1) unnecessary and 2) really crazy since
it stops up the reception thread under heavy load without any indication.
2013-07-09 18:34:24 -07:00