To pull in https://github.com/element-hq/dendrite/pull/3630
Also pulls in a bunch of bug fixes on v12 rooms, which testing did not
catch.
`FAILURE: #655: Server rejects invalid JSON in a version 6 room` is an
expected fail now.
Due to a bad merge, timeouts were not early-returning and so could cause
a stall in the roomserver. Fix the bad merge and early-out.
Minor fix to a bad merge of my change for #3484
This also incorporates a suggested tweak to the prior PR #3588 to avoid
holding the lock unnecessarily.
### Pull Request Checklist
<!-- Please read
https://matrix-org.github.io/dendrite/development/contributing before
submitting your pull request -->
* [x] I have added Go unit tests or [Complement integration
tests](https://github.com/matrix-org/complement) for this PR _or_ I have
justified why this PR doesn't need tests
* [x] Pull request includes a [sign off
below](https://element-hq.github.io/dendrite/development/contributing#sign-off)
_or_ I have already signed off privately
Signed-off-by: `Vivianne Langdon <puttabutta@gmail.com>`
---------
Signed-off-by: Vivianne Langdon <puttabutta@gmail.com>
This should fix#3484 -- at the very least, it has resolved the issues
we've had on our instance. I've extended the lock so it surrounds the
unsubscribe as well as a new check if the latest sequential ID seen by
the ephemeral thread is newer than the sequential ID seen by the durable
thread. This solves a race where the unsubscribe happened while a new
message was inflight, and so the message was not handled.
This is a fix for a race condition that has been pretty unreliable to
reproduce manually, so I don't know if there's a good way to add a
reliable automated test for it. If you have any ideas I'm open to it.
### Pull Request Checklist
<!-- Please read
https://matrix-org.github.io/dendrite/development/contributing before
submitting your pull request -->
* [x] I have added Go unit tests or [Complement integration
tests](https://github.com/matrix-org/complement) for this PR _or_ I have
justified why this PR doesn't need tests
* [x] Pull request includes a [sign off
below](https://element-hq.github.io/dendrite/development/contributing#sign-off)
_or_ I have already signed off privately
Signed-off-by: `Vivianne Langdon <puttabutta@gmail.com>`
---------
Signed-off-by: Vivianne Langdon <puttabutta@gmail.com>
Co-authored-by: Kegan Dougal <7190048+kegsay@users.noreply.github.com>
Using a map means the events are processed in an indeterminate order,
which causes a lot of `/state_ids` requests as the `prev_events` are not
known when the order is lost.
Signed-off-by: Neil Alexander <git@neilalexander.dev>
---------
Signed-off-by: Neil Alexander <git@neilalexander.dev>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Fixes#2504
A few issues with the previous iteration:
- We never returned `inaccessible_children`, which (if I read the code
correctly), made Synapse raise an error and thus not returning the
requested rooms
- For restricted rooms, we didn't return the list of allowed rooms
Based on #3340
This adds a `/_synapse/admin/v1/event_reports` endpoint, the same
Synapse has. This way existing tools also work with Dendrite.
Given this is already getting huge (even though many test lines),
splitting this into two PRs. (The next adds "getting one report" and
"deleting reports")
[skip ci]
Part of #3216 and #3226
There will be a follow up PR which is going to add the same admin
endpoints Synapse has, so existing tools also work for Dendrite.
This now should actually speed up startup times.
This is because _many_ rooms (like DMs) don't have room ACLs, this means
that we had around 95% pointless DB queries. (as queried on d.m.org)
This hopefully reduces the garbage we currently produce.
(Using [GlitchTip](https://glitchtip.com/) on my personal instance, this
seems to look better)
Fixes https://github.com/matrix-org/dendrite/issues/3240 and potentially
a root cause for state resets.
While testing, I've had added some more debug logging:
```
time="2023-12-16T18:13:11.319458084Z" level=warning msg="already processed event" event_id="$qFYMl_F2vb1N0yxmvlFAMhqhGhLKq4kA-o_YCQKH7tQ" kind=KindNew times=2
time="2023-12-16T18:13:14.537389126Z" level=warning msg="already processed event" event_id="$EU-LTsKErT6Mt1k12-p_3xOHfiLaK6gtwVDlZ35lSuo" kind=KindNew times=5
time="2023-12-16T18:13:16.789551206Z" level=warning msg="already processed event" event_id="$dIPuAfTL5x0VyG873LKPslQeljCSxFT1WKxUtjIMUGE" kind=KindNew times=5
time="2023-12-16T18:13:17.383838767Z" level=warning msg="already processed event" event_id="$7noSZiCkzerpkz_UBO3iatpRnaOiPx-3IXc0GPDQVGE" kind=KindNew times=2
time="2023-12-16T18:13:22.091946597Z" level=warning msg="already processed event" event_id="$3Lvo3Wbi2ol9-nNbQ93N-E2MuGQCJZo5397KkFH-W6E" kind=KindNew times=1
time="2023-12-16T18:13:23.026417446Z" level=warning msg="already processed event" event_id="$lj1xS46zsLBCChhKOLJEG-bu7z-_pq9i_Y2DUIjzGy4" kind=KindNew times=4
```
So we did receive the same event over and over again. Given they are
`KindNew`, we don't short circuit if we already processed them, which
potentially caused the state to be calculated with a now wrong state
snapshot.
Also fixes the back pressure metric. We now correctly increment the
counter once we sent the message to NATS and decrement it once we
actually processed an event.
This should fix#3004 by making sure we also update our in-memory ACLs
after joining a new room.
Also makes use of more caching in `GetStateEvent`
Bonus: Adds some tests, as I was about to use `GetBulkStateContent`, but
turns out that `GetStateEvent` is basically doing the same, just that it
only gets the `eventTypeNID`/`eventStateKeyNID` once and not for every
call.
Fixes include:
- Translating state keys that contain user IDs to their respective room
keys for both querying and sending state events
- **NOTE**: there may be design discussion needed on what should happen
when sender keys cannot be found for users
- A simple fix for kicking guests from rooms properly
- Logic for boundary history visibilities was slightly off (I'm
surprised this only manifested in pseudo ID room versions)
Signed-off-by: `Sam Wedgwood <sam@wedgwood.dev>`
This PR adds a config key `room_server.default_config_key` to set the
default room version for the room server.
Signed-off-by: `Sam Wedgwood <sam@wedgwood.dev>`
There are cases where a dendrite instance is unaware of a pseudo ID for
a user, the user is not a member of that room. To represent this case,
we currently use the 'zero' value, which is often not checked and so
causes errors later down the line. To make this case more explict, and
to be consistent with `QueryUserIDForSender`, this PR changes this to
use a pointer (and `nil` to mean no sender ID).
Signed-off-by: `Sam Wedgwood <sam@wedgwood.dev>`
The previous version was getting **ALL** membership events (as
`ClientEvents`, so going through `NewEventFromTrustedJSONWithID`) for a
given room.
Now we are querying only locally joined users as `ClientEvents`, which
should **significantly** reduce allocations.
Take for example a large room with 2k membership events, but only 1
local user - avoiding 1999 `NewEventFromTrustedJSONWithID` calls just to
calculate the `roomSize` which we can also query by other means.
This is also getting called for every `OutputRoomEvent` in the userAPI.
Benchmark with 1 local user and 100 remote users.
```
pkg: github.com/matrix-org/dendrite/userapi/consumers
cpu: 12th Gen Intel(R) Core(TM) i5-12500H
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
LocalRoomMembers-16 375.9µ ± 7% 327.6µ ± 6% -12.85% (p=0.000 n=10)
│ old.txt │ new.txt │
│ B/op │ B/op vs base │
LocalRoomMembers-16 79.426Ki ± 0% 8.507Ki ± 0% -89.29% (p=0.000 n=10)
│ old.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
LocalRoomMembers-16 1015.0 ± 0% 277.0 ± 0% -72.71% (p=0.000 n=10)
```
This should fix two issues with backfilling:
1. right after creating and joining a room over federation, we are doing
a `/backfill` request, which would return redacted events, because the
`authEvents` are empty. Even though the spec states that, in the absence
of a history visibility event, it should be handled as `shared`.
2. `gomatrixserverlib: unsupported room version ''` - because, well, we
were never setting the `roomInfo` field..