tldr; how to figure out what to delete from locals...
# android-architecture
u
tldr; how to figure out what to delete from locals, if remotes are paginated
g
can't you fetch a range of messages, or a specific one? You get a push saying that a specific range of messages has changed, you check whether it falls within your local messages and, if so, you fetch that chunk.
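A minimal sketch of that check, assuming message ids are increasing integers and the locally cached messages form one contiguous id range. `IdRange` and `overlap` are hypothetical names, not part of any real push API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IdRange:
    first: int  # inclusive
    last: int   # inclusive


def overlap(local: IdRange, changed: IdRange) -> Optional[IdRange]:
    """Return the part of `changed` that is cached locally, or None."""
    first = max(local.first, changed.first)
    last = min(local.last, changed.last)
    return IdRange(first, last) if first <= last else None


# The push says ids 150..250 changed; the client holds 100..200,
# so only 150..200 needs to be re-fetched.
chunk_to_fetch = overlap(IdRange(100, 200), IdRange(150, 250))
```

Anything outside the overlap is not cached, so it can be ignored until the user pages to it.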
u
if I'm in the foreground (websocket is active), then it's all good,
but after I start the app / come back from background, I won't receive pushes of "what I have missed"
or should I ?
g
aren't things like Firebase designed for this kind of case?
do you have a particular issue with reliability of your push service?
it's a bit superficial, but "Grokking the System Design Interview" course contains some examples of Messenger services and explains how WhatsApp or Twitter is made. Proper synchronization is a quite challenging task 🙂.
u
I dont think slack has that automatic push all missed events
I think its regular https call from client
i.e. pull
yea I know its challenging, just wondering what would be the mechanism to update the off-pages
unless only keeping 1 page of data locally, but that is lame
honestly this is a client thing I think, I have push of changes, and pull to get all history, but its paginated, so im wondering how or if I should update the off pages, hopefully you know what i mean
I mean If I were to sync all local data, then its no problem, but this wont scale..
g
Not sure about Slack, but Facebook Messenger and WhatsApp use a push model (long polling/websockets). It makes even more sense once you want to display the delivery status of a message. Anyhow, with a pull model I guess you'd probably need two sections in your server response: `edited` and `latest`. That way you can patch your history.
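A rough sketch of how a client might apply such a response, assuming the local cache is a dict keyed by message id and the response carries `edited` and `latest` lists of message dicts. The exact response shape is an assumption for illustration.

```python
def patch_history(local: dict, response: dict) -> None:
    """Apply a pull response with `edited` and `latest` sections in place."""
    for msg in response.get("edited", []):
        # Only patch messages that are already cached; edits to pages
        # that were never downloaded can be ignored.
        if msg["id"] in local:
            local[msg["id"]] = msg
    for msg in response.get("latest", []):
        local[msg["id"]] = msg


local = {1: {"id": 1, "text": "hi"}, 2: {"id": 2, "text": "yo"}}
patch_history(local, {
    "edited": [{"id": 1, "text": "hi!"}, {"id": 99, "text": "not cached"}],
    "latest": [{"id": 3, "text": "new"}],
})
```

Deletions are not covered by these two sections; they would need a third section or some kind of tombstone entry.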
u
but neither of them has a websocket open in the background, for sure -- so you need to "pull" to synchronize when coming back from background
g
Yes, you need to pull, and to do that properly you need some API that returns a diff of the data since your last connection
u
yes, but, how to do that? -- meaning, sending every message id I have to server.. that cant scale
g
No, you can send a special token/cursor with each diff request (in the simplest, but not very robust, solution it may be just the timestamp of the last message), so the server knows which snapshot version of the messages is on your device
u
hm not sure if I understand, if I tell the server what my last message is, how can it tell that I for example need to update some n-th message I have locally because it was edited? (meaning that message is from before my last message)
g
I used this example just to illustrate how this cursor for a diff request may work
of course a real synchronization implementation may be much more sophisticated, depending on your case and required features
This is why I mentioned the token/cursor: you do the initial synchronization by requesting all the data (which is an additional challenge), and together with the last chunk of data you get a token/cursor that can be used to request the changes that happened after you received this data
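A small sketch of that flow, assuming the simple (and not very robust) variant where the cursor is the largest `updated_at` value seen so far. `ConversationCache`, `make_timestamp_backend` and the message fields are hypothetical, and the "server" is just an in-memory list.

```python
class ConversationCache:
    """Client side: keeps messages plus the cursor from the last sync."""

    def __init__(self, fetch):
        self._fetch = fetch   # fetch(cursor) -> (changed_messages, next_cursor)
        self.messages = {}
        self.cursor = None    # None means "never synced", so the first call is a full sync

    def sync(self) -> int:
        changed, self.cursor = self._fetch(self.cursor)
        for msg in changed:
            self.messages[msg["id"]] = msg
        return len(changed)


def make_timestamp_backend(store):
    """Stand-in for the server: returns everything modified after the cursor."""
    def fetch(cursor):
        since = -1 if cursor is None else cursor
        changed = [m for m in store if m["updated_at"] > since]
        next_cursor = max((m["updated_at"] for m in store), default=since)
        return changed, next_cursor
    return fetch


store = [
    {"id": 1, "text": "a", "updated_at": 10},
    {"id": 2, "text": "b", "updated_at": 20},
]
cache = ConversationCache(make_timestamp_backend(store))
first = cache.sync()    # initial sync, both messages arrive
store[0] = {"id": 1, "text": "a (edited)", "updated_at": 30}
second = cache.sync()   # only the edited, older message arrives
```

Because an edit bumps `updated_at`, an old message that was edited shows up in the next delta even though it sits "before" the client's last message. Deleted messages would still need tombstones, and clock-based cursors can miss changes that share a timestamp.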
u
okay so the calculation is a backend responsibility, based on my token / cursor?
g
there is no easy solution, and trade-offs between a simple and a featureful implementation are everywhere; for example, you may consider syncing each conversation separately
u
yea Im doing that, but still that can not scale, if I query all of history in that conversation ever
g
what do you mean?
u
Im thinking only about this one use: I come back from offline, and I need to know what changed -- somehow -- other than fetching all of convo history to figure it out
basically the trouble is that message can be edited and deleted, so I need to look backwards also on all messages
g
> I come back from offline, and I need to know what changed -- somehow -- other than fetching all of convo history to figure it out
Globally you have only 2 choices: download everything (or just the first page) or use some diff/sync API, but how those diffs are implemented highly depends on your use case, the complexity of your server, etc.
> basically the trouble is that message can be edited and deleted, so I need to look backwards also on all messages
No, not if you have diffs
u
okay so, lets say I want the diffs, since I dont want to download everything
g
okay
u
that means I need to tell the server about every message id I have locally for that conversation?
or is there some other way
g
it's one of the possible options, and your server will create the diff info (added, deleted, updated messages) based on this, but it's not very scalable, you're right
u
yea, given huge conversation that will suck
g
there are other ways, where the server remembers your initial sync session and knows which messages were added/updated/deleted since your last sync
u
im thinking those etags you mentioned, cant that be used?
i.e. every conversation.history pull id get some fingerprint, .. but not sure how would that work
g
the server just has to keep a snapshot for every client, and to identify this snapshot you need some sort of id/tag/cursor/etc.; it may even be just the auth token
u
hm, what snapshot do you mean -- snapshot = what server thinks my local data is?
btw I'm thinking I'm missing some kind of theory, do you have something to recommend for me to read maybe?
im feeling like im reinventing the wheel
g
> snapshot = what server thinks my local data is
yes
u
okay -- maybe I'll ask different, If I can only calculate the diffs on client, my only options is to fetch all history, right?
g
there are probably some materials about it, but I don't know any actual "theory" about this, just engineering solutions; if you know the modification date of every message on the client's device, you can just request all the messages changed after this date. The actual implementation may be different, of course
> calculate the diffs on client
Yes, request the data about all changes from the server 😄
to calculate diffs you need both versions of the data, old and new; I don't think there is any solution that gets around that
u
yea, thats super stupid, and all data from first message I have locally,. thats only stupid
g
I fixed my message: it doesn't have to be "all" the data, only the data required to create the diff, like a list of all message ids with their last update date (or something like a version id)
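A minimal sketch of such a manifest diff, assuming both sides can produce a map of message id to version (or last update date). `diff_manifest` is a hypothetical name; where the comparison runs (client or server) is left open.

```python
def diff_manifest(client: dict, server: dict) -> dict:
    """Compare {message_id: version} maps and classify the differences."""
    added = sorted(server.keys() - client.keys())
    deleted = sorted(client.keys() - server.keys())
    updated = sorted(
        mid for mid in client.keys() & server.keys()
        if client[mid] != server[mid]
    )
    return {"added": added, "updated": updated, "deleted": deleted}


# Client holds messages 1, 2, 3 at version 1. On the server, message 2
# was edited, message 3 was deleted and message 4 is new.
result = diff_manifest({1: 1, 2: 1, 3: 1}, {1: 1, 2: 2, 4: 1})
```

Only the bodies of the `added` and `updated` ids then need to be downloaded, but the manifest itself still grows with the size of the conversation.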
u
yea ..would that be durable against breaking my database locally? meaning someone with root -- or more likely developer via bug, causing a message to be deleted ... or is that overkill?
g
but because the server by definition has much more knowledge than the client (it has much more resources and all the data is persistent), it's much easier and more efficient (from the client's point of view) to do this on the server side
not sure what you mean
u
yes, im now looking for ideal solution, only after that ill try to bring it to reality and how flexible my api team is 😄
what I mean is that if I send the id - timestamp map to server, id only get .. okay nevermind, server would tell that it is missing the one message
im still wondering how would the server keep track of my data -- what if I delete something locally in a bug, and therefore server would still keep it as I have it ..idk, I feel like that will be very brittle
g
I think it's overkill. If you cannot trust your local data, that's really strange, and it looks like overkill to "validate" your local DB on each sync. SQL is a very robust solution and supports transactions, so I'm not sure what "developer via bug, causing a message to be deleted" means
u
ofc I trust sql, I dont trust developers to not cause a bug and delete some unrelated id or something
- this would be manageable given a full sync, as you'd see something in remote data not present in local = insert it
g
> im still wondering how would the server keep track of my data
depends on the strategy, but for a conversation it may keep a database of the operations on that particular conversation, and then when you request it again it just returns all the new add/delete/update operations
If you know about some existing problem you may just trigger full sync
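A toy sketch of such an operation log, assuming every operation gets a sequence number and the client remembers the last one it applied. `OperationLog` and `apply_ops` are hypothetical names, and the log lives in memory instead of a real database.

```python
class OperationLog:
    """Server side: append-only list of operations for one conversation."""

    def __init__(self):
        self._ops = []

    def record(self, kind, message_id, payload=None) -> int:
        seq = len(self._ops) + 1  # sequence numbers start at 1
        self._ops.append({"seq": seq, "kind": kind, "id": message_id, "payload": payload})
        return seq

    def since(self, seq: int) -> list:
        """All operations with a sequence number greater than `seq`."""
        return self._ops[seq:]


def apply_ops(local: dict, ops: list):
    """Client side: replay operations, return the last applied seq (or None)."""
    for op in ops:
        if op["kind"] == "delete":
            local.pop(op["id"], None)
        else:  # "add" and "update" both store the payload
            local[op["id"]] = op["payload"]
    return ops[-1]["seq"] if ops else None


log = OperationLog()
log.record("add", 1, "hello")
log.record("add", 2, "world")
local = {}
last = apply_ops(local, log.since(0))      # initial sync replays everything
log.record("update", 1, "hello!")
log.record("delete", 2)
last = apply_ops(local, log.since(last))   # later sync replays only the two new operations
```

A full sync is then just replaying from sequence 0, which is one way to recover when a problem with local data is known.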
u
ok I can see that, but thats still not resilient against the bug
hm, how would I tell there is a problem?
g
Your user complains, your QA finds it, your developer discovers it in the code
u
I mean this will manifest as runtime thing -- how will user tell he is missing a message, when he doesnt know it should be there
g
you may do periodic validation with the server using a special API, like checking the number of conversations and the number of messages
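A very small sketch of such a check, assuming a hypothetical validation endpoint that returns message counts per conversation, and assuming both sides count over the same window (for example, only the range the client has cached).

```python
def mismatched_conversations(local_counts: dict, remote_counts: dict) -> list:
    """Return ids of conversations whose message count differs between the
    client and the server, including conversations missing on either side."""
    ids = local_counts.keys() | remote_counts.keys()
    return sorted(c for c in ids if local_counts.get(c) != remote_counts.get(c))


# Conversation "b" lost a message locally and "c" is missing entirely,
# so both are candidates for a full re-sync.
suspects = mismatched_conversations({"a": 10, "b": 5}, {"a": 10, "b": 6, "c": 1})
```

Counts only catch missing or extra rows. They would not notice a message whose content was overwritten without its count changing; that would need something like a per-conversation hash.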
u
😄 what if he not deletes, but updates it with foobar data, and wont update the lastUpdated timestamp hh
g
> when he doesnt know it should be there
I don't think this is really related to your problem. How to write a program without bugs, how to write a system that detects bugs, I don't know; it's an even broader question than "how to write a real-time chat messenger"
u
hm not really because full diff would fix it, the problem is pagination
g
no, it’s not related to pagination too
there are bugs in any program, and no API of any kind can solve all of them
u
I think so, because if you diff against a page, you dont know about the rest of the data in next pages
g
You're asking questions that don't have any good answer; each solution is a trade-off between a large number of factors. Choose which are more important for you, a perfect solution doesn't exist
u
basically yea, I was just curious about the state of the art
g
> I think so, because if you diff against a page, you dont know about the rest of the data in next pages
I never suggested using this approach; you're right, it's a bad solution imo
u
whatsapp doesnt give a damn, they dont even have a remote database right? they only relay messages
g
Yes
each client has own set of messages
they have backups, but it’s just a file with history that can be used to restore them
u
hm but then you can delete messages, and you can do that while the other is offline so they still have this problem
g
also, as a user I hate WhatsApp's solutions, but if I were the developer, maybe I would do the same, depending on the current situation, product requirements, etc.
u
okay guess the only 100% solution is to track the disconnects
g
not sure what you mean about track the disconnect and why it’s 100%
u
really? I feel whatsapp is the most solid of messengers, all other crap out on me from time to time
or were you thinking about them not keeping history if you reinstall the app?
g
> hm but then you can delete messages, and you can do that while the other is offline so they still have this problem
there are different strategies for message deletion: sometimes you can delete only the local message, sometimes you also delete the message on the devices of all your recipients. The latter requires delivering a delete intent the same way you deliver information about a new message or a message status
u
yes, its the same issue = meaning you miss the websocket push when offline / background
and need to pull upon online / foreground
and or "server tell me what I have missed since timestamp = x"
g
> really? I feel whatsapp is the most solid of messengers, all other crap out on me from time to time
They support only 1 client at a time, which is a super bad solution; because of this limitation of their architecture they cannot even have a proper standalone web client. You don't have a real history of messages and there is no proper synchronization. There is no way to use it without a phone
u
yea true
I meant in terms of reliability of push notifs
g
It’s not related to push notifications
It's important for a messenger, but it's just one of the tools
u
just saying, since we are giving our feelings on whatsapp 😄
g
You can check Telegram or Signal or some XMPP messenger implementation; they are pretty different, but open source
u
im wondering, is there a way to see websocket traffic via say Charles? id just copy messenger / slack
g
Yes, I believe it’s possible, but never tried
u
sounds the best, okay thank you!
g
there are open source solutions; I'm not sure that trying to reverse engineer such huge, complicated, proprietary solutions is a good first step
also all of them are encrypted, so you can use Charles, but you would have to replace the certificates on the device to be able to read the traffic
also, as far as I know, FB Messenger uses MQTT, which is binary, so you probably cannot read it easily anyway, let alone understand how it works
you can also use Chrome Dev tools to see request at least on web
But sync on web may be pretty different from mobile app
Just checked, slack uses “cursors” word and has properties like “has_more” on message update request
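A generic sketch of how a client might walk such a cursor-paginated endpoint. The field names `messages`, `has_more` and `next_cursor` are assumptions loosely modelled on that observation, not Slack's documented response format, and `make_fake_api` is an in-memory stand-in for the network call.

```python
def fetch_all(fetch_page, max_pages: int = 100) -> list:
    """Follow cursors until the server reports that there are no more pages."""
    messages, cursor = [], None
    for _ in range(max_pages):  # hard cap so a buggy server cannot loop us forever
        page = fetch_page(cursor)
        messages.extend(page["messages"])
        if not page["has_more"]:
            break
        cursor = page["next_cursor"]
    return messages


def make_fake_api(items: list, page_size: int):
    """Hypothetical backend: the cursor is simply the next offset as a string."""
    def fetch_page(cursor):
        start = 0 if cursor is None else int(cursor)
        end = start + page_size
        return {
            "messages": items[start:end],
            "has_more": end < len(items),
            "next_cursor": str(end),
        }
    return fetch_page


all_messages = fetch_all(make_fake_api(list(range(7)), page_size=3))
```

The client treats the cursor as opaque and only echoes it back, so the server is free to change what it encodes.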
g
The server can probably keep a history of edit versions and once you send your `lastUpdate` timestamp, the `now - lastUpdate` diff is calculated for you. In a way it's similar to DB migrations or git.
b
This is exactly what Couchbase Sync Gateway solves by having revision documents. For example, say you have documents 1 to 10 in remote and on the local device, and some change happened in document #1 while the local device was offline. Now, remote has to create a new revision document #11. (Here it need not be a new document itself, it can just be a revision attached to it.) When local comes online, it will ask for changes since #10. Remote can send revision #11, which is actually a modified document #1.
Without an explicit revision, there is no way (other than a full sync) for the remote server to know that the local device's documents are in sync with remote.
This can scale really well, as the server does not need to track anything explicitly per device. They are all document revisions, and one is created for every change (modification, deletion, creation)
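A toy model of that idea, reproducing the example above. This is only an illustration of sequence-numbered revisions, not Couchbase Sync Gateway's actual API; `RevisionStore` and its methods are hypothetical.

```python
class RevisionStore:
    """Every change bumps one global sequence number; each document keeps
    the sequence of its latest revision. Nothing is stored per device."""

    def __init__(self):
        self._seq = 0
        self._docs = {}  # doc_id -> (seq, body, deleted)

    def put(self, doc_id, body) -> int:
        self._seq += 1
        self._docs[doc_id] = (self._seq, body, False)
        return self._seq

    def delete(self, doc_id) -> int:
        self._seq += 1
        self._docs[doc_id] = (self._seq, None, True)  # tombstone
        return self._seq

    def changes_since(self, seq: int) -> list:
        """Latest revision of every document changed after `seq`, oldest first."""
        changed = [
            (s, doc_id, body, deleted)
            for doc_id, (s, body, deleted) in self._docs.items()
            if s > seq
        ]
        return sorted(changed)  # sequence numbers are unique, so this orders by seq


remote = RevisionStore()
for i in range(1, 11):
    remote.put(i, f"doc {i}")        # documents 1..10 get sequences 1..10
remote.put(1, "doc 1 (edited)")      # edit while the device is offline: sequence 11
feed = remote.changes_since(10)      # the device asks for changes since #10
```

The device only has to remember the last sequence number it saw, which is why the server needs no per-device bookkeeping.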