One of the messaging servers looks up the appropriate connection for each of our devices and sends the message to the respective devices; at this point the message shows one grey tick.
Once we are online, a process is created for each of our devices. When that process delivers the message, this is indicated by two grey ticks.
Once any of us reads the message, the device sends a response to the messaging server to acknowledge it.
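The three stages above can be pictured as a small state machine. The following is only a minimal sketch: the names `Status`, `advance` and `label` are made up for illustration, and the rule that a message never moves backwards to an earlier stage is an assumption, not something taken from WhatsApp itself.

```python
from enum import IntEnum


class Status(IntEnum):
    """Hypothetical delivery stages, ordered from earliest to latest."""
    SENT = 1       # one grey tick
    DELIVERED = 2  # two grey ticks
    READ = 3       # the device has acknowledged the read to the server


def advance(current: Status, event: Status) -> Status:
    """Apply a status event; late or duplicate events never move a message backwards."""
    return max(current, event)


def label(status: Status) -> str:
    """Describe what each stage means for the sender."""
    return {
        Status.SENT: "one grey tick",
        Status.DELIVERED: "two grey ticks",
        Status.READ: "read acknowledged",
    }[status]


status = Status.SENT
status = advance(status, Status.DELIVERED)  # the recipient's device came online
status = advance(status, Status.READ)       # the recipient opened the chat
status = advance(status, Status.DELIVERED)  # a late duplicate event is ignored
print(label(status))
```

Ordering the stages with `IntEnum` keeps the "never go backwards" rule down to a single `max` call.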
Media
Mampshika reads the message but is too lazy to type a reply to Fani, and decides to send a voice note instead.
Voice notes take the form of MP3 files, which are quite heavy and would strain our messaging servers. Separate HTTP servers are therefore used to handle media file requests, whether MP3, MP4, JPG or any other media type.
The HTTP server is backed by a CDN or database to handle this strenuous job. Mampshika’s device bypasses the connection already established for lightweight messaging requests and uploads the media file to the HTTP server.
Once the voice note is uploaded, the HTTP server returns a unique hash to Mampshika’s device, wrapped in a message together with the media type. That message is then sent through the messaging server to Fani’s device. When Fani’s device receives it, it uses the hash to download the media file from the HTTP server.
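The upload-then-reference flow can be sketched in a few lines. This is an illustration under stated assumptions, not the real implementation: `MediaServer` is an in-memory stand-in for the HTTP server and its CDN or database, `MediaMessage` and `send_voice_note` are hypothetical names, and using a SHA-256 digest of the file contents as the "unique hash" is an assumption, since the source does not say how the hash is produced.

```python
import hashlib
from dataclasses import dataclass


class MediaServer:
    """In-memory stand-in for the HTTP server and its CDN/database."""

    def __init__(self):
        self._blobs = {}  # maps hash (str) -> file contents (bytes)

    def upload(self, data: bytes) -> str:
        # Assumption: the unique hash is the SHA-256 digest of the file contents.
        media_hash = hashlib.sha256(data).hexdigest()
        self._blobs[media_hash] = data
        return media_hash

    def download(self, media_hash: str) -> bytes:
        return self._blobs[media_hash]


@dataclass(frozen=True)
class MediaMessage:
    """The lightweight message that travels through the messaging server."""
    sender: str
    recipient: str
    media_type: str
    media_hash: str


def send_voice_note(server: MediaServer, sender: str, recipient: str,
                    audio: bytes) -> MediaMessage:
    # Step 1: the heavy file goes to the HTTP server, not the messaging server.
    media_hash = server.upload(audio)
    # Step 2: only the hash and media type are relayed to the recipient.
    return MediaMessage(sender, recipient, "audio/mpeg", media_hash)


server = MediaServer()
message = send_voice_note(server, "Mampshika", "Fani", b"fake mp3 bytes")
# Step 3: the recipient's device uses the hash to fetch the file.
voice_note = server.download(message.media_hash)
```

The point of the split is that the messaging server only ever carries the small `MediaMessage`, while the large payload stays on the media side.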