TikTok is a read-heavy system: viewers far outnumber content creators, possibly at a ratio of around 1:100.
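The following back-of-the-envelope sketch makes the read-heavy point concrete by converting a daily upload count into average read and write rates. It assumes, for illustration, that the 1:100 ratio applies per request (100 views per upload). The upload volume in the example is a made-up number, not a TikTok figure.

```python
# Back-of-the-envelope load estimate for a read-heavy system.
# Assumption: the 1:100 ratio from the text is applied as 100 reads per upload.

SECONDS_PER_DAY = 24 * 60 * 60


def estimate_load(uploads_per_day: int, reads_per_write: int = 100) -> dict[str, float]:
    """Return average writes (uploads) and reads (views) per second."""
    writes_per_sec = uploads_per_day / SECONDS_PER_DAY
    reads_per_sec = writes_per_sec * reads_per_write
    return {"writes_per_sec": writes_per_sec, "reads_per_sec": reads_per_sec}


if __name__ == "__main__":
    # Hypothetical volume chosen so the numbers are easy to read.
    load = estimate_load(uploads_per_day=8_640_000)
    print(f"writes/s: {load['writes_per_sec']:.0f}, reads/s: {load['reads_per_sec']:.0f}")
```

With 8,640,000 uploads a day, this prints `writes/s: 100, reads/s: 10000`. Under this assumption, the read path needs roughly 100 times the capacity of the upload path.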
A video first goes through the upload API and then moves to the feature engineering stage. This stage categorizes and verifies the video and analyzes its metadata, images and sound. In other words, this is where the machine learning algorithms assess the video as best they can.
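The feature engineering stage can be pictured as a chain of analysis steps run over each upload. The sketch below is a hypothetical illustration. The names `Upload`, `categorize`, `verify` and `analyze_metadata` are invented, and simple rules stand in for the machine learning models the text describes.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Upload:
    video_id: str
    duration_s: float
    metadata: dict
    labels: dict = field(default_factory=dict)  # results written by each stage


# Each stage is a hypothetical stand-in for an ML model or rule set.
def categorize(u: Upload) -> None:
    tags = u.metadata.get("hashtags", [])
    u.labels["category"] = tags[0] if tags else "uncategorized"


def verify(u: Upload) -> None:
    # Toy check: a positive duration and a known creator.
    u.labels["verified"] = u.duration_s > 0 and "creator_id" in u.metadata


def analyze_metadata(u: Upload) -> None:
    u.labels["has_sound"] = bool(u.metadata.get("sound_id"))


STAGES: list[Callable[[Upload], None]] = [categorize, verify, analyze_metadata]


def feature_engineering(u: Upload) -> Upload:
    """Run every stage in order, each one adding labels to the upload."""
    for stage in STAGES:
        stage(u)
    return u
```

Keeping the stages in a list makes it easy to add another check, such as image or audio analysis, without changing the pipeline runner.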
All of the video processing happens at this stage. What happens next depends on the length of the video. A 30-second video is broken down into 4 parts, while a 10-second video simply goes through every stage as it is. For the 30-second video, the file is split into chunks, and these chunks then go through 4 steps:
- The first step converts the file into different formats.
- The second step converts the chunks into different resolutions.
- As a result, what was uploaded as one file is stored as 8 to 16 copies.
- Finally, these copies are uploaded to Amazon S3 buckets.
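The splitting and conversion steps above can be sketched as a small planning function. This is one possible reading of the text. It treats the "4 parts" of a 30-second video as four equal chunks and assumes a 10-second cutoff below which a video is not split. The formats and resolutions are also assumed values. Two formats and four resolutions give 8 copies, and adding more formats moves toward the 16 the text mentions.

```python
from itertools import product

# Assumptions, not values from the text: the cutoff, chunk count,
# container formats and resolution ladder are illustrative only.
CHUNK_THRESHOLD_S = 10
NUM_CHUNKS = 4
FORMATS = ["mp4", "webm"]
RESOLUTIONS = ["360p", "480p", "720p", "1080p"]


def plan_chunks(duration_s: float) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds for each chunk."""
    if duration_s <= CHUNK_THRESHOLD_S:
        return [(0.0, duration_s)]  # short videos are processed whole
    step = duration_s / NUM_CHUNKS
    return [(i * step, (i + 1) * step) for i in range(NUM_CHUNKS)]


def plan_renditions(video_id: str) -> list[str]:
    """One output file per (format, resolution) pair."""
    return [f"{video_id}/{res}.{fmt}" for fmt, res in product(FORMATS, RESOLUTIONS)]
```

For example, `plan_chunks(30)` returns four 7.5-second ranges, `plan_chunks(10)` returns a single range, and `plan_renditions("v1")` returns 8 keys starting with `v1/360p.mp4`.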
Amazon S3 buckets form the first part of the storage process. One video has several files, one for each supported resolution and format. The files are uploaded to S3 buckets according to the region and demographics of the users, so if a file is not accessible in one region, it can be accessed from another region.
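The following sketch shows how region-based placement with a fallback copy might work. It decides which buckets should hold each rendition. The bucket naming scheme and the neighbor-region map are hypothetical, and the region names are only example AWS region identifiers. An actual deployment would write the objects with the AWS SDK or could rely on S3 replication rather than a hand-rolled map.

```python
# Hypothetical neighbor map and bucket naming; not TikTok's real layout.
NEIGHBOR = {
    "us-east-1": "us-west-2",
    "us-west-2": "us-east-1",
    "eu-west-1": "eu-central-1",
    "eu-central-1": "eu-west-1",
}


def bucket_for(region: str) -> str:
    return f"videos-{region}"


def placement(rendition_keys: list[str], home_region: str) -> dict[str, list[str]]:
    """Map each target bucket to the object keys it should hold.

    Every rendition is stored in the home region and copied to one
    neighboring region, so a read can fall back if the home copy is unavailable.
    """
    regions = [home_region, NEIGHBOR[home_region]]
    return {bucket_for(r): list(rendition_keys) for r in regions}
```

For example, `placement(["v1/720p.mp4"], "eu-west-1")` assigns the key to both `videos-eu-west-1` and `videos-eu-central-1`.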
To achieve this, TikTok pairs Amazon S3 with its CDN (Akamai). The CDN keeps videos available even when a regional node is down.
When a user wants to watch a specific video, the feature engineering service looks up the video in the user database and in the video metadata database. The algorithm then combines this with the user's details and delivers the video from the nearest region. If that region is down, it delivers the video from the next closest one.
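The serving path in this paragraph looks up the user and the video, picks the nearest healthy region, and falls back to the next one. It can be sketched as follows. All of the data structures here (`USER_DB`, `METADATA_DB`, `REGION_HEALTHY`, `NEAREST_REGIONS`) are hypothetical in-memory stand-ins for the real databases and health checks, and the URL format is invented.

```python
from dataclasses import dataclass


@dataclass
class VideoMeta:
    video_id: str
    renditions: dict[str, str]  # resolution -> object key


# Hypothetical stand-ins for the user database, the video metadata
# database and per-region health status.
USER_DB = {"u1": {"location": "dublin"}}
METADATA_DB = {
    "v1": VideoMeta("v1", {"480p": "v1/480p.mp4", "720p": "v1/720p.mp4"}),
}
REGION_HEALTHY = {"eu-west-1": False, "eu-central-1": True}
# Candidate regions per user location, nearest first (assumed ordering).
NEAREST_REGIONS = {"dublin": ["eu-west-1", "eu-central-1"]}


def resolve_video_url(user_id: str, video_id: str, resolution: str) -> str:
    """Return a URL for the video from the nearest healthy region."""
    location = USER_DB[user_id]["location"]
    key = METADATA_DB[video_id].renditions[resolution]
    for region in NEAREST_REGIONS[location]:
        if REGION_HEALTHY.get(region, False):
            return f"https://videos-{region}.example.com/{key}"
    raise RuntimeError(f"no healthy region for location {location!r}")
```

Because `eu-west-1` is marked as down, `resolve_video_url("u1", "v1", "720p")` falls back to `https://videos-eu-central-1.example.com/v1/720p.mp4`.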
Additional Reading:
https://later.com/blog/tiktok-algorithm