The 4 key business factors that a SaaS AWS architect must take into account

As the Solutions Architect/CTO of a serverless SaaS startup, I analyze, test, and thoroughly research best practices for building a reliable, scalable, and cost-effective architecture.

During my research, I realized that business problems must be phrased properly before I can begin solving them as engineering challenges. So I brainstormed with the business team, and we came up with the following 4 critical factors that define success from both a business and a tech perspective (Business → Tech):

User Experience → Latency
Total Cost of Ownership → Cost Optimization
Security → Security
Productivity → Automation

Assumptions:

I can safely assume that each workload satisfies a distinct business need, referred to as a product feature.
Serverless design patterns are called workloads.
These business and tech negotiations do NOT involve data models or algorithms. Dependencies on these components are therefore beyond the realm of business, and engineers must be consulted before committing to timelines or to any of the 4 factors above.

We analyze each area with simple questions that force us to consider the most important technical factors.

Latency

Business Questions:

How long will it take for the user to obtain the intended end result?
How much modification is permitted on top of it?
How long will each request take?
Can we filter the results based on the type of business?
How does this feature affect the user experience?

Regardless of the sophistication of your architecture, the main concern is the user experience. In this context, user experience refers to latency and processing time.

Before answering these questions, consider the following factors (NOTE: we are an analytics SaaS company):

Average Lambda processing times
Throughput limitations
ETL pipelines (Glue)
SQL query run times
Internal data preparation operations
Workflow orchestrations, queues, etc.

We use over 42 AWS services to keep our complex application running smoothly, so it is critical that these services stay connected without issue. Any bottleneck in the pipelines or in the workflow orchestration code can waste a considerable amount of compute time.
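To make the latency question concrete, the wait a user sees can be estimated by adding up the average duration of each pipeline stage and flagging the slowest one. This is a minimal sketch that assumes the stages run one after another; the stage names and timings are hypothetical placeholders, not measurements from our system.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    avg_seconds: float  # average observed duration of this stage


def latency_budget(stages: list[Stage]) -> tuple[float, Stage]:
    """Return the end-to-end time (stages assumed sequential) and the slowest stage."""
    if not stages:
        raise ValueError("need at least one stage")
    total = sum(s.avg_seconds for s in stages)
    bottleneck = max(stages, key=lambda s: s.avg_seconds)
    return total, bottleneck


if __name__ == "__main__":
    # Hypothetical pipeline and numbers, for illustration only.
    pipeline = [
        Stage("Lambda ingest", 0.8),
        Stage("Queue wait", 5.0),
        Stage("Glue ETL job", 1500.0),
        Stage("SQL query", 12.0),
    ]
    total, slowest = latency_budget(pipeline)
    print(f"time to result: {total / 60:.1f} min")
    print(f"bottleneck: {slowest.name} ({slowest.avg_seconds / total:.0%} of total)")
```

In a breakdown like this, the stage that dominates the total is usually where optimization pays off first. Stages that run in parallel would need a critical-path calculation instead of a plain sum.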

In a serverless world, time literally means money.

We therefore pay a high price on both fronts: in money, and in a sluggish user experience caused by under-utilized compute services.
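Here is a worked example of "time means money". Lambda bills compute in GB-seconds (allocated memory × duration) and adds a per-request charge. The sketch assumes the commonly published us-east-1 x86 rates of about $0.0000166667 per GB-second and $0.20 per million requests. It ignores the free tier and volume tiers, so check current pricing for your region and architecture.

```python
# Assumed prices (us-east-1, x86); verify against current AWS Lambda pricing.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.20 / 1_000_000


def lambda_monthly_cost(memory_mb: int, avg_duration_ms: float, invocations: int) -> float:
    """Estimate Lambda compute + request cost, ignoring free tier and tiered discounts."""
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000) * invocations
    return gb_seconds * PRICE_PER_GB_SECOND + invocations * PRICE_PER_REQUEST


if __name__ == "__main__":
    # Same hypothetical workload, before and after removing a 2-second wait on a slow dependency.
    before = lambda_monthly_cost(memory_mb=1024, avg_duration_ms=3000, invocations=1_000_000)
    after = lambda_monthly_cost(memory_mb=1024, avg_duration_ms=1000, invocations=1_000_000)
    print(f"before: ${before:,.2f}  after: ${after:,.2f}")
```

The bill scales linearly with duration. A function that sits idle waiting on a bottleneck therefore costs money as well as user patience.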

In many cases, how quickly a business gets its answers depends on external factors such as data source API constraints, throttling, data processing logic, and compute-intensive algorithms.

Keep your customers informed with notifications and emails, and optimize the process wherever you can cut load time. If the wait is unavoidable, ask your UX designers to work their magic.

Cost

Business Question: How much is this feature going to cost us?

Cost = FTE Capital cost + Recurring Infrastructure cost

FTE Capital cost: How much time did your team spend building this feature?

As a SaaS company, you will not reinvent the wheel for every feature. Before you start coding, however, you and your development team spend a considerable amount of time on design and implementation. (If you're a serverless startup like us, I recommend the serverless patterns at https://serverlessland.com/ for AWS's best practices.)

I refer to the time spent defining a pattern, building it, and testing it ONCE before releasing a product as the FTE (full-time equivalent) capital cost. (Please comment if there is a more standard term for this.)
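The cost split above can be written as a small helper. FTE capital cost is paid once and recurring cost accrues over time, so adding them requires choosing a time horizon. This is a minimal sketch: the person-day rate, the horizon in months, and the numbers in the example are hypothetical assumptions, not figures from our company.

```python
def fte_capital_cost(person_days: float, daily_rate: float) -> float:
    """One-time cost of defining, building, and testing a pattern once."""
    return person_days * daily_rate


def feature_cost(person_days: float, daily_rate: float,
                 monthly_infra_cost: float, months: int) -> float:
    """Cost = FTE capital cost + recurring infrastructure cost over a chosen horizon."""
    return fte_capital_cost(person_days, daily_rate) + monthly_infra_cost * months


if __name__ == "__main__":
    # Hypothetical: 15 person-days at $600/day, $120/month to run, 12-month horizon.
    print(feature_cost(person_days=15, daily_rate=600, monthly_infra_cost=120, months=12))
```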

Recurring Infrastructure cost: How much does this architectural component cost each time it runs?

Each architectural workload MUST be developed with cost in mind. Despite the popular belief that cost ∝ compute, I still think a well-defined design pattern AND data model can run your workloads efficiently without burning a hole in your pocket.

Disclaimer: Billing in a multi-tenant SaaS model is quite challenging to monitor. For that reason, whenever my CEO asked about the actual cost per tenant, I gave him the average cost per tenant. I did not set up extensive billing, tagging, and logging to compute exact tenant-level costs until a year and a half into development. (I strongly recommend setting this up at the very beginning to track billing accurately. I am writing a separate article on that soon.)
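Tagged billing data shows the gap between the average and the exact per-tenant cost most clearly. This sketch assumes that cost line items already carry a tenant tag, for example a cost allocation tag exported in a billing report. The `tenant` key and the rule that splits untagged shared costs evenly across tenants are hypothetical choices, not a standard.

```python
def average_cost_per_tenant(line_items: list[dict], tenants: list[str]) -> float:
    """What we reported early on: total bill divided by the number of tenants."""
    return sum(item["cost"] for item in line_items) / len(tenants)


def cost_per_tenant(line_items: list[dict], tenants: list[str]) -> dict[str, float]:
    """Attribute tagged costs directly; split untagged (shared) costs evenly."""
    costs = {t: 0.0 for t in tenants}
    shared = 0.0
    for item in line_items:
        tenant = item.get("tenant")  # hypothetical tag key on each line item
        if tenant in costs:
            costs[tenant] += item["cost"]
        else:
            shared += item["cost"]  # untagged or unknown tenant -> shared pool
    for t in tenants:
        costs[t] += shared / len(tenants)
    return costs
```

With items of $30 tagged `acme`, $10 tagged `globex`, and $20 untagged, the average is $30 for each tenant, but the tagged view attributes $40 to `acme` and $20 to `globex`.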

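Average Recurring Infrastructure cost:

$$C_{\text{infra}} = a_1 x_1 + a_2 x_2 + a_3 x_3 + \dots = \sum_i a_i x_i$$

where \(a_i\) is the billing unit price of an AWS service (sub-component) and \(x_i\) is its compute/storage usage in the same billing unit. For example,

\(a_1\) → 1 DPU-hour of Glue costs about $0.44, and

\(x_1\) → average usage per tenant = 25 min/day = 25/60 DPU-hours/day (for a 1-DPU job)

so the Glue bill per tenant is 0.44 × 25/60 ≈ $0.18 per day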
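Here is the same formula as a small function. The Glue entry reproduces the example above (1 DPU at $0.44 per DPU-hour for 25 minutes a day). The S3 entry is a hypothetical placeholder that assumes a price of about $0.0004 per 1,000 GET requests, and it only shows how each extra sub-component adds a term. Like the formula, the sketch ignores DPU counts, minimum billing durations, and retries.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    unit_price: float  # a_i: price per billing unit (e.g. $ per DPU-hour)
    usage: float       # x_i: billing units consumed per tenant per day


def recurring_cost(components: list[Component]) -> float:
    """Average recurring infrastructure cost per tenant per day: sum of a_i * x_i."""
    return sum(c.unit_price * c.usage for c in components)


if __name__ == "__main__":
    components = [
        Component("Glue (1 DPU)", unit_price=0.44, usage=25 / 60),  # 25 min/day in DPU-hours
        Component("S3 GET requests", unit_price=0.0004 / 1000, usage=50_000),  # hypothetical usage
    ]
    per_day = recurring_cost(components)
    print(f"${per_day:.4f} per tenant per day, ~${per_day * 30:.2f} per month")
```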

Disclaimer: In practice, this calculation is far more complex. This simplified version omits factors such as the number of DPUs used per job run, S3 request costs, queue costs, and retry attempts. An exact computation is outside the scope of this article; I intend to write a separate article on serverless SaaS billing soon.

Security

Business Question: How safe and secure is this implementation?

The business team never asks about firewalls, open ports, SSL, network access control lists, etc. They ask the same questions in their own terms: Do we have a firewall (everyone uses that word these days)? Who can access the data? How are identities and configuration changes controlled? As architects, it is our job to answer all of these questions in plain English.

For serverless SaaS, here are my answers:

AWS WAF for our web applications → firewall
CloudFront with AWS WAF rules + geo-restrictions → geographical restrictions
AWS Config → configuration change management
AWS CloudTrail logs for all AWS API activity → continuous monitoring
AWS Security Hub → centralized monitoring
Frequently rotate AWS access keys and CodeCommit credentials → best practice
Use MFA to sign in to AWS → best practice
S3 server-side encryption (SSE) for data at rest, plus TLS-only access for data in transit → best practice
Use Amazon WorkSpaces with network and continuous code monitoring → advanced best practice
IP restrictions on workloads, code pushes, etc. → advanced best practice
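Server-side encryption protects S3 data at rest. Protection in transit comes from TLS, which teams usually enforce with a bucket policy that denies any request not made over a secure connection (`aws:SecureTransport` is `false`). This sketch builds both configurations. The bucket name is hypothetical. The `apply` helper expects a boto3 S3 client and calls its `put_bucket_encryption` and `put_bucket_policy` methods.

```python
import json


def tls_only_bucket_policy(bucket: str) -> dict:
    """Deny every S3 action on the bucket and its objects when the request is not sent over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


def default_encryption_config() -> dict:
    """SSE-S3 (AES256) as the bucket's default encryption at rest."""
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}


def apply(s3_client, bucket: str) -> None:
    """Apply both settings using a boto3 S3 client, e.g. boto3.client("s3")."""
    s3_client.put_bucket_encryption(
        Bucket=bucket, ServerSideEncryptionConfiguration=default_encryption_config()
    )
    s3_client.put_bucket_policy(Bucket=bucket, Policy=json.dumps(tls_only_bucket_policy(bucket)))
```

In a setup where S3 buckets come only from IaC templates, the same policy and encryption settings would normally live in the template instead of being applied by a script.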

Automation

Business Question: How can you improve productivity in terms of releasing a new feature?

The business team wants every requirement done yesterday, whereas engineering needs days or even months. We both agree that the only way to close this gap is to automate as many processes as feasible. Here is how we did it, in a nutshell:

We built a boilerplate Lambda function that uses boto3 to fetch a script and create the corresponding AWS resource. For instance, a Lambda function creates a Glue job from a predefined script. This lets us configure a complete job by editing a script, which speeds up development. (CloudFormation is a good IaC tool, but not for this use case, because the Glue scripts live separately from the IaC settings.)
S3 buckets are created only from preset IaC templates.
CI/CD pipelines for code releases use AWS CodeStar.
Very important: follow consistent naming and tagging conventions. Please DO NOT skip this practice.
Sadly, AWS has no world-class test automation tool in its quiver (comment if you know of one), so we use Endtest.
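Below is a minimal sketch of the boilerplate Lambda described above. It assumes the event carries an environment, a tenant, a feature name, and the S3 path of a predefined Glue script. The event fields, the `{env}-{tenant}-{feature}` naming convention, the tag keys, the `GLUE_ROLE_ARN` environment variable, and the worker settings are all hypothetical. `create_job` and the parameters passed to it belong to the real boto3 Glue client.

```python
import os


def glue_job_spec(event: dict, role_arn: str) -> dict:
    """Build create_job arguments from a predefined script, applying naming/tagging conventions."""
    env, tenant, feature = event["env"], event["tenant"], event["feature"]
    return {
        "Name": f"{env}-{tenant}-{feature}",  # hypothetical naming convention
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": event["script_s3_path"],  # predefined script stored in S3
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
        "Tags": {"env": env, "tenant": tenant, "feature": feature},  # hypothetical tag keys
    }


def handler(event, context, glue_client=None):
    """Lambda entry point: create the Glue job described by the event."""
    if glue_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        glue_client = boto3.client("glue")
    spec = glue_job_spec(event, os.environ["GLUE_ROLE_ARN"])  # hypothetical env var
    response = glue_client.create_job(**spec)
    return {"job_name": response["Name"]}
```

Naming and tags are generated in one place, so every job created this way follows the same convention. That consistency later makes per-tenant cost tracking possible.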

The principles to follow are described in the six pillars of the AWS Well-Architected Framework: https://aws.amazon.com/blogs/apn/the-6-pillars-of-the-aws-well-architected-framework/

I look forward to using your input to improve this article. Please share your views and best practices that can help us build exciting serverless architectures.