For the first part of this series, our blog needs:
- to serve content faster around the world 🌍;
- to be free of monthly infrastructure costs, at least until it gets a lot of traffic 💰;
- admins to create, update and delete blog posts;
- users to list and read blog posts.
Those are our main requirements. From there, I bet we will come up with something new, which will go straight to our backlog. We will build the infrastructure from the ground up using AWS CDK.
Infrastructure analysis
Let's review the requirements one by one and analyze how we could satisfy them.
Problem: it needs to be free, at least until it reaches a consistent amount of traffic.
Solution: we should go with serverless services and make use of the AWS Free Tier. Going serverless from the start lets us scale up whenever we have a lot of traffic and scale down when we do not have any. And, of course, we won't pay a dime when our resources are not in use.
Problem: users need to read blog posts, admins need to create them.
Solution: we will build a microservices infrastructure, based on:
- Amazon API Gateway;
- AWS Lambda as our serverless compute source;
- Amazon DynamoDB as NoSQL database;
- Amazon Cognito as identity provider to identify and authorize admins.
Problem: we need to minimize latency and serve content around the world 🌎.
Solution: we must go global and get as close as possible to our users. To do so, we will use Amazon CloudFront as our content delivery network (CDN) to make our blog blazingly fast all around the world 🔥.
Infrastructure overview
- Admins authenticate themselves with Cognito;
- Admins call the CloudFront domain to create/update/delete blog posts;
- CloudFront forwards the request to API Gateway;
- API Gateway invokes a Lambda function to process the request;
- The function accesses the DynamoDB table to save/update/delete blog posts;
- The DynamoDB stream wakes up and triggers another Lambda function;
- Which invalidates the CloudFront cache. This step is needed because we have to refresh our cache with the new posts;
- Users call the CloudFront endpoint to list/get blog posts;
- CloudFront checks whether it has the response cached. If it does, it returns the cached response without asking API Gateway; if it doesn't, it forwards the request to API Gateway;
- Which fires a Lambda function to process the request;
- The Lambda function queries the DynamoDB posts table and returns the response to API Gateway, CloudFront and then the user.
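The read and invalidation steps above can be modelled with a toy in-memory cache. This is only a sketch of the idea, not how CloudFront works internally: the `EdgeCache` class, the `Origin` type and the fake origin are all hypothetical names made up for illustration.

```typescript
// Stand-in for "API Gateway + Lambda + DynamoDB": takes a path, returns a body.
type Origin = (path: string) => string;

class EdgeCache {
  private readonly cache = new Map<string, string>();
  private readonly origin: Origin;
  originCalls = 0;

  constructor(origin: Origin) {
    this.origin = origin;
  }

  // GET: serve from the cache when possible, otherwise ask the origin.
  get(path: string): string {
    const cached = this.cache.get(path);
    if (cached !== undefined) {
      return cached;
    }
    this.originCalls += 1;
    const response = this.origin(path);
    this.cache.set(path, response);
    return response;
  }

  // Invalidation: drop every cached entry whose path starts with the prefix.
  invalidate(prefix: string): void {
    for (const key of Array.from(this.cache.keys())) {
      if (key.startsWith(prefix)) {
        this.cache.delete(key);
      }
    }
  }
}

// Hypothetical data and origin, for illustration only.
const posts = new Map<string, string>([['1', 'Hello world']]);
const edge = new EdgeCache((path) =>
  JSON.stringify({ path, posts: Array.from(posts.values()) }),
);

edge.get('/posts'); // miss: forwarded to the origin
edge.get('/posts'); // hit: served from the cache
posts.set('2', 'Second post'); // an admin creates a post...
edge.invalidate('/posts'); // ...and the stream-triggered function clears the cache
edge.get('/posts'); // miss again, now with the new post
```

Without the `invalidate` call, the last read would keep returning the stale list until the cached entry expired, which is exactly why the stream-triggered function exists.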
This is pretty much how our flow will go. Before we start, we need to address some major points ⚠️:
- API Gateway must receive requests only from CloudFront. To do that, we will put our API behind an API key, which CloudFront itself injects into every request. Users won't know anything about it, and if they try to call API Gateway directly it will return 403 Forbidden ⛔;
- Users cannot sign themselves up to Cognito, only admins can create more admins 🗝️.
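The API key point can be sketched in a few lines. The functions below are hypothetical stand-ins for what CloudFront and API Gateway do with the header, and `API_KEY` is a made-up value; the only fact assumed about the real services is that API Gateway answers 403 Forbidden when a required API key is missing.

```typescript
type Headers = Record<string, string>;

interface ApiResponse {
  statusCode: number;
  body: string;
}

// Hypothetical value; in the real stack the key comes from the build configuration.
const API_KEY = 'example-key-not-a-real-secret';

// What CloudFront does before forwarding: add the custom origin header.
const forwardFromCloudFront = (viewerHeaders: Headers): Headers => ({
  ...viewerHeaders,
  'x-api-key': API_KEY,
});

// What the API does: reject any request that lacks the expected key.
const apiGateway = (headers: Headers): ApiResponse =>
  headers['x-api-key'] === API_KEY
    ? { statusCode: 200, body: 'ok' }
    : { statusCode: 403, body: 'Forbidden' };

// A user calling the API directly never sends the key...
const direct = apiGateway({ accept: 'application/json' });
// ...while the same request going through CloudFront gets it added on the way.
const viaCdn = apiGateway(forwardFromCloudFront({ accept: 'application/json' }));
```

The key never reaches the browser, because it is added between CloudFront and the origin rather than by the client.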
Database table design
Let's see the requirements:
- all blog posts must be listable; we probably won't need to list 100 posts at once, just a few of the most recent ones;
- we need to be able to get posts by slug (slug: when you click on a post you won't see an id in your address bar, you will see a piece of text, which is called a slug).
In terms of DynamoDB, we need this structure:
- Partition key (PK): it's our primary key, so we will use the id 🤯. We cannot use the slug because it could change when admins change the title;
- Global secondary index (GSI): as we need to list every post, I went with this configuration: partition key = 1 and sort key = creation date. This means we won't need to scan the table, because we can query the items through our index. With this index we are also able to list posts by state (draft or published);
- Global secondary index (GSI): this simply has our slug, which will be unique.
```typescript
const table = new Table(this, name, {
  tableName: name,
  billingMode: BillingMode.PAY_PER_REQUEST,
  removalPolicy: buildConfig.environment !== 'prod' ? RemovalPolicy.DESTROY : RemovalPolicy.RETAIN,
  partitionKey: {
    name: 'id',
    type: AttributeType.STRING,
  },
  stream: StreamViewType.NEW_AND_OLD_IMAGES,
});

table.addGlobalSecondaryIndex({
  indexName: `${name}-list-index`,
  partitionKey: {
    name: 'pk',
    type: AttributeType.STRING,
  },
  sortKey: {
    name: 'createdAt',
    type: AttributeType.NUMBER,
  },
});

table.addGlobalSecondaryIndex({
  indexName: `${name}-slug-index`,
  partitionKey: {
    name: 'slug',
    type: AttributeType.STRING,
  },
});
```
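To make the two indexes more concrete, here is a rough sketch of the query inputs they allow. The `QueryInput` interface is a local type that only mirrors the fields of a DynamoDB query input used here, the helper names are hypothetical, and the value passed as `pk` is an assumption: use whatever your items actually store in that attribute.

```typescript
// Local type covering only the fields used in this sketch.
interface QueryInput {
  TableName: string;
  IndexName: string;
  KeyConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, string | number>;
  ScanIndexForward?: boolean;
  Limit?: number;
}

// List the most recent posts through the list index (no table scan needed).
const listPostsQuery = (tableName: string, pk: string, limit: number): QueryInput => ({
  TableName: tableName,
  IndexName: `${tableName}-list-index`,
  KeyConditionExpression: '#pk = :pk',
  ExpressionAttributeNames: { '#pk': 'pk' },
  ExpressionAttributeValues: { ':pk': pk },
  ScanIndexForward: false, // descending createdAt, so newest posts come first
  Limit: limit,
});

// Fetch a single post through the slug index.
const getPostBySlugQuery = (tableName: string, slug: string): QueryInput => ({
  TableName: tableName,
  IndexName: `${tableName}-slug-index`,
  KeyConditionExpression: '#slug = :slug',
  ExpressionAttributeNames: { '#slug': 'slug' },
  ExpressionAttributeValues: { ':slug': slug },
  Limit: 1,
});

// Hypothetical table name and key values, for illustration only.
const recentPosts = listPostsQuery('blog-posts', 'published', 10);
const singlePost = getPostBySlugQuery('blog-posts', 'my-first-post');
```

Objects shaped like these could then be handed to a DynamoDB query call from the Lambda functions.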
AWS Lambda functions
We need our functions to be as optimized as possible. To do so, we follow a few tricks and best practices described in one of my posts, which you can find right here 👇 Optimize Your AWS Lambda: Faster Means Cheaper
I've created a simple function to optimize all our Lambdas:
```typescript
export const lambdaFactory = (scope: Construct, lambdaConfig: ILambdaFactory, environment: string): NodejsFunction => {
  lambdaConfig.role.addManagedPolicy(
    ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSLambdaBasicExecutionRole'),
  );

  return new NodejsFunction(scope, lambdaConfig.name, {
    memorySize: lambdaConfig.memorySize,
    architecture: Architecture.ARM_64,
    timeout: Duration.seconds(10),
    runtime: Runtime.NODEJS_18_X,
    bundling: {
      minify: environment === 'prod',
      target: 'node18',
      keepNames: true,
      externalModules: [
        '@aws-sdk/client-dynamodb',
        '@aws-sdk/lib-dynamodb',
        '@aws-sdk/util-dynamodb',
        '@aws-sdk/client-cloudfront',
      ],
    },
    functionName: lambdaConfig.name,
    entry: `src/functions/${lambdaConfig.filenamePath}/index.ts`,
    logRetention: RetentionDays.TWO_WEEKS,
    environment: lambdaConfig.environment,
    role: lambdaConfig.role,
  });
};
```
Cognito user pool stack
Pretty standard stack, it contains:
- a user pool, which stores the admins' information so they can log in;
- a user pool client, which we will use to authenticate admins;
- an associated user pool domain.
In the future we will use Lambda triggers, but in the meantime you can learn everything you need to know about them here 👇 Amazon Cognito Triggers: All You Need To Know About Them | Part 1
API Gateway stack
This one is really interesting. We need the following components:
- rest api (obviously 😅);
- api key: to allow the access only from CloudFront;
- cognito authorizer: to authenticate admins, it gets the referenced user pool from Cognito's stack;
- request validator: to validate the body and query strings of our APIs; it's just an object we will import in our APIs;
- request model: linked to our validator, it creates all the JSON schema models which will be used to validate the body of the POST, PUT and PATCH requests.
The only interesting code here is the request model part, so let's see what strange thing I've done:
```typescript
const filesname = fs.readdirSync(`${__dirname}/models`);
for (const file of filesname) {
  const fileName = file.replace('.json', '');
  this.blogApiModels[`${fileName}`] = new Model(this, `${name}-${fileName}`, {
    restApi: restApi,
    modelName: `${name}-${fileName}`.replace(/-/g, ''),
    contentType: 'application/json',
    schema: JSON.parse(fs.readFileSync(`${__dirname}/models/${file}`).toString('utf-8')),
  });
}
```
Why did I create the API models like this? Because we don't want to touch this file: it's our building block, and as such it should receive as few edits as possible. This way we can loop over the /models folder and create every model our developers (which is no one 🤡) put in that folder.
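As a small illustration of what the loop produces, here is a sketch of the name derivation on its own, next to the kind of schema file it would pick up. The `modelNameFor` helper, the `blog-api` name, the `create-post.json` file and the schema fields are all hypothetical. The dashes are stripped because API Gateway model names have to be alphanumeric.

```typescript
// Same derivation as in the loop above, extracted into a helper for clarity.
const modelNameFor = (apiName: string, file: string): string => {
  const fileName = file.replace('.json', '');
  return `${apiName}-${fileName}`.replace(/-/g, '');
};

// Hypothetical content of models/create-post.json (JSON Schema draft 4 keywords).
const createPostSchema = {
  type: 'object',
  required: ['title', 'content'],
  properties: {
    title: { type: 'string', minLength: 1 },
    content: { type: 'string' },
  },
  additionalProperties: false,
};

const exampleModelName = modelNameFor('blog-api', 'create-post.json');
console.log(exampleModelName, Object.keys(createPostSchema.properties));
```

Adding a new request model then only means dropping another JSON file into the folder.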
Content delivery network to make our api blazingly fast
Well, buckle up: this one was really painful, but luckily for us the dirty work has already been done 🧯.
⚠️Note: after many tries, the only construct I was able to work with was the L1 one, which means it's almost bare CloudFormation. I was not able to work with L2 constructs like Distribution because they wouldn't let me put a custom header on our origins.
```typescript
const cachePolicy = new CachePolicy(this, `${name}-policy`, {
  enableAcceptEncodingGzip: true,
  defaultTtl: Duration.days(14),
  minTtl: Duration.days(14),
  queryStringBehavior: CacheQueryStringBehavior.all(),
});

new CfnDistribution(this, name, {
  distributionConfig: {
    enabled: true,
    httpVersion: HttpVersion.HTTP3,
    comment: 'CDN for Api layer',
    priceClass: PriceClass.PRICE_CLASS_ALL,
    defaultCacheBehavior: {
      allowedMethods: ['HEAD', 'DELETE', 'POST', 'GET', 'OPTIONS', 'PUT', 'PATCH'],
      targetOriginId: props.blogApiId,
      viewerProtocolPolicy: ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
      compress: true,
      cachePolicyId: NO_CACHE_POLICY_ID,
    },
    cacheBehaviors: [
      {
        pathPattern: `posts`,
        allowedMethods: ['HEAD', 'DELETE', 'POST', 'GET', 'OPTIONS', 'PUT', 'PATCH'],
        cachedMethods: ['GET', 'OPTIONS', 'HEAD'],
        targetOriginId: props.blogApiId,
        viewerProtocolPolicy: ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
        compress: true,
        cachePolicyId: cachePolicy.cachePolicyId,
      },
      {
        pathPattern: `posts/*`,
        allowedMethods: ['HEAD', 'DELETE', 'POST', 'GET', 'OPTIONS', 'PUT', 'PATCH'],
        cachedMethods: ['GET', 'OPTIONS', 'HEAD'],
        targetOriginId: props.blogApiId,
        viewerProtocolPolicy: ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
        compress: true,
        cachePolicyId: cachePolicy.cachePolicyId,
      },
    ],
    origins: [
      {
        domainName: `${Fn.select(2, Fn.split('/', props.blogApiUrl))}`,
        id: props.blogApiId,
        originPath: `/${buildConfig.environment}`,
        originCustomHeaders: [
          {
            headerName: 'X-Api-Key',
            headerValue: buildConfig.stacks.api.key,
          },
        ],
        customOriginConfig: {
          originProtocolPolicy: OriginProtocolPolicy.HTTPS_ONLY,
          originSslProtocols: [OriginSslPolicy.TLS_V1_2],
        },
      },
    ],
  },
});
```
What is worth mentioning:
- the paths we are going to cache are posts, for listing our blog posts, and posts/*, for getting a single post. You can see that the cached methods are just the getters; all the others end up under the no-cache policy;
- the domain name within the origin uses a few CloudFormation intrinsic functions, which really helped to parse the URL down to just the domain name;
- with the custom header we are able to inject our API key header on the way from CloudFront to API Gateway.
⚠️Note: usually you want to put the API key in safe storage like AWS Secrets Manager, but it costs a few cents per secret per month. So just keep that in mind; I will try not to expose the API key 😉.
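To see what the intrinsic functions on the origin's domain name do, here is the same split-and-select logic in plain TypeScript. The `domainFromUrl` helper and the example URL are made up for illustration; the real value is only resolved by CloudFormation at deploy time.

```typescript
// Plain-TypeScript equivalent of Fn.select(2, Fn.split('/', url)).
const domainFromUrl = (url: string): string => url.split('/')[2] ?? '';

// Hypothetical API Gateway stage URL.
const exampleUrl = 'https://abc123.execute-api.eu-west-1.amazonaws.com/dev/';

// Splitting on '/' gives ['https:', '', '<domain>', 'dev', ''], so index 2 is the domain.
const exampleDomain = domainFromUrl(exampleUrl);
console.log(exampleDomain);
```

The stage name that follows the domain is not lost: it is what the originPath setting puts back in front of every forwarded request.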
Cache invalidation design
Starting with a famous quote
What we need in our infrastructure is:
- DynamoDB table with stream enabled, which we already have ✅.
- a Lambda function with permission to create invalidations on the API's CloudFront distribution;
- an event source linking the two, which creates the trigger.
```typescript
invalidatePostCacheFunction.addEventSource(new DynamoEventSource(table, {
  startingPosition: StartingPosition.LATEST,
  maxBatchingWindow: Duration.minutes(1),
  batchSize: 10,
  maxRecordAge: Duration.minutes(5),
}));
```
The code is pretty simple because it just invalidates the cache for posts. We will start by invalidating every time we create, update or delete an item. Once our blog is live, we can come back and change the logic of this function so it invalidates only the specific item path when possible (for example, if we change only the content, we do not need to invalidate the posts* cache, only the posts/{id} cache) ✨.
⚠️Note: when importing the table, you need to import it with the fromTableAttributes method and specify the tableStreamArn param, as per this issue.
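As a rough sketch of that later refinement, the function below decides which paths to invalidate from a batch of stream records. It is not the function from the repository: the types are simplified local versions of a DynamoDB stream record, `pathsToInvalidate` is a hypothetical name, and `LIST_FIELDS` is an assumption about which attributes show up in the post listing.

```typescript
// Simplified local types mirroring the shape of a DynamoDB stream record.
type AttributeValue = { S?: string; N?: string };
type Image = Record<string, AttributeValue>;

interface StreamRecord {
  eventName: 'INSERT' | 'MODIFY' | 'REMOVE';
  dynamodb: { Keys: Image; OldImage?: Image; NewImage?: Image };
}

// Assumption: these are the attributes shown when listing posts.
const LIST_FIELDS = ['title', 'slug', 'state'];

const sameValue = (a?: AttributeValue, b?: AttributeValue): boolean =>
  a?.S === b?.S && a?.N === b?.N;

const pathsToInvalidate = (records: StreamRecord[]): string[] => {
  const paths = new Set<string>();
  for (const record of records) {
    const { Keys, OldImage, NewImage } = record.dynamodb;
    // An update that leaves the listed attributes untouched only affects one post.
    const listUnchanged =
      record.eventName === 'MODIFY' &&
      LIST_FIELDS.every((field) => sameValue(OldImage?.[field], NewImage?.[field]));
    const id = Keys.id?.S;
    if (listUnchanged && id) {
      paths.add(`/posts/${id}`);
    } else {
      // Creations, deletions and listing changes invalidate everything under posts.
      paths.add('/posts*');
    }
  }
  // The wildcard already covers every single-post path.
  return paths.has('/posts*') ? ['/posts*'] : Array.from(paths);
};
```

The returned paths start with a slash because that is the form CloudFront invalidation paths take. Collapsing to the wildcard keeps the number of paths per invalidation small.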
Conclusion
In this post we learnt how to create a serverless architecture without making our API slow. We also learnt how to secure access to our API Gateway and to our admin accounts.
You can find the project here https://github.com/Depaa/website-blog-part1 😉
This series is literally a journey to production, so you will see a lot of blog posts. Here is a list I'll try to keep updated:
- ✅ Serverless infrastructure on AWS for blog website;
- ✅ Backend Api on AWS for blog website;
- ✅ Building a high performing static backoffice on AWS with SvelteKit;
- ✅ Frontend users website for reading blog posts;
- ✅ SEO tweaks for blog website;
- ✅ Analytics and tracking views on our blog website;
- Infrastructure monitoring and alerting to the blog;
- Going live with the blog;
- CICD pipelines for the blog;
- Disaster recovery, RTO and RPO. Going multiregion with serverless;
- … you see, there are a lot planned, and if you want to add more points just DM me or comment right below.
Thank you so much for reading! 🙏 I will keep posting different AWS architectures from time to time, so follow me on LinkedIn 👉 https://www.linkedin.com/in/matteo-depascale/.