Going Serverless: My Linear Regression API on AWS Lambda

Logan Cooper
Published in The Startup
7 min read · Sep 4, 2020

About a year ago, I had a fun idea for a small side project. I wanted to build an API to do a simple linear regression on two series of data. If you don’t know what a linear regression is, it’s basically a way of finding the straight line that best describes the relationship between two variables. It’s not super important to understanding this post, but if you want to learn more, you can get an idea of what it is here.

A scatter plot with a red regression line drawn through it
Basically it’s how you find the red line
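
For the curious, finding that red line takes only a few lines of plain Python. What follows is a minimal sketch of ordinary least squares with one independent variable, not necessarily the code behind this project. The function name fit_line is made up for illustration.

```python
def fit_line(ind, dep):
    """Return (slope, intercept) of the least-squares line through the points.

    ind and dep are equal-length lists of numbers (illustrative names).
    """
    if len(ind) != len(dep) or len(ind) < 2:
        raise ValueError("need two series of equal length with at least two points")
    n = len(ind)
    mean_x = sum(ind) / n
    mean_y = sum(dep) / n
    # Sum of squared deviations of x, and of the co-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in ind)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ind, dep))
    if sxx == 0:
        raise ValueError("independent variable has no variation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Calling fit_line([1, 2, 3, 4], [3, 5, 7, 9]) describes the line y = 2x + 1, since those points lie exactly on it.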

It was a nice way to refresh some of my econometrics knowledge from college. Since I also wanted to improve my Python skills, I chose to do it in Python. Specifically, I used Flask to turn my regression code into an API. After getting Flask set up and my regression code hooked up to it, I wrote a few quick tests and slapped together a simple frontend with some HTML, CSS, and JavaScript (vanilla for most of it, D3 for some charting).

When I was done and it came time to deploy my code, my first instinct was to reach for Heroku, because that was how I had deployed previous projects, and that’s what I did at first. However, I was in the middle of a job search at the time, and I was noticing a lot of job listings asking for AWS experience, so I decided to dip my toe into deploying my regression project there. Googling “deploying a flask app to AWS” brought up the official docs on deploying a Flask app to Elastic Beanstalk.

Having never worked with AWS before, it was tough getting everything set up, but I was able to push through and get my code up and running. I could go to my messy AWS URL and see the project I had built. That was a very exciting day. However, at that point my project was pretty much done, and I put it in the back of my mind.

Fast forward to this August. I’ve got a new job that wants me to get AWS certified. Awesome! I take some lessons and pass my Certified Cloud Practitioner exam. Even more awesome! While I was studying up for it, I found myself thinking back to what I had done to deploy my project. As (relatively) straightforward as deploying to Elastic Beanstalk was, it may have been overkill. I didn’t need a lot of the features that it gives me. It may have been more straightforward to deploy it directly to EC2. Then I learned about Lambda.

AWS Lambda Logo
Ooh. Ah! Lambda!

Lambda is AWS’ main serverless offering. It abstracts away provisioning and running servers to the point that you can pretty much upload your code and have it run when you need it to. That’s where I put the code that actually runs the regression. That sits behind a service called API Gateway, which lets you (among other things) run your Lambda functions in response to HTTP calls. In my case, my regression function runs in response to a POST call with two arrays of numbers in the body.

I was able to get this working by following along with the official AWS guide for building an API with Lambda. Getting the Lambda function working was pretty easy: I just pasted my old regression code in and wrapped it in some of the Flask code to mediate between the HTTP call and the regression function. After tweaking some variables, I had a working AWS Lambda function.

The tough part came from hooking the Lambda function up to API Gateway. Actually, hooking it up was pretty easy. I was able to follow the AWS docs to get my Lambda function to respond to a POST. The hard part was getting it to respond in a meaningful way.

AWS API Gateway Logo
The next step: API Gateway

You see, my old code had a couple of branches that the regression code could go along depending on what the request looked like. If the data was good, it created a Python dictionary with a statusCode key with a value of 200, and the whole thing got wrapped up in json.dumps so that Flask could send the response back to the browser as a JSON string. AWS didn’t like that. It turns out API Gateway expects the whole response object (with keys statusCode, body, and headers) to just be a dictionary. The contents of the body key, however, do need to be a JSON string. So my code went from:

# ind and dep are the two variables from the HTTP request
return json.dumps({
    "statusCode": 200,
    "body": regression(ind, dep)
})

To this:

return {
    "statusCode": 200,
    "body": json.dumps(regression(ind, dep))
}
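
To put that snippet in context, here is a rough sketch of what a whole handler in this shape might look like. It assumes the Lambda proxy integration, where the request body arrives as a JSON string in event["body"]. The payload keys "ind" and "dep" and the stub regression function are placeholders, not the real project code.

```python
import json


def regression(ind, dep):
    # Hypothetical stand-in for the real regression code
    return {"n": len(ind)}


def lambda_handler(event, context):
    """Handle an API Gateway proxy event whose body holds two arrays."""
    try:
        # Assumption: the body is a JSON string like {"ind": [...], "dep": [...]}
        payload = json.loads(event.get("body") or "{}")
        ind, dep = payload["ind"], payload["dep"]
    except (ValueError, KeyError, TypeError):
        # Bad or missing data: still a plain dict with a JSON string body
        return {
            "statusCode": 400,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"message": "expected 'ind' and 'dep' arrays"}),
        }
    # The response itself is a dict; only the body is serialized to JSON
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(regression(ind, dep)),
    }
```

Both branches return the same shape, so API Gateway gets a dictionary with a string body whether the data was good or not.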

So, that got my tests working for the API, but I still had to ensure I could get that response from my own website, not just from AWS’ test service. As a first step, I took my testing to Postman, my API development tool of choice. Using my API Gateway URL and some sample data from my original Python unit tests, I sent a request to my API URL, and got back… a 403 with the body:

{"message": "Missing Authentication Token"}

I went to Google. I verified again that I had set this API up as a public API: I had. Then I saw a suggestion that the “Missing Authentication Token” error shows up when one tries to send the wrong HTTP method. I checked, and in fact, I was trying to do a GET on my API that only accepts POSTs. Why the error for that is "Missing Authentication Token" and not something more descriptive is anyone’s guess. Still, changing that to a POST got me:

A Postman screenshot with the error text "message": "Missing Authentication Token"
Did you guess the same error?

That’s right, I was still getting the same error while sending a POST. I think at that point I had some discourteous words for my computer and took a break for a little bit. When I came back to my project, I combed through the resources I had been consulting before — AWS Docs, StackOverflow questions, Medium articles — and couldn’t figure out what I was doing wrong. I had done every step that they suggested. I’d dotted all my i’s and crossed all my t’s, but still it wasn’t working.

Or had I? While looking at a StackOverflow question for the dozenth time, I finally saw something in the URL. You see, API Gateway gives you an “Invoke URL” to access your API. It should look something like: https://abunchofrandomletters.execute-api.us-west-1.amazonaws.com/environment. I had thought that was where I had to go to access my API. However, the person asking the question on SO was sending their request to https://abunchofrandomletters.execute-api.us-west-1.amazonaws.com/environment/resource. See that /resource on the end there? That’s the name of the API Gateway resource that they set up initially. I had one of those too, called /regress, so I thought I’d try tacking that on to the end of my URL.

A successful Postman call
HA! It works!

That was it. Nowhere in the tutorial I was following was that mentioned, but in retrospect, it makes sense that I would have had to do that. With that working, I could move on to finally integrating my API with my frontend.
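
The same fix can be written down in Python. This is only a sketch using the standard library: the helper name build_regress_request and the "ind" and "dep" payload keys are assumptions for illustration. The point is that the resource name gets appended to the Invoke URL and the method is POST.

```python
import json
import urllib.request


def build_regress_request(invoke_url, resource, ind, dep):
    """Build (but do not send) a POST request to <invoke_url>/<resource>."""
    # Join the stage URL and the resource name with exactly one slash
    url = invoke_url.rstrip("/") + "/" + resource.strip("/")
    # Hypothetical payload shape: two arrays of numbers
    data = json.dumps({"ind": ind, "dep": dep}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_regress_request(
    "https://abunchofrandomletters.execute-api.us-west-1.amazonaws.com/environment",
    "regress",
    [1, 2, 3],
    [2, 4, 6],
)
```

Passing that request to urllib.request.urlopen would actually send it. Leaving off the resource, or sending a GET, is what produced the "Missing Authentication Token" response above.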

Since I wanted to access the frontend through my personal website, the first step was taking the HTML, CSS, and JavaScript I had been serving with my Flask server and putting it in GitHub Pages. That was easy, just an mv command, a git commit, and a couple of button presses in GitHub. The next step was somewhat harder: getting my frontend to talk to my API. I started by finding the function that was sending off the uploaded CSV, and swapping out my old Flask route for my new API Gateway URL.

I loaded some data in, sent off my request, and it failed. Not even a nice 400 or 500 error. Just that nasty (failed) message in my Network tab. Since I couldn’t see just what was going on, it was time to use another one of the services I’d been learning about: CloudWatch.

AWS CloudWatch Logo
I think this logo is the metrics you see. Or maybe the cloud you’re watching?

CloudWatch is one of AWS’ monitoring services. It can be used to monitor performance, and even send alerts if there’s something wrong with your applications. For my purposes though, I was interested in the logging functionality. I turned on logging, and was able to track down the problem. It was CORS. Now, CORS has derailed a personal project of mine more than once. However, in those cases, I didn’t usually control both the browser and the server (or rather, the lack of a server). This time I did. One quick Access-Control-Allow-Origin header added to my Lambda function’s response, and I had a working product. I could go to my site, upload a CSV, and get a linear regression.
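
For reference, adding that header might look something like the sketch below. The helper name with_cors and the origin value are hypothetical; the origin should be whatever site serves the frontend. Depending on the request, the browser may also send a preflight OPTIONS request that API Gateway has to answer, which this sketch does not cover.

```python
import json

# Hypothetical origin: replace with the site that hosts the frontend
ALLOWED_ORIGIN = "https://example.github.io"


def with_cors(response, origin=ALLOWED_ORIGIN):
    """Return a copy of a Lambda proxy response with a CORS header added."""
    # Copy existing headers so the original response dict is not mutated
    headers = dict(response.get("headers") or {})
    headers["Access-Control-Allow-Origin"] = origin
    return {**response, "headers": headers}


response = with_cors({"statusCode": 200, "body": json.dumps({"ok": True})})
```

Wrapping every return value this way, error responses included, keeps the browser from hiding the real status code behind a CORS failure.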

A screenshot of my regression app
The final product

Go ahead, check it out!

Logan Cooper

I'm a data scientist and software engineer with an MS in Economics and CS from Duke University, currently working as a Senior Analyst at EBP US.