I have a Python script I’ve been running at home. It queries ~200 web pages to check for updates in any of my various work groups. The system only provides hourly emails - I want notifications at 10-minute intervals - so I wrote a polling script.
It uses aiohttp and asyncio pretty heavily to do the initial session login and set up credentials, and then farms out and harvests all 200 requests into a list of groups with activity, if any.
When I ported this to Lambda, I had to cut out all the asyncio and use straight synchronous requests. The Lambda function timed out after 6s… So I’m not doing this right.
Should I be thinking about this as:
1) an initial trigger that checks for a valid login session and creates one if there isn’t one
2) a loop over all active teams that triggers a different Lambda function to check each group
3) some kind of harvesting function that is triggered once all the groups are finished
Is this the serverless way to architect this?
How would 3) get triggered?
Thanks for helping out! Cheers, jas…
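For illustration, here is a minimal sketch of the fan-out step from the question (one worker invocation per group). It assumes a client that behaves like `boto3.client("lambda")` and a hypothetical worker function named `check-group`; neither name comes from the original script. With `InvocationType="Event"` the caller does not get the workers' results back, so the harvesting step would need a separate mechanism that this sketch does not cover.

```python
import json


def fan_out(groups, lambda_client, worker_name="check-group"):
    """Start one asynchronous worker invocation per group.

    `lambda_client` is assumed to behave like boto3.client("lambda");
    `worker_name` is a hypothetical function name used for illustration.
    """
    started = []
    for group in groups:
        response = lambda_client.invoke(
            FunctionName=worker_name,
            InvocationType="Event",  # asynchronous: do not wait for the worker
            Payload=json.dumps({"group": group}).encode("utf-8"),
        )
        # An accepted asynchronous invocation reports HTTP status 202.
        if response.get("StatusCode") == 202:
            started.append(group)
    return started


def handler(event, context, lambda_client=None):
    """Entry point: fan out over the groups listed in the event (assumed shape)."""
    if lambda_client is None:
        import boto3  # provided by the Lambda Python runtime

        lambda_client = boto3.client("lambda")
    groups = event.get("groups", [])
    return {"started": fan_out(groups, lambda_client)}
```

The optional `lambda_client` argument is only there so the fan-out logic can be exercised locally with a stand-in client before deploying.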
Questions:
1.) How long is the overall execution time when you run your script at home?
2.) What is the error message when it fails on AWS Lambda?
3.) Do you use any non-standard Python libraries, for example requests?
4.) Which runtime did you choose?
5.) How did you trigger your Lambda function? Schedule? API Gateway?
Some notes:
1.) The asyncio stuff should run on AWS Lambda, because it’s Python. But of course you have to use runtime: python3.6
2.) The timeout property for the function should be set appropriately, otherwise a timeout can occur before the function finishes. The timeout can be set up to 5 minutes for a Lambda function.
3.) If you use non-standard Python libraries, maybe you forgot to deploy them?
Just my two cents…
Hey Franky! Thanks for grabbing ahold of this. Here are some answers:
1.) Around 60s. Most of that is waiting for the 700 or so HTTP requests to come back.
2.) The error is that the execution time is too long (6s??), I think.
3.) Yes, here are the imports for the synchronous version:
import json
import requests
import boto3
import itertools
from pprint import pprint
from datetime import datetime, timedelta
from bs4 import BeautifulSoup
4.) python3.6
5.) Right now I just hit the test button.
Thanks for the notes. Here are my replies:
My serverless.yml has:
provider:
  name: aws
  runtime: python3.6
so I think the runtime is fine. The real problem was that I had no idea how to write the loop that handles all the asyncio stuff. The standalone script has the following:
I have no idea how to encode that loop in a lambda handler. I’m very new to Lambda
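One possible shape for this, as a sketch rather than the original script: the Lambda handler stays a plain synchronous function and drives the coroutines to completion on an event loop it creates itself. `check_group` below is a stand-in that sleeps instead of making a request; in the real script an aiohttp call would go there. The event key `group_ids` and the "even ids have activity" rule are made up for illustration.

```python
import asyncio


async def check_group(group_id):
    """Stand-in for one HTTP check; the real script would await an aiohttp request."""
    await asyncio.sleep(0.01)  # simulates network latency
    has_activity = group_id % 2 == 0  # placeholder rule, purely illustrative
    return group_id, has_activity


async def check_all(group_ids):
    """Run all checks concurrently and keep only the groups with activity."""
    results = await asyncio.gather(*(check_group(g) for g in group_ids))
    return [group_id for group_id, has_activity in results if has_activity]


def handler(event, context):
    """Synchronous Lambda entry point that runs the asyncio code to completion."""
    group_ids = event.get("group_ids", [])  # assumed event shape
    loop = asyncio.new_event_loop()
    try:
        active = loop.run_until_complete(check_all(group_ids))
    finally:
        loop.close()
    return {"active_groups": active}
```

On Python 3.7 and later, `asyncio.run(check_all(group_ids))` could replace the explicit loop handling; it is not available on the python3.6 runtime used in this thread.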
I have not set the timeout property - so I’m guessing the default is 6s - since that’s the error I get.
I’m using sls deploy at the moment and, together with Docker, it’s uploading a 6 MB .zip file each time (why is it so big???), which makes it really hard for me to test over my unstable rural internet connection.
Thanks for your time, jas…
Add the property timeout to your function in serverless.yml:
timeout: 120 # Sets the timeout to 120 seconds, maximum possible value is 300 = 5 minutes
Then the maximum execution time of your lambda is 120 seconds and it should work…
See the example in the docs.
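A related sketch, not taken from the docs: even with a longer timeout, the handler can check how much time it has left and stop cleanly instead of being killed mid-run. The Lambda context object provides `get_remaining_time_in_millis()`; everything else here (the `urls` event key, the `work` callable, the 5-second safety margin) is an assumption for illustration.

```python
def process_with_deadline(items, context, work, safety_margin_ms=5000):
    """Process items in order until the invocation is close to its timeout.

    `work` is any callable applied to each item (hypothetical placeholder).
    Items not reached before the safety margin are returned as `pending`.
    """
    done = []
    pending = list(items)
    while pending:
        # Stop early if less than the safety margin remains.
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            break
        done.append(work(pending.pop(0)))
    return {"done": done, "pending": pending}


def handler(event, context):
    """Entry point; `str.upper` stands in for the real per-URL check."""
    return process_with_deadline(event.get("urls", []), context, work=str.upper)
```

Whatever ends up in `pending` could be logged or handed to a later run, depending on how the polling is scheduled.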