devlog 9/12/2024

Project elements:

Web scraping
- Spent this morning setting up a web scraper using Beautiful Soup (BS).
n8n
- n8n seems to be working, but…
- Getting a lot of errors trying to work with AI agents
- Currently still slower than just programming it from scratch
Ghost
- Working fine now that it’s configured.
NGINX Proxy Manager
- Still not working on railway.app. Template must be misconfigured.

Spent this morning setting up a web scraper. I used Gemini 1.5 Pro Experimental to generate the script, and after about 5-6 turns I had something working very well.

To make this work, you need an LLM with a huge context window, one that can read one or more full HTML files and pluck out the CSS selectors that BS then uses to extract the data. Some of these pages are upward of 400,000 tokens, so Gemini 1.5 Pro’s 2-million-token context window was essential.
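
Here is a minimal sketch of that split, not my actual script: the LLM's job is to produce the selector mapping, and Beautiful Soup just applies it. The page layout, `ITEM_SELECTOR`, `FIELD_SELECTORS`, and the field names are all made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical selectors: the kind of mapping the LLM hands back after reading a page.
ITEM_SELECTOR = "div.listing"
FIELD_SELECTORS = {
    "title": "h2.title",
    "price": "span.price",
}

def extract_records(html):
    """Apply the selector mapping to one HTML document and return a list of dicts."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.select(ITEM_SELECTOR):
        record = {}
        for field, selector in FIELD_SELECTORS.items():
            node = item.select_one(selector)
            # A missing element becomes None instead of raising an error.
            record[field] = node.get_text(strip=True) if node else None
        records.append(record)
    return records

# Tiny stand-in page; the real ones are far larger.
SAMPLE_HTML = """
<div class="listing"><h2 class="title">Widget</h2><span class="price">$5</span></div>
<div class="listing"><h2 class="title">Gadget</h2></div>
"""

if __name__ == "__main__":
    print(extract_records(SAMPLE_HTML))
```

Keeping the selectors in one dict means that when a site changes its markup, only the mapping has to be regenerated, not the extraction loop.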

The scraper writes everything to a single JSONL file, which is nice and clean.
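
For reference, JSONL is just one JSON object per line, so the writing side can be very small. This is a rough sketch using only the standard library; the function names `append_jsonl` and `read_jsonl` are placeholders, not what the generated script uses.

```python
import json

def append_jsonl(path, records):
    """Append each record to the file as one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

def read_jsonl(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Opening in append mode means every scraped page can add its rows to the same file, and a crash partway through loses at most the line being written.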
