Console utility for recursively extracting links (and links from those links) from a webpage. I created the project mainly for learning, for fun, and to explore what is possible with modern C++ standards and features.
Along the way I explored:
- the proper way to write multithreaded programs
- atomic variables and operations
- different synchronization mechanisms for multithreaded programs (such as mutexes, semaphores, and condition variables)
- writing my own thread pool implementation
- basic and advanced SQL operations (PostgreSQL via SOCI)
- libcurl
- integrating Python into C++ (and the other way around) using the standard Python C API and pybind11
- regular expressions (regex)
Usage:
- Like every decent console program, it accepts input flags to customize the number of threads, the parsing depth, etc. Simply run `./build/projectA` or `make run` to see the prompt.
- It can optionally write the parsed link data to the database.
- You may want to uncomment a block of code in `main.cpp` to print a list of all live links to the console.
I assume you have already installed PostgreSQL with your package manager, initialized the cluster, started the database service, and created a dedicated user (in my case `user1`) on your local machine. There are plenty of guides online for doing this on your specific system.
To make the database part usable, go through a few steps:
- connect to PostgreSQL as the `postgres` superuser: `sudo -u postgres psql`
- create a database in the psql prompt: `CREATE DATABASE projectA;`
- check that the database was created successfully: `SELECT datname FROM pg_database;`
- grant your user the right to connect to it: `GRANT CONNECT ON DATABASE projectA TO user1;`
- connect to the newly created database (unquoted names are folded to lowercase): `\c projecta`
- allow your user some operations in the database: `GRANT CREATE ON SCHEMA public TO user1;` and `GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO user1;`
- exit psql with `\q`
- run `./build/projectA` with the proper flags to populate the database
- connect back to PostgreSQL as your user (replace `db1` with the name of your database, e.g. `projecta`): `psql -U user1 -d db1 -h localhost -p 5432`
- view the data in your table: `SELECT * FROM your_new_table;`
If you run into any issues or need to contact the developer, I would be glad to receive your feedback by email: bilenko.a.uni@gmail.com.
