Part of the CLDR to Markdown Conversion Process, aiming to automate steps 1-3.
NOTE: this does not remove all manual work; images, tables, and a general review still require manual attention.
Objective: this file (cleanup.py) corrects some of the common mistakes that appear when an HTML-to-Markdown converter is run on the Google Sites CLDR site. It is not a comprehensive list, and mistakes can remain, but it fixes errors that show up consistently, particularly with the specific Markdown converter used in pullFromCLDR.py. Most of the adjustments use regular expressions to find and replace specific text. The functions are as follows:
Objective: this file (pullFromCLDR.py) is used alongside cleanup.py to automate pulling HTML and text from a given CLDR page. It uses libraries to retrieve the HTML and the plain text of the page, convert the HTML into Markdown, clean up the Markdown using cleanup.py, and create the .md file and the temporary .txt file in the CLDR site location. There are a couple of things to note with this:
To run this code, you must have python3 installed. You need to install the following Python libraries:
bs4 (from the beautifulsoup4 package)
markdownify
requests
You can install them using pip:
pip install beautifulsoup4 markdownify requests
Line 8 of cleanup.py should contain the URL that will be prepended to all relative links (always https://cldr.unicode.org):
#head to place at start of all relative links
RELATIVE_LINK_HEAD = "https://cldr.unicode.org"
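As a rough illustration of how such a link head might be applied, the sketch below prefixes every relative Markdown link target with RELATIVE_LINK_HEAD using a regular expression. The function name absolutize_links and the exact pattern are assumptions made for this example; they are not taken from cleanup.py, which may handle links differently.

```python
import re

# Same constant as in cleanup.py.
RELATIVE_LINK_HEAD = "https://cldr.unicode.org"

# Matches the target of a Markdown link that begins with a single "/",
# e.g. "](/index/downloads)". Targets starting with "//" are left alone.
_RELATIVE_LINK = re.compile(r"\]\((/(?!/)[^)\s]*)\)")


def absolutize_links(markdown: str, head: str = RELATIVE_LINK_HEAD) -> str:
    """Hypothetical helper: prefix relative link targets with `head`."""
    return _RELATIVE_LINK.sub(lambda m: f"]({head}{m.group(1)})", markdown)


sample = "See [Downloads](/index/downloads) and [ICU](https://icu.unicode.org)."
print(absolutize_links(sample))
```

Links that are already absolute do not start with "/" and are therefore left unchanged.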
Line 7 of pullFromCLDR.py should contain the local location of your cloned CLDR site; this is where the files will be stored:
#LOCAL LOCATION OF CLDR
CLDR_SITE_LOCATION = "DIRECTORY TO CLDR LOCATION/docs/site"
Before running, ensure that the folders matching the path of the page you are trying to convert exist within your CLDR site directory, and that there is a folder named TEMP-TEXT-FILES.
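The folder check can be done by hand, but a small script may help. The sketch below is only an illustration: the function name missing_folders is hypothetical, and it assumes that the page's folders mirror the URL path and that TEMP-TEXT-FILES sits directly inside the CLDR site directory. Adjust both assumptions to match your checkout.

```python
from pathlib import Path
from typing import List
from urllib.parse import urlparse


def missing_folders(page_url: str, site_location: str) -> List[Path]:
    """Hypothetical helper: list required folders that do not exist yet."""
    site = Path(site_location)
    # Split the URL path into segments; the last one is the page itself.
    parts = [p for p in urlparse(page_url).path.split("/") if p]
    page_dir = site.joinpath(*parts[:-1])
    # Assumption: TEMP-TEXT-FILES lives directly under the site directory.
    required = [page_dir, site / "TEMP-TEXT-FILES"]
    return [folder for folder in required if not folder.is_dir()]
```

For example, missing_folders("https://cldr.unicode.org/index/downloads", CLDR_SITE_LOCATION) returns an empty list when both the index folder and TEMP-TEXT-FILES are present.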
Run with:
python3 pullFromCLDR.py
You will then be prompted to enter the URL of the page you are trying to convert, after which the script will run.
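For orientation, the fetch-and-convert steps the script performs might look roughly like the sketch below. It uses the three libraries listed above, but the function names fetch_html and html_to_markdown_and_text are assumptions for this example and do not come from pullFromCLDR.py; the cleanup and file-writing steps are omitted because their details are specific to the real scripts.

```python
from typing import Tuple


def fetch_html(url: str) -> str:
    """Download the raw HTML of a page (hypothetical helper)."""
    import requests  # third-party; imported lazily so this module loads without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def html_to_markdown_and_text(html: str) -> Tuple[str, str]:
    """Return (markdown, plain_text) for an HTML string (hypothetical helper)."""
    from bs4 import BeautifulSoup  # third-party
    from markdownify import markdownify  # third-party

    markdown = markdownify(html)
    plain_text = BeautifulSoup(html, "html.parser").get_text()
    # The real script would pass `markdown` through cleanup.py at this point.
    return markdown, plain_text
```

A typical call would be html_to_markdown_and_text(fetch_html(url)), with url taken from the prompt.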
If you would like to run unit tests on cleanup.py, or use any of its functions individually, run:
python3 cleanup.py