add bin/json/pull-chapters-data

0x1eef 2022-06-09 02:26:26 -03:00
parent 75537deea7
commit 5dd6efa3d6
3 changed files with 1900 additions and 0 deletions


@ -13,6 +13,9 @@ in English, Farsi, and Portuguese. The contents are made available in JSON, and
This section covers the JSON files. Click [here](#srcsql-directory) to jump to the SQL
section.
* The [src/json/chapters-data.json](src/json/chapters-data.json) file contains information
about each chapter in The Qur'an.
* The [src/json/ar/](src/json/ar/) directory contains The Qur'an in its original Arabic.
* The [src/json/en/](src/json/en/) directory contains an English translation of The Qur'an.
@ -21,6 +24,38 @@ section.
* The [src/json/pt/](src/json/pt/) directory contains a Portuguese translation of The Qur'an.
#### Chapters
* [src/json/chapters-data.json](/src/json/chapters-data.json)
The [chapters-data.json](/src/json/chapters-data.json) file was obtained from https://quran.com,
and modified slightly.
The [chapters-data.json](/src/json/chapters-data.json) file contains information about each
chapter in The Qur'an. It is structured as an array of objects, with each object describing
a given chapter. The following example demonstrates how Al-Fatihah is described as an object.
The "codepoints" property is a sequence of Unicode codepoints that can be mapped back to the chapter's name in Arabic:
```json
{
"id": "1",
"place_of_revelation": "makkah",
"transliterated_name": "Al-Fatihah",
"translated_name": "The Opener",
"verse_count": 7,
"slug": "al-fatihah",
"codepoints": [
1575,
1604,
1601,
1575,
1578,
1581,
1577
]
},
```
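The following is a minimal sketch of how the "codepoints" array can be mapped back to a string, assuming Ruby and its standard `Array#pack` method with the `U*` directive. The object is an abbreviated, inlined copy of the Al-Fatihah example above rather than a read of the actual file, and the variable names are illustrative only:

```ruby
require "json"

# The Al-Fatihah entry from the example above, abbreviated and inlined so
# the sketch does not depend on src/json/chapters-data.json being present.
chapter = JSON.parse(<<~JSON)
  {
    "id": "1",
    "transliterated_name": "Al-Fatihah",
    "codepoints": [1575, 1604, 1601, 1575, 1578, 1581, 1577]
  }
JSON

# Array#pack("U*") encodes each integer as one UTF-8 character.
arabic_name = chapter["codepoints"].pack("U*")

# String#codepoints is the inverse operation and yields the original array.
round_trip = arabic_name.codepoints

puts arabic_name
puts round_trip == chapter["codepoints"]
```

Storing integers instead of the Arabic string keeps the JSON file ASCII-only, and the conversion back needs nothing beyond the standard library.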
#### Arabic
* [src/json/ar/](src/json/ar/)
@ -319,6 +354,9 @@ contents of the [src/](src/) directory:
* JSON scripts
* [bin/json/pull-chapters-data](bin/json/pull-chapters-data) <br>
This script is responsible for generating [src/json/chapters-data.json](src/json/chapters-data.json).
* [bin/json/pull-arabic](bin/json/pull-arabic) <br>
This script is responsible for populating [src/json/ar/](src/json/ar/).

63
bin/json/pull-chapters-data Executable file

@ -0,0 +1,63 @@
#!/usr/bin/env ruby
# frozen_string_literal: true
##
# This script requests information about the chapters of
# The Qur'an from the website https://quran.com, and writes
# the result to src/json/chapters-data.json.
##
# Set process name - primarily for the "ps" command
Process.setproctitle("quran-pull (pull-chapters-data)")
##
# Dependencies
require "bundler/setup"
require "net/http"
require "nokogiri"
require "json"
require "paint"
##
# Configuration variables.
base_uri = "quran.com"
dest = File.join('src', 'json', 'chapters-data.json')
##
# Share a single Net::HTTP instance.
http = Net::HTTP.new(base_uri, 443)
http.use_ssl = true
##
# Utils
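# Fetches a page, extracts the JSON embedded in its
# <script type="application/json"> element, and returns "chaptersData".
request_chapters = ->(path) do
  res = http.request_get(path)
  json = Nokogiri.HTML(res.body)
                 .css("script[type='application/json']")
                 .inner_text
  JSON.parse(json).dig('props', 'pageProps', 'chaptersData')
end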
##
# Chapter data
en_chapters = request_chapters.call('/')
ar_chapters = request_chapters.call('/ar')
##
# Parse chapter data
# en_chapters is a Hash keyed by chapter id: _1 is the id, _2 the chapter.
parsed = en_chapters.map do
  {
    id: _1,
    place_of_revelation: _2['revelationPlace'],
    transliterated_name: _2['transliteratedName'],
    translated_name: _2['translatedName'],
    verse_count: _2['versesCount'],
    slug: _2['slug'],
    codepoints: ar_chapters[_1]['translatedName'].codepoints,
  }
end
##
# Write result
File.write(dest, JSON.pretty_generate(parsed))
print Paint["OK: ", :green, :bold], dest, "\n"
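
The parsing step in the script above relies on numbered block parameters (available in Ruby 2.7 and later) while iterating over a Hash. The sketch below replays that step on hypothetical stand-in data, with no network access. The two hashes only imitate the shape the script reads from "chaptersData", with most fields left out, and are not real responses from quran.com:

```ruby
require "json"

# Hypothetical stand-ins for the English and Arabic "chaptersData" hashes.
# Both are keyed by chapter id, which is the shape the script assumes.
en_chapters = {
  "1" => { "transliteratedName" => "Al-Fatihah", "versesCount" => 7 }
}
ar_chapters = {
  # The Arabic name of chapter 1, written with escapes for clarity.
  "1" => { "translatedName" => "\u0627\u0644\u0641\u0627\u062A\u062D\u0629" }
}

# Hash#map yields a key and a value to the block, so _1 is the chapter
# id and _2 is the Hash of attributes for that chapter.
parsed = en_chapters.map do
  {
    id: _1,
    transliterated_name: _2["transliteratedName"],
    verse_count: _2["versesCount"],
    codepoints: ar_chapters[_1]["translatedName"].codepoints
  }
end

puts JSON.pretty_generate(parsed)
```

Because the id is a Hash key taken from parsed JSON, it stays a String, which is consistent with `"id": "1"` in the README example.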

1799
src/json/chapters-data.json Normal file

File diff suppressed because it is too large