Squashed 'actes-princiers/data/01_raw/' changes from a07c038..9f9dd4d

9f9dd4d correction brb_lo_ii_1365_08_08a.xml

git-subtree-dir: actes-princiers/data/01_raw
git-subtree-split: 9f9dd4d3d67a6a85e244b16071e693fb5b3cdebf
develop
gwen 2 years ago
parent 42d90def46
commit 9161c489f9

7
.gitignore vendored

@ -1 +1,6 @@
.venv
/tex-pdf
/xsl
compilation_latex.ipynb
.ipynb_checkpoints/
*/.ipynb_checkpoints/
.venv/

@ -1,66 +1,92 @@
# Actes princiers -- data transformations
# data
## Project Name
Human-readable name: `Actes Princiers`
The project name 'Actes Princiers' has been applied to:
## Getting started
- The project title in `datascience/actes-princiers/README.md`
- The folder created for your project in `datascience/actes-princiers`
- The project's python package in `datascience/actes-princiers/src/actes_princiers`
To make it easy for you to get started with GitLab, here's a list of recommended next steps.
A best-practice setup includes initialising git and creating a virtual environment before running `pip install -r src/requirements.txt`.
Already a pro? Just edit this README.md and make it your own. Want to make it easy? [Use the template at the bottom](#editing-this-readme)!
## Getting started
## Add your files
- [ ] [Create](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#create-a-file) or [upload](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#upload-a-file) files
- [ ] [Add files using the command line](https://docs.gitlab.com/ee/gitlab-basics/add-file.html#add-a-file-using-the-command-line) or push an existing Git repository with the following command:
```
cd existing_repo
git remote add origin https://gitlab.huma-num.fr/medieval-acts/princely-acts/data.git
git branch -M main
git push -uf origin main
```
## Integrate with your tools
- [ ] [Set up project integrations](https://gitlab.huma-num.fr/medieval-acts/princely-acts/data/-/settings/integrations)
- Create a virtual environment: `python -m venv .venv`
- Activate the virtual environment: `source .venv/bin/activate`
- Install Kedro: `pip install kedro`
- Install the project's packages and libraries: `pip install -r src/requirements.txt`
## Collaborate with your team
**Go to the `actes-princiers` folder**
- [ ] [Invite team members and collaborators](https://docs.gitlab.com/ee/user/project/members/)
- [ ] [Create a new merge request](https://docs.gitlab.com/ee/user/project/merge_requests/creating_merge_requests.html)
- [ ] [Automatically close issues from merge requests](https://docs.gitlab.com/ee/user/project/issues/managing_issues.html#closing-issues-automatically)
- [ ] [Enable merge request approvals](https://docs.gitlab.com/ee/user/project/merge_requests/approvals/)
- [ ] [Set auto-merge](https://docs.gitlab.com/ee/user/project/merge_requests/merge_when_pipeline_succeeds.html)
Then open a terminal in the `actes-princiers` folder
and launch Jupyter: `kedro jupyter notebook`
or start the IPython prompt: `kedro ipython`
## Test and Deploy
## Launching the pipelines
Use the built-in continuous integration in GitLab.
**Go to the `actes-princiers` folder**
- [ ] [Get started with GitLab CI/CD](https://docs.gitlab.com/ee/ci/quick_start/index.html)
- [ ] [Analyze your code for known vulnerabilities with Static Application Security Testing(SAST)](https://docs.gitlab.com/ee/user/application_security/sast/)
- [ ] [Deploy to Kubernetes, Amazon EC2, or Amazon ECS using Auto Deploy](https://docs.gitlab.com/ee/topics/autodevops/requirements.html)
- [ ] [Use pull-based deployments for improved Kubernetes management](https://docs.gitlab.com/ee/user/clusters/agent/)
- [ ] [Set up protected environments](https://docs.gitlab.com/ee/ci/environments/protected_environments.html)
Open a terminal in the `actes-princiers` folder and launch Kedro:
***
`kedro run`
# Editing this README
or launch a specific node in the pipeline with:
When you're ready to make this README your own, just edit this file and use the handy template below (or feel free to structure it however you want - this is just a starting point!). Thank you to [makeareadme.com](https://www.makeareadme.com/) for this template.
`kedro run --nodes=<node_name>`
## Suggestions for a good README
Every project is different, so consider which of these sections apply to yours. The sections used in the template are suggestions for most open source projects. Also keep in mind that while a README can be too long and detailed, too long is better than too short. If you think your README is too long, consider utilizing another form of documentation rather than cutting out information.
or run only the nodes that carry a given tag with:
## Name
Choose a self-explaining name for your project.
`kedro run --tags=<tag_name>`
## Description
Let people know what your project can do specifically. Provide context and add a link to any reference visitors might be unfamiliar with. A list of Features or a Background subsection can also be added here. If there are alternatives to your project, this is a good place to list differentiating factors.
The current tags are:
## Badges
On some READMEs, you may see small images that convey metadata, such as whether or not all the tests are passing for the project. You can use Shields to add some to your README. Many services also have instructions for adding a badge.
- `kedro run --tags="etl_transform"`: launches the XML to JSON transformations
- `kedro run --tags="populate_database"`: populates the remote MongoDB database on the target server
## Visuals
Depending on what you are making, it can be a good idea to include screenshots or even a video (you'll frequently see GIFs rather than actual videos). Tools like ttygif can help, but check out Asciinema for a more sophisticated method.
## Visualizing the pipelines
## Installation
Within a particular ecosystem, there may be a common way of installing things, such as using Yarn, NuGet, or Homebrew. However, consider the possibility that whoever is reading your README is a novice and would like more guidance. Listing specific steps helps remove ambiguity and gets people to using your project as quickly as possible. If it only runs in a specific context like a particular programming language version or operating system or has dependencies that have to be installed manually, also add a Requirements subsection.
**Install kedro-viz first**
## Usage
Use examples liberally, and show the expected output if you can. It's helpful to have inline the smallest example of usage that you can demonstrate, while providing links to more sophisticated examples if they are too long to reasonably include in the README.
Install kedro-viz with:
## Support
Tell people where they can go to for help. It can be any combination of an issue tracker, a chat room, an email address, etc.
`pip install kedro-viz`
## Roadmap
If you have ideas for releases in the future, it is a good idea to list them in the README.
Then run:
## Contributing
State if you are open to contributions and what your requirements are for accepting them.
`kedro viz`
For people who want to make changes to your project, it's helpful to have some documentation on how to get started. Perhaps there is a script that they should run or some environment variables that they need to set. Make these steps explicit. These instructions could also be useful to your future self.
## Tips
You can also document commands to lint the code or run tests. These steps help to ensure high code quality and reduce the likelihood that the changes inadvertently break something. Having instructions for running tests is especially helpful if it requires external setup, such as starting a Selenium server for testing in a browser.
To pick up changes, reload the Kedro variables by calling `%reload_kedro` in your notebook, then re-run the code snippet.
## Authors and acknowledgment
Show your appreciation to those who have contributed to the project.
## License
For open source projects, say how it is licensed.
## Project status
If you have run out of energy or time for your project, put a note at the top of the README saying that development has slowed down or stopped completely. Someone may choose to fork your project or volunteer to step in as a maintainer or owner, allowing your project to keep going. You can also make an explicit request for maintainers.

@ -1,157 +0,0 @@
##########################
# KEDRO PROJECT
# ignore all local configuration
conf/local/**
!conf/local/.gitkeep
.telemetry
# ignore potentially sensitive credentials files
conf/**/*credentials*
# ignore everything in the following folders
# data/**
logs/**
# except their sub-folders
#!data/**/
!logs/**/
# also keep all .gitkeep files
!.gitkeep
##########################
# Common files
# IntelliJ
.idea/
*.iml
out/
.idea_modules/
### macOS
*.DS_Store
.AppleDouble
.LSOverride
.Trashes
# Vim
*~
.*.swo
.*.swp
# emacs
*~
\#*\#
/.emacs.desktop
/.emacs.desktop.lock
*.elc
# JIRA plugin
atlassian-ide-plugin.xml
# C extensions
*.so
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
# Translations
*.mo
*.pot
# Django stuff:
*.log
.static_storage/
.media/
local_settings.py
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
.ipython/profile_default/history.sqlite
.ipython/profile_default/startup/README
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# Environments
.env
.envrc
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# mkdocs documentation
/site
# mypy
.mypy_cache/

@ -1,23 +0,0 @@
# What is this for?
This folder should be used to store configuration files used by Kedro or by separate tools.
This file can be used to provide users with instructions for how to reproduce local configuration with their own credentials. You can edit the file however you like, but you may wish to retain the information below and add your own section in the [Instructions](#Instructions) section.
## Local configuration
The `local` folder should be used for configuration that is either user-specific (e.g. IDE configuration) or protected (e.g. security keys).
> *Note:* Please do not check in any local configuration to version control.
## Base configuration
The `base` folder is for shared configuration, such as non-sensitive and project-related configuration that may be shared across team members.
WARNING: Please do not put access credentials in the base configuration folder.
## Instructions
## Need help?
[Find out more about configuration from the Kedro documentation](https://docs.kedro.org/en/stable/configuration/index.html).

@ -1,142 +0,0 @@
# houses and princes
# input (read only) datasets
houses:
  type: yaml.YAMLDataSet
  filepath: data/01_raw/yaml/houses.yaml
houses_trigram:
  type: json.JSONDataSet
  filepath: data/01_raw/json/house_trigram.json
prince_bigram:
  type: json.JSONDataSet
  filepath: data/01_raw/json/prince_bigram.json
# ________________________________________________________________________
# BOURBON
# input (read only) dataset
bourbon:
  type: actesdataset.XMLDataSetCollection
  housename: bourbon
  folderpath: data/01_raw/xml/Bourbon
  xsltstylesheet: templates/xsl/actes_princiers.xsl
# input (read only) dataset
bourbon_json:
  type: actesdataset.BsXMLDataSetCollection
  housename: bourbon
  folderpath: data/01_raw/xml/Bourbon
# _________________________________________________________________________
# output (write) **pseudo xml** dataset
bourbon_xmlcontent:
  type: actesdataset.XMLDataSetCollection
  housename: bourbon
  folderpath: data/02_intermediate/Bourbon/pseudoxml
  xsltstylesheet: templates/xsl/actes_princiers.xsl
# input (read) **pseudo xml** dataset
# as it is **not** regular xml, an xml loader cannot be used
bourbon_pseudoxmlcontent:
  type: actesdataset.TextDataSetCollection
  housename: bourbon
  folderpath: data/02_intermediate/Bourbon/pseudoxml
# input (read) and output (write) dataset
bourbon_jsonoutput:
  type: actesdataset.JSONDataSetCollection
  housename: bourbon
  folderpath: data/02_intermediate/Bourbon/json
# output (write) and input (read) dataset
bourbon_fulljsonoutput:
  type: actesdataset.JSONDataSetCollection
  housename: bourbon
  folderpath: data/02_intermediate/Bourbon/fulljson
# ________________________________________________________________________
# BERRY
# input (read only) dataset
berry:
  type: actesdataset.XMLDataSetCollection
  housename: berry
  folderpath: data/01_raw/xml/Berry
  xsltstylesheet: templates/xsl/actes_princiers.xsl
# input (read only) dataset
berry_json:
  type: actesdataset.BsXMLDataSetCollection
  housename: berry
  folderpath: data/01_raw/xml/Berry
# _________________________________________________________________________
# output (write) **pseudo xml** dataset
berry_xmlcontent:
  type: actesdataset.XMLDataSetCollection
  housename: berry
  folderpath: data/02_intermediate/Berry/pseudoxml
  xsltstylesheet: templates/xsl/actes_princiers.xsl
# input (read) **pseudo xml** dataset
# as it is **not** regular xml, an xml loader cannot be used
berry_pseudoxmlcontent:
  type: actesdataset.TextDataSetCollection
  housename: berry
  folderpath: data/02_intermediate/Berry/pseudoxml
# input (read) and output (write) dataset
berry_jsonoutput:
  type: actesdataset.JSONDataSetCollection
  housename: berry
  folderpath: data/02_intermediate/Berry/json
# output (write) and input (read) dataset
berry_fulljsonoutput:
  type: actesdataset.JSONDataSetCollection
  housename: berry
  folderpath: data/02_intermediate/Berry/fulljson
# ________________________________________________________________________
# ANJOU
# input (read only) dataset
anjou:
  type: actesdataset.XMLDataSetCollection
  housename: anjou
  folderpath: data/01_raw/xml/Anjou
  xsltstylesheet: templates/xsl/actes_princiers.xsl
# input (read only) dataset
anjou_json:
  type: actesdataset.BsXMLDataSetCollection
  housename: anjou
  folderpath: data/01_raw/xml/Anjou
# _________________________________________________________________________
# output (write) **pseudo xml** dataset
anjou_xmlcontent:
  type: actesdataset.XMLDataSetCollection
  housename: anjou
  folderpath: data/02_intermediate/Anjou/pseudoxml
  xsltstylesheet: templates/xsl/actes_princiers.xsl
# input (read) **pseudo xml** dataset
# as it is **not** regular xml, an xml loader cannot be used
anjou_pseudoxmlcontent:
  type: actesdataset.TextDataSetCollection
  housename: anjou
  folderpath: data/02_intermediate/Anjou/pseudoxml
# input (read) and output (write) dataset
anjou_jsonoutput:
  type: actesdataset.JSONDataSetCollection
  housename: anjou
  folderpath: data/02_intermediate/Anjou/json
# output (write) and input (read) dataset
anjou_fulljsonoutput:
  type: actesdataset.JSONDataSetCollection
  housename: anjou
  folderpath: data/02_intermediate/Anjou/fulljson
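
The Bourbon, Berry and Anjou sections above repeat one pattern: six entries per house that differ only in the house name and the folder name. As a minimal sketch, assuming this pattern holds for every house, the same entries can be rebuilt as a Python dictionary. The helper `house_entries` is hypothetical and is not part of the project; it only illustrates the naming scheme.

```python
# Hypothetical helper (not in the project): rebuilds the per-house
# catalog entries listed in the YAML above as plain dictionaries.
XSL = "templates/xsl/actes_princiers.xsl"


def house_entries(house: str) -> dict:
    """Return the six catalog entries for one house, keyed by dataset name."""
    folder = house.capitalize()  # "bourbon" -> "Bourbon"
    raw = f"data/01_raw/xml/{folder}"
    inter = f"data/02_intermediate/{folder}"

    def entry(kind: str, path: str, xslt: bool = False) -> dict:
        # Every entry shares type, housename and folderpath;
        # only the XML collections carry a stylesheet.
        item = {"type": f"actesdataset.{kind}", "housename": house, "folderpath": path}
        if xslt:
            item["xsltstylesheet"] = XSL
        return item

    return {
        house: entry("XMLDataSetCollection", raw, xslt=True),
        f"{house}_json": entry("BsXMLDataSetCollection", raw),
        f"{house}_xmlcontent": entry("XMLDataSetCollection", f"{inter}/pseudoxml", xslt=True),
        f"{house}_pseudoxmlcontent": entry("TextDataSetCollection", f"{inter}/pseudoxml"),
        f"{house}_jsonoutput": entry("JSONDataSetCollection", f"{inter}/json"),
        f"{house}_fulljsonoutput": entry("JSONDataSetCollection", f"{inter}/fulljson"),
    }


catalog = {}
for name in ("bourbon", "berry", "anjou"):
    catalog.update(house_entries(name))
```

Note that `*_xmlcontent` and `*_pseudoxmlcontent` point at the same folder: the first writes the pseudo-XML, the second reads it back as plain text.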

@ -1,41 +0,0 @@
version: 1
disable_existing_loggers: False
formatters:
  simple:
    format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
handlers:
  console:
    class: logging.StreamHandler
    level: INFO
    formatter: simple
    stream: ext://sys.stdout
  info_file_handler:
    class: logging.handlers.RotatingFileHandler
    level: INFO
    formatter: simple
    filename: logs/info.log
    maxBytes: 10485760 # 10MB
    backupCount: 20
    encoding: utf8
    delay: True
  rich:
    class: kedro.logging.RichHandler
    rich_tracebacks: True
    # Advanced options for customisation.
    # See https://docs.kedro.org/en/stable/logging/logging.html#project-side-logging-configuration
    # tracebacks_show_locals: False
loggers:
  kedro:
    level: INFO
  actes_princiers:
    level: INFO
root:
  handlers: [rich, info_file_handler]
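
The logging configuration above follows the dictionary schema of Python's `logging.config.dictConfig`. The sketch below is a reduced version, assuming only the `simple` formatter and the `console` handler are kept; the `rich` and file handlers are left out so that the example needs nothing beyond the standard library. It is an illustration, not the project's actual setup.

```python
import logging
import logging.config

# Reduced, standard-library-only version of the configuration above.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"}
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "simple",
            "stream": "ext://sys.stdout",
        }
    },
    "loggers": {"actes_princiers": {"level": "INFO"}},
    # Assumption: the console handler stands in for [rich, info_file_handler].
    "root": {"handlers": ["console"]},
}

logging.config.dictConfig(LOGGING)
log = logging.getLogger("actes_princiers")

log.info("pipeline started")  # passes the logger's INFO level, reaches the console handler
log.debug("not shown")        # dropped: DEBUG is below the logger's INFO level
```

The `actes_princiers` logger has no handler of its own; its records propagate to the handlers attached to `root`.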

@ -1,3 +0,0 @@
version: 1.0
db_name: actesdb
db_collection_name: actes

@ -1,6 +0,0 @@
/tex-pdf
/xsl
compilation_latex.ipynb
.ipynb_checkpoints/
*/.ipynb_checkpoints/
.venv/

@ -1,92 +0,0 @@
# data
## Getting started
To make it easy for you to get started with GitLab, here's a list of recommended next steps.
Already a pro? Just edit this README.md and make it your own. Want to make it easy? [Use the template at the bottom](#editing-this-readme)!
## Add your files
- [ ] [Create](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#create-a-file) or [upload](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#upload-a-file) files
- [ ] [Add files using the command line](https://docs.gitlab.com/ee/gitlab-basics/add-file.html#add-a-file-using-the-command-line) or push an existing Git repository with the following command:
```
cd existing_repo
git remote add origin https://gitlab.huma-num.fr/medieval-acts/princely-acts/data.git
git branch -M main
git push -uf origin main
```
## Integrate with your tools
- [ ] [Set up project integrations](https://gitlab.huma-num.fr/medieval-acts/princely-acts/data/-/settings/integrations)
## Collaborate with your team
- [ ] [Invite team members and collaborators](https://docs.gitlab.com/ee/user/project/members/)
- [ ] [Create a new merge request](https://docs.gitlab.com/ee/user/project/merge_requests/creating_merge_requests.html)
- [ ] [Automatically close issues from merge requests](https://docs.gitlab.com/ee/user/project/issues/managing_issues.html#closing-issues-automatically)
- [ ] [Enable merge request approvals](https://docs.gitlab.com/ee/user/project/merge_requests/approvals/)
- [ ] [Set auto-merge](https://docs.gitlab.com/ee/user/project/merge_requests/merge_when_pipeline_succeeds.html)
## Test and Deploy
Use the built-in continuous integration in GitLab.
- [ ] [Get started with GitLab CI/CD](https://docs.gitlab.com/ee/ci/quick_start/index.html)
- [ ] [Analyze your code for known vulnerabilities with Static Application Security Testing(SAST)](https://docs.gitlab.com/ee/user/application_security/sast/)
- [ ] [Deploy to Kubernetes, Amazon EC2, or Amazon ECS using Auto Deploy](https://docs.gitlab.com/ee/topics/autodevops/requirements.html)
- [ ] [Use pull-based deployments for improved Kubernetes management](https://docs.gitlab.com/ee/user/clusters/agent/)
- [ ] [Set up protected environments](https://docs.gitlab.com/ee/ci/environments/protected_environments.html)
***
# Editing this README
When you're ready to make this README your own, just edit this file and use the handy template below (or feel free to structure it however you want - this is just a starting point!). Thank you to [makeareadme.com](https://www.makeareadme.com/) for this template.
## Suggestions for a good README
Every project is different, so consider which of these sections apply to yours. The sections used in the template are suggestions for most open source projects. Also keep in mind that while a README can be too long and detailed, too long is better than too short. If you think your README is too long, consider utilizing another form of documentation rather than cutting out information.
## Name
Choose a self-explaining name for your project.
## Description
Let people know what your project can do specifically. Provide context and add a link to any reference visitors might be unfamiliar with. A list of Features or a Background subsection can also be added here. If there are alternatives to your project, this is a good place to list differentiating factors.
## Badges
On some READMEs, you may see small images that convey metadata, such as whether or not all the tests are passing for the project. You can use Shields to add some to your README. Many services also have instructions for adding a badge.
## Visuals
Depending on what you are making, it can be a good idea to include screenshots or even a video (you'll frequently see GIFs rather than actual videos). Tools like ttygif can help, but check out Asciinema for a more sophisticated method.
## Installation
Within a particular ecosystem, there may be a common way of installing things, such as using Yarn, NuGet, or Homebrew. However, consider the possibility that whoever is reading your README is a novice and would like more guidance. Listing specific steps helps remove ambiguity and gets people to using your project as quickly as possible. If it only runs in a specific context like a particular programming language version or operating system or has dependencies that have to be installed manually, also add a Requirements subsection.
## Usage
Use examples liberally, and show the expected output if you can. It's helpful to have inline the smallest example of usage that you can demonstrate, while providing links to more sophisticated examples if they are too long to reasonably include in the README.
## Support
Tell people where they can go to for help. It can be any combination of an issue tracker, a chat room, an email address, etc.
## Roadmap
If you have ideas for releases in the future, it is a good idea to list them in the README.
## Contributing
State if you are open to contributions and what your requirements are for accepting them.
For people who want to make changes to your project, it's helpful to have some documentation on how to get started. Perhaps there is a script that they should run or some environment variables that they need to set. Make these steps explicit. These instructions could also be useful to your future self.
You can also document commands to lint the code or run tests. These steps help to ensure high code quality and reduce the likelihood that the changes inadvertently break something. Having instructions for running tests is especially helpful if it requires external setup, such as starting a Selenium server for testing in a browser.
## Authors and acknowledgment
Show your appreciation to those who have contributed to the project.
## License
For open source projects, say how it is licensed.
## Project status
If you have run out of energy or time for your project, put a note at the top of the README saying that development has slowed down or stopped completely. Someone may choose to fork your project or volunteer to step in as a maintainer or owner, allowing your project to keep going. You can also make an explicit request for maintainers.

@ -1,620 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "aeacd24e",
"metadata": {},
"source": [
"# Catalogs\n",
"\n",
"## Loading the actors"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ae9bc24c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800000; text-decoration-color: #800000\">╭─────────────────────────────── </span><span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">Traceback </span><span style=\"color: #bf7f7f; text-decoration-color: #bf7f7f; font-weight: bold\">(most recent call last)</span><span style=\"color: #800000; text-decoration-color: #800000\"> ────────────────────────────────╮</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">&lt;module&gt;</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">1</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span>1 catalog <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">2 </span><span style=\"color: #00ffff; text-decoration-color: #00ffff\">dir</span>(catalog) <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">3 </span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">╰──────────────────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
"<span style=\"color: #ff0000; text-decoration-color: #ff0000; font-weight: bold\">NameError: </span>name <span style=\"color: #008000; text-decoration-color: #008000\">'catalog'</span> is not defined\n",
"</pre>\n"
],
"text/plain": [
"\u001b[31m╭─\u001b[0m\u001b[31m──────────────────────────────\u001b[0m\u001b[31m \u001b[0m\u001b[1;31mTraceback \u001b[0m\u001b[1;2;31m(most recent call last)\u001b[0m\u001b[31m \u001b[0m\u001b[31m───────────────────────────────\u001b[0m\u001b[31m─╮\u001b[0m\n",
"\u001b[31m│\u001b[0m in \u001b[92m<module>\u001b[0m:\u001b[94m1\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m1 catalog \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m2 \u001b[0m\u001b[96mdir\u001b[0m(catalog) \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m3 \u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m╰──────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n",
"\u001b[1;91mNameError: \u001b[0mname \u001b[32m'catalog'\u001b[0m is not defined\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"catalog\n",
"dir(catalog)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "40417f25",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[06/30/23 17:50:49] </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO </span> Loading data from <span style=\"color: #008000; text-decoration-color: #008000\">'xmlreflector'</span> <span style=\"font-weight: bold\">(</span>XMLHousesReflector<span style=\"font-weight: bold\">)</span><span style=\"color: #808000; text-decoration-color: #808000\">...</span> <a href=\"file:///home/gwen/.local/lib/python3.10/site-packages/kedro/io/data_catalog.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">data_catalog.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///home/gwen/.local/lib/python3.10/site-packages/kedro/io/data_catalog.py#345\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">345</span></a>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[2;36m[06/30/23 17:50:49]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO \u001b[0m Loading data from \u001b[32m'xmlreflector'\u001b[0m \u001b[1m(\u001b[0mXMLHousesReflector\u001b[1m)\u001b[0m\u001b[33m...\u001b[0m \u001b]8;id=287074;file:///home/gwen/.local/lib/python3.10/site-packages/kedro/io/data_catalog.py\u001b\\\u001b[2mdata_catalog.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=134334;file:///home/gwen/.local/lib/python3.10/site-packages/kedro/io/data_catalog.py#345\u001b\\\u001b[2m345\u001b[0m\u001b]8;;\u001b\\\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800000; text-decoration-color: #800000\">╭─────────────────────────────── </span><span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">Traceback </span><span style=\"color: #bf7f7f; text-decoration-color: #bf7f7f; font-weight: bold\">(most recent call last)</span><span style=\"color: #800000; text-decoration-color: #800000\"> ────────────────────────────────╮</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #bfbf7f; text-decoration-color: #bfbf7f\">/home/gwen/.local/lib/python3.10/site-packages/kedro/io/</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">core.py</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">187</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">load</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">184 │ │ </span><span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>._logger.debug(<span style=\"color: #808000; text-decoration-color: #808000\">\"Loading %s\"</span>, <span style=\"color: #00ffff; text-decoration-color: #00ffff\">str</span>(<span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>)) <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">185 │ │ </span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">186 │ │ </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">try</span>: <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span>187 <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│ │ │ </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">return</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>._load() <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">188 │ │ </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">except</span> DataSetError: <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">189 │ │ │ </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">raise</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">190 │ │ </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">except</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">Exception</span> <span style=\"color: #0000ff; text-decoration-color: #0000ff\">as</span> exc: <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #bfbf7f; text-decoration-color: #bfbf7f\">/media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/actes-princiers/src/</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">actesdataset</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">.py</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">62</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">_load</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 59 │ │ </span><span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.filepath = filepath <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 60 │ </span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 61 │ </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">def</span> <span style=\"color: #00ff00; text-decoration-color: #00ff00\">_load</span>(<span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>): <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span> 62 <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│ │ </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">raise</span> <span style=\"color: #808000; text-decoration-color: #808000\">\"C'est chargé!\"</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 63 │ </span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 64 │ </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">def</span> <span style=\"color: #00ff00; text-decoration-color: #00ff00\">_save</span>(<span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>): <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 65 │ │ </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">raise</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">NotImplementedError</span>(<span style=\"color: #808000; text-decoration-color: #808000\">\"Attention : dataset en lecture seule !\"</span>) <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">╰──────────────────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
"<span style=\"color: #ff0000; text-decoration-color: #ff0000; font-weight: bold\">TypeError: </span>exceptions must derive from BaseException\n",
"\n",
"<span style=\"font-style: italic\">The above exception was the direct cause of the following exception:</span>\n",
"\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">╭─────────────────────────────── </span><span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">Traceback </span><span style=\"color: #bf7f7f; text-decoration-color: #bf7f7f; font-weight: bold\">(most recent call last)</span><span style=\"color: #800000; text-decoration-color: #800000\"> ────────────────────────────────╮</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #bfbf7f; text-decoration-color: #bfbf7f\">/tmp/ipykernel_28884/</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">4226322454.py</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">1</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">&lt;module&gt;</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000; font-style: italic\">[Errno 2] No such file or directory: '/tmp/ipykernel_28884/4226322454.py'</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #bfbf7f; text-decoration-color: #bfbf7f\">/home/gwen/.local/lib/python3.10/site-packages/kedro/io/</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">data_catalog.py</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">349</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">load</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">346 │ │ │ </span><span style=\"color: #808000; text-decoration-color: #808000\">\"Loading data from '%s' (%s)...\"</span>, name, <span style=\"color: #00ffff; text-decoration-color: #00ffff\">type</span>(dataset).<span style=\"color: #ff0000; text-decoration-color: #ff0000\">__name__</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">347 │ │ </span>) <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">348 │ │ </span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span>349 <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│ │ </span>result = dataset.load() <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">350 │ │ </span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">351 │ │ </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">return</span> result <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">352 </span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #bfbf7f; text-decoration-color: #bfbf7f\">/home/gwen/.local/lib/python3.10/site-packages/kedro/io/</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">core.py</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">196</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">load</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">193 │ │ │ </span>message = ( <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">194 │ │ │ │ </span><span style=\"color: #808000; text-decoration-color: #808000\">f\"Failed while loading data from data set {</span><span style=\"color: #00ffff; text-decoration-color: #00ffff\">str</span>(<span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>)<span style=\"color: #808000; text-decoration-color: #808000\">}.\\n{</span><span style=\"color: #00ffff; text-decoration-color: #00ffff\">str</span>(exc)<span style=\"color: #808000; text-decoration-color: #808000\">}\"</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">195 │ │ │ </span>) <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span>196 <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│ │ │ </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">raise</span> DataSetError(message) <span style=\"color: #0000ff; text-decoration-color: #0000ff\">from</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff; text-decoration: underline\">exc</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">197 │ </span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">198 │ </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">def</span> <span style=\"color: #00ff00; text-decoration-color: #00ff00\">save</span>(<span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>, data: _DI) -&gt; <span style=\"color: #0000ff; text-decoration-color: #0000ff\">None</span>: <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">199 │ │ </span><span style=\"color: #808000; text-decoration-color: #808000\">\"\"\"Saves data by delegation to the provided save method.</span> <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000\">╰──────────────────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
"<span style=\"color: #ff0000; text-decoration-color: #ff0000; font-weight: bold\">DataSetError: </span>Failed while loading data from data set <span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">XMLHousesReflector</span><span style=\"font-weight: bold\">(</span><span style=\"color: #808000; text-decoration-color: #808000\">name</span>=<span style=\"color: #800080; text-decoration-color: #800080\">my</span> own dataset<span style=\"font-weight: bold\">)</span>.\n",
"exceptions must derive from BaseException\n",
"</pre>\n"
],
"text/plain": [
"\u001b[31m╭─\u001b[0m\u001b[31m────────────────────────────── \u001b[0m\u001b[1;31mTraceback \u001b[0m\u001b[1;2;31m(most recent call last)\u001b[0m\u001b[31m ───────────────────────────────\u001b[0m\u001b[31m─╮\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2;33m/home/gwen/.local/lib/python3.10/site-packages/kedro/io/\u001b[0m\u001b[1;33mcore.py\u001b[0m:\u001b[94m187\u001b[0m in \u001b[92mload\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m184 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[96mself\u001b[0m._logger.debug(\u001b[33m\"\u001b[0m\u001b[33mLoading \u001b[0m\u001b[33m%s\u001b[0m\u001b[33m\"\u001b[0m, \u001b[96mstr\u001b[0m(\u001b[96mself\u001b[0m)) \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m185 \u001b[0m\u001b[2m│ │ \u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m186 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mtry\u001b[0m: \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m187 \u001b[2m│ │ │ \u001b[0m\u001b[94mreturn\u001b[0m \u001b[96mself\u001b[0m._load() \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m188 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mexcept\u001b[0m DataSetError: \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m189 \u001b[0m\u001b[2m│ │ │ \u001b[0m\u001b[94mraise\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m190 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mexcept\u001b[0m \u001b[96mException\u001b[0m \u001b[94mas\u001b[0m exc: \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2;33m/media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/actes-princiers/src/\u001b[0m\u001b[1;33mactesdataset\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[1;33m.py\u001b[0m:\u001b[94m62\u001b[0m in \u001b[92m_load\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m 59 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[96mself\u001b[0m.filepath = filepath \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m 60 \u001b[0m\u001b[2m│ \u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m 61 \u001b[0m\u001b[2m│ \u001b[0m\u001b[94mdef\u001b[0m \u001b[92m_load\u001b[0m(\u001b[96mself\u001b[0m): \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m 62 \u001b[2m│ │ \u001b[0m\u001b[94mraise\u001b[0m \u001b[33m\"\u001b[0m\u001b[33mC\u001b[0m\u001b[33m'\u001b[0m\u001b[33mest chargé!\u001b[0m\u001b[33m\"\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m 63 \u001b[0m\u001b[2m│ \u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m 64 \u001b[0m\u001b[2m│ \u001b[0m\u001b[94mdef\u001b[0m \u001b[92m_save\u001b[0m(\u001b[96mself\u001b[0m): \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m 65 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mraise\u001b[0m \u001b[96mNotImplementedError\u001b[0m(\u001b[33m\"\u001b[0m\u001b[33mAttention : dataset en lecture seule !\u001b[0m\u001b[33m\"\u001b[0m) \u001b[31m│\u001b[0m\n",
"\u001b[31m╰──────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n",
"\u001b[1;91mTypeError: \u001b[0mexceptions must derive from BaseException\n",
"\n",
"\u001b[3mThe above exception was the direct cause of the following exception:\u001b[0m\n",
"\n",
"\u001b[31m╭─\u001b[0m\u001b[31m────────────────────────────── \u001b[0m\u001b[1;31mTraceback \u001b[0m\u001b[1;2;31m(most recent call last)\u001b[0m\u001b[31m ───────────────────────────────\u001b[0m\u001b[31m─╮\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2;33m/tmp/ipykernel_28884/\u001b[0m\u001b[1;33m4226322454.py\u001b[0m:\u001b[94m1\u001b[0m in \u001b[92m<module>\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[3;31m[Errno 2] No such file or directory: '/tmp/ipykernel_28884/4226322454.py'\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2;33m/home/gwen/.local/lib/python3.10/site-packages/kedro/io/\u001b[0m\u001b[1;33mdata_catalog.py\u001b[0m:\u001b[94m349\u001b[0m in \u001b[92mload\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m346 \u001b[0m\u001b[2m│ │ │ \u001b[0m\u001b[33m\"\u001b[0m\u001b[33mLoading data from \u001b[0m\u001b[33m'\u001b[0m\u001b[33m%s\u001b[0m\u001b[33m'\u001b[0m\u001b[33m (\u001b[0m\u001b[33m%s\u001b[0m\u001b[33m)...\u001b[0m\u001b[33m\"\u001b[0m, name, \u001b[96mtype\u001b[0m(dataset).\u001b[91m__name__\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m347 \u001b[0m\u001b[2m│ │ \u001b[0m) \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m348 \u001b[0m\u001b[2m│ │ \u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m349 \u001b[2m│ │ \u001b[0mresult = dataset.load() \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m350 \u001b[0m\u001b[2m│ │ \u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m351 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mreturn\u001b[0m result \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m352 \u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2;33m/home/gwen/.local/lib/python3.10/site-packages/kedro/io/\u001b[0m\u001b[1;33mcore.py\u001b[0m:\u001b[94m196\u001b[0m in \u001b[92mload\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m193 \u001b[0m\u001b[2m│ │ │ \u001b[0mmessage = ( \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m194 \u001b[0m\u001b[2m│ │ │ │ \u001b[0m\u001b[33mf\u001b[0m\u001b[33m\"\u001b[0m\u001b[33mFailed while loading data from data set \u001b[0m\u001b[33m{\u001b[0m\u001b[96mstr\u001b[0m(\u001b[96mself\u001b[0m)\u001b[33m}\u001b[0m\u001b[33m.\u001b[0m\u001b[33m\\n\u001b[0m\u001b[33m{\u001b[0m\u001b[96mstr\u001b[0m(exc)\u001b[33m}\u001b[0m\u001b[33m\"\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m195 \u001b[0m\u001b[2m│ │ │ \u001b[0m) \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m196 \u001b[2m│ │ │ \u001b[0m\u001b[94mraise\u001b[0m DataSetError(message) \u001b[94mfrom\u001b[0m \u001b[4;96mexc\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m197 \u001b[0m\u001b[2m│ \u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m198 \u001b[0m\u001b[2m│ \u001b[0m\u001b[94mdef\u001b[0m \u001b[92msave\u001b[0m(\u001b[96mself\u001b[0m, data: _DI) -> \u001b[94mNone\u001b[0m: \u001b[31m│\u001b[0m\n",
"\u001b[31m│\u001b[0m \u001b[2m199 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[33m\"\"\"Saves data by delegation to the provided save method.\u001b[0m \u001b[31m│\u001b[0m\n",
"\u001b[31m╰──────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n",
"\u001b[1;91mDataSetError: \u001b[0mFailed while loading data from data set \u001b[1;35mXMLHousesReflector\u001b[0m\u001b[1m(\u001b[0m\u001b[33mname\u001b[0m=\u001b[35mmy\u001b[0m own dataset\u001b[1m)\u001b[0m.\n",
"exceptions must derive from BaseException\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"catalog.load(\"xmlreflector\")"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "dc290e93",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[06/16/23 15:56:44] </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO </span> Loading data from <span style=\"color: #008000; text-decoration-color: #008000\">'actors'</span> <span style=\"font-weight: bold\">(</span>CSVDataSet<span style=\"font-weight: bold\">)</span><span style=\"color: #808000; text-decoration-color: #808000\">...</span> <a href=\"file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">data_catalog.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py#345\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">345</span></a>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[2;36m[06/16/23 15:56:44]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO \u001b[0m Loading data from \u001b[32m'actors'\u001b[0m \u001b[1m(\u001b[0mCSVDataSet\u001b[1m)\u001b[0m\u001b[33m...\u001b[0m \u001b]8;id=858812;file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py\u001b\\\u001b[2mdata_catalog.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=44255;file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py#345\u001b\\\u001b[2m345\u001b[0m\u001b]8;;\u001b\\\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>NAME</th>\n",
" <th>ROLE</th>\n",
" <th>HOUSE</th>\n",
" <th>DATE1</th>\n",
" <th>DATE2</th>\n",
" <th>DATE3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Charles Ier de Bourbon</td>\n",
" <td>prince</td>\n",
" <td>Bourbon</td>\n",
" <td>1400</td>\n",
" <td>1434.0</td>\n",
" <td>1456.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Gort, Étienne</td>\n",
" <td>secret</td>\n",
" <td>Bourbon</td>\n",
" <td>1425</td>\n",
" <td>1440.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Erart</td>\n",
" <td>secret</td>\n",
" <td>Berry</td>\n",
" <td>1404</td>\n",
" <td>1405.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Jean de Berry</td>\n",
" <td>prince</td>\n",
" <td>Berry</td>\n",
" <td>1337</td>\n",
" <td>1360.0</td>\n",
" <td>1416.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Agnès de Bourgogne</td>\n",
" <td>prince</td>\n",
" <td>Bourbon</td>\n",
" <td>1407</td>\n",
" <td>1434.0</td>\n",
" <td>1476.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" NAME ROLE HOUSE DATE1 DATE2 DATE3\n",
"0 Charles Ier de Bourbon prince Bourbon 1400 1434.0 1456.0\n",
"1 Gort, Étienne secret Bourbon 1425 1440.0 NaN\n",
"2 Erart secret Berry 1404 1405.0 NaN\n",
"3 Jean de Berry prince Berry 1337 1360.0 1416.0\n",
"4 Agnès de Bourgogne prince Bourbon 1407 1434.0 1476.0"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"catalog.load(\"actors\").head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "eedbc7fb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['actors', 'corpus-agnes-bourgogne', 'corpus-charles-i', 'parameters']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"catalog.list()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "3168935f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[06/16/23 14:58:30] </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO </span> Loading data from <span style=\"color: #008000; text-decoration-color: #008000\">'actors'</span> <span style=\"font-weight: bold\">(</span>CSVDataSet<span style=\"font-weight: bold\">)</span><span style=\"color: #808000; text-decoration-color: #808000\">...</span> <a href=\"file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">data_catalog.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py#345\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">345</span></a>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[2;36m[06/16/23 14:58:30]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO \u001b[0m Loading data from \u001b[32m'actors'\u001b[0m \u001b[1m(\u001b[0mCSVDataSet\u001b[1m)\u001b[0m\u001b[33m...\u001b[0m \u001b]8;id=659228;file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py\u001b\\\u001b[2mdata_catalog.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=160900;file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py#345\u001b\\\u001b[2m345\u001b[0m\u001b]8;;\u001b\\\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"['NAME', 'ROLE', 'HOUSE', 'DATE1', 'DATE2', 'DATE3']"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"actors = catalog.load(\"actors\")\n",
"actors.columns.tolist()"
]
},
{
"cell_type": "markdown",
"id": "902dd387",
"metadata": {},
"source": [
"## Nettoyage des valeurs non renseignées\n",
"\n",
"Ligne d'origine (ligne 9) : \n",
"`\"René d'Anjou\";\"prince\";\"Anjou\";\"XXXX\";\"XXXX\";\"XXXX\"`\n"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "24fc62ce",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"NAME Bernard d'Armagnac\n",
"ROLE prince\n",
"HOUSE Armagnac\n",
"DATE1 NaN\n",
"DATE2 NaN\n",
"DATE3 NaN\n",
"Name: 9, dtype: object"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#actors.values\n",
"import numpy as np\n",
"cleaned_actors = actors.replace(\"XXXX\", np.NaN)\n",
"actors.head()\n",
"#actors.values\n",
"cleaned_actors.iloc[9]"
]
},
{
"cell_type": "markdown",
"id": "ee287f62",
"metadata": {},
"source": [
"## Autres catalogues"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "053ed17c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['actors',\n",
" 'corpus-agnes-bourgogne',\n",
" 'corpus-charles-i',\n",
" 'dataset_test',\n",
" 'preprocessed_dataset_test',\n",
" 'load_xml',\n",
" 'preprocess_html',\n",
" 'load_full_xml_catalog',\n",
" 'preprocess_full_catalog_html',\n",
" 'preprocessed_actors',\n",
" 'parameters',\n",
" 'params:xlststylesheet']"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"catalog.list()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "660b898c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[06/20/23 16:44:19] </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO </span> Loading data from <span style=\"color: #008000; text-decoration-color: #008000\">'load_xml'</span> <span style=\"font-weight: bold\">(</span>XMLDataSet<span style=\"font-weight: bold\">)</span><span style=\"color: #808000; text-decoration-color: #808000\">...</span> <a href=\"file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">data_catalog.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py#345\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">345</span></a>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[2;36m[06/20/23 16:44:19]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO \u001b[0m Loading data from \u001b[32m'load_xml'\u001b[0m \u001b[1m(\u001b[0mXMLDataSet\u001b[1m)\u001b[0m\u001b[33m...\u001b[0m \u001b]8;id=813727;file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py\u001b\\\u001b[2mdata_catalog.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=696103;file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py#345\u001b\\\u001b[2m345\u001b[0m\u001b]8;;\u001b\\\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<lxml.etree._ElementTree at 0x7f3e4c3b99c0>"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"catalog.load(\"load_xml\")"
]
},
{
"cell_type": "markdown",
"id": "a46ddef9",
"metadata": {},
"source": [
"## PartitionedDataset catalogs"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "96a60999",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\">[06/22/23 15:01:39] </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO </span> Loading data from <span style=\"color: #008000; text-decoration-color: #008000\">'load_full_xml_catalog'</span> <span style=\"font-weight: bold\">(</span>PartitionedDataSet<span style=\"font-weight: bold\">)</span><span style=\"color: #808000; text-decoration-color: #808000\">...</span> <a href=\"file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">data_catalog.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py#345\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">345</span></a>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[2;36m[06/22/23 15:01:39]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO \u001b[0m Loading data from \u001b[32m'load_full_xml_catalog'\u001b[0m \u001b[1m(\u001b[0mPartitionedDataSet\u001b[1m)\u001b[0m\u001b[33m...\u001b[0m \u001b]8;id=663642;file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py\u001b\\\u001b[2mdata_catalog.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=709654;file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py#345\u001b\\\u001b[2m345\u001b[0m\u001b]8;;\u001b\\\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7fbfbf; text-decoration-color: #7fbfbf\"> </span><span style=\"color: #000080; text-decoration-color: #000080\">INFO </span> Loading data from <span style=\"color: #008000; text-decoration-color: #008000\">'load_full_xml_catalog'</span> <span style=\"font-weight: bold\">(</span>PartitionedDataSet<span style=\"font-weight: bold\">)</span><span style=\"color: #808000; text-decoration-color: #808000\">...</span> <a href=\"file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">data_catalog.py</span></a><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">:</span><a href=\"file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py#345\" target=\"_blank\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">345</span></a>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[2;36m \u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO \u001b[0m Loading data from \u001b[32m'load_full_xml_catalog'\u001b[0m \u001b[1m(\u001b[0mPartitionedDataSet\u001b[1m)\u001b[0m\u001b[33m...\u001b[0m \u001b]8;id=916916;file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py\u001b\\\u001b[2mdata_catalog.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=129179;file:///media/gwen/maxtor/gwen/entrepot/cnrs/nicolas/depot/datascience/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py#345\u001b\\\u001b[2m345\u001b[0m\u001b]8;;\u001b\\\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"{'anj_is_i_1441_08_05a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7e16df0>>,\n",
" 'anj_lo_i_1360_08a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9700>>,\n",
" 'anj_lo_i_1371_07_08a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd96a0>>,\n",
" 'anj_lo_ii_1401_04_28a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9430>>,\n",
" 'anj_lo_ii_1402_11_07a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd92b0>>,\n",
" 'anj_lo_ii_1405_05_02a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9340>>,\n",
" 'anj_lo_ii_1406_01_26a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd93d0>>,\n",
" 'anj_lo_ii_1406_04_15a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd94c0>>,\n",
" 'anj_lo_ii_1409_08_07a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd94f0>>,\n",
" 'anj_lo_ii_1409_12_12a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9520>>,\n",
" 'anj_lo_ii_1413_03_01a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9610>>,\n",
" 'anj_lo_iii_1420_11_04a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9580>>,\n",
" 'anj_lo_iii_1422_02_09a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd95b0>>,\n",
" 'anj_lo_iii_1424_03_31a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9d00>>,\n",
" 'anj_lo_iii_1424_03_31b': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9c40>>,\n",
" 'anj_lo_iii_1428_06_07a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9970>>,\n",
" 'anj_lo_iii_1428_06_07b': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9940>>,\n",
" 'anj_lo_iii_1432_10_27a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9880>>,\n",
" 'anj_ma_i_1370_12_10a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9a90>>,\n",
" 'anj_re_i_1437_09_16a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9a30>>,\n",
" 'anj_re_i_1439_11_22a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9af0>>,\n",
" 'anj_re_i_1440_01_20a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9730>>,\n",
" 'anj_re_i_1445a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9b20>>,\n",
" 'anj_re_i_1450_11_07a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd99d0>>,\n",
" 'anj_re_i_1454_01_14a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd99a0>>,\n",
" 'anj_re_i_1454_02_09a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd97f0>>,\n",
" 'anj_re_i_1454_06_17a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9d30>>,\n",
" 'anj_re_i_1454_09_01a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9d90>>,\n",
" 'anj_re_i_1455_11_13a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9df0>>,\n",
" 'anj_re_i_1456_11_29a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9e50>>,\n",
" 'anj_re_i_1457_01_04a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9eb0>>,\n",
" 'anj_re_i_1459_03_17a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9f10>>,\n",
" 'anj_re_i_1459_04_16a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9f70>>,\n",
" 'anj_re_i_1463_07_21a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7dd9fd0>>,\n",
" 'anj_re_i_1466_12_16a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7ddb070>>,\n",
" 'anj_re_i_1474_02_01a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7ddb0d0>>,\n",
" 'anj_re_i_1475_05_26a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7ddb130>>,\n",
" 'anj_yo_i_1418_12_20a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7ddb190>>,\n",
" 'anj_yo_i_1421_06_28a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7ddb1f0>>,\n",
" 'anj_yo_i_1442_02_24a': <bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7fa3f7ddb250>>}"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"partitions = catalog.load('load_full_xml_catalog')\n",
"catalog.load('load_full_xml_catalog')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "bdc37079",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<bound method AbstractDataSet.load of <actesdataset.XMLDataSet object at 0x7faad403c550>>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"partitions['anj_is_i_1441_08_05a']"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Kedro (actes_princiers)",
"language": "python",
"name": "kedro_actes_princiers"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -1,17 +0,0 @@
[tool.kedro]
package_name = "actes_princiers"
project_name = "Actes Princiers"
kedro_init_version = "0.18.10"
[tool.isort]
profile = "black"
[tool.pytest.ini_options]
addopts = """
--cov-report term-missing \
--cov src/actes_princiers -ra"""
[tool.coverage.report]
fail_under = 0
show_missing = true
exclude_lines = ["pragma: no cover", "raise NotImplementedError"]

@ -1,3 +0,0 @@
[flake8]
max-line-length=88
extend-ignore=E203

@ -1,4 +0,0 @@
"""Actes Princiers
"""
__version__ = "0.1"

@ -1,47 +0,0 @@
"""Actes Princiers file for ensuring the package is executable
as `actes-princiers` and `python -m actes_princiers`
"""
import importlib
from pathlib import Path

from kedro.framework.cli.utils import KedroCliError, load_entry_points
from kedro.framework.project import configure_project


def _find_run_command(package_name):
    try:
        project_cli = importlib.import_module(f"{package_name}.cli")
        # fail gracefully if cli.py does not exist
    except ModuleNotFoundError as exc:
        if f"{package_name}.cli" not in str(exc):
            raise
        plugins = load_entry_points("project")
        run = _find_run_command_in_plugins(plugins) if plugins else None
        if run:
            # use run command from installed plugin if it exists
            return run
        # use run command from `kedro.framework.cli.project`
        from kedro.framework.cli.project import run

        return run
    # fail badly if cli.py exists, but has no `cli` in it
    if not hasattr(project_cli, "cli"):
        raise KedroCliError(f"Cannot load commands from {package_name}.cli")
    return project_cli.run


def _find_run_command_in_plugins(plugins):
    for group in plugins:
        if "run" in group.commands:
            return group.commands["run"]


def main(*args, **kwargs):
    package_name = Path(__file__).parent.name
    configure_project(package_name)
    run = _find_run_command(package_name)
    run(*args, **kwargs)


if __name__ == "__main__":
    main()

@ -1,38 +0,0 @@
#from typing import Dict
from kedro.framework.context import KedroContext
class ProjectContext(KedroContext):
    project_name = "actes princiers"
    project_version = "0.1"
    package_name = "actes_princiers"
# def get_params(self):
# houses = self.config_loader.get("params*")
# return params
# def get_houses(self):
# """loading from generic configuration file
# (that is, the global houses `houses.yaml`)"""
# houses = self.config_loader.get("houses*")
# return houses['houses']
# def get_houses_datapath(self):
# """loading from generic configuration file"""
# houses = self.config_loader.get("houses*")
# return houses['raw_datapath']
# def get_catalog(self):
# "catalog loader entry point"
# # loading yaml defined catalogs
# catalog = self.config_loader.get('catalog*')
# return catalog
# def _get_catalog(self, *args, **kwargs):
# "catalog loader entry point"
# # loading yaml defined catalogs
# catalog = super()._get_catalog(*args, **kwargs)
# # kedro.io.data_catalog.DataCatalog
# # adding data sets
# self.nodes_description = self._house_dataset_loader(catalog)
# return catalog

@ -1,16 +0,0 @@
"""Project pipelines."""
from __future__ import annotations

from kedro.framework.project import find_pipelines
from kedro.pipeline import Pipeline


def register_pipelines() -> dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from pipeline names to ``Pipeline`` objects.
    """
    pipelines = find_pipelines()
    pipelines["__default__"] = sum(pipelines.values())
    return pipelines

@ -1,3 +0,0 @@
"Data Processing pipeline"
from .pipeline import create_pipeline # NOQA

@ -1,130 +0,0 @@
import logging
import urllib.parse
from pathlib import Path
from typing import Dict

from mongoengine import connect
from mongoengine import Document, StringField, DictField, ListField
#import folium
from kedro.framework.session import KedroSession
from kedro.extras.datasets.yaml import YAMLDataSet
from kedro.extras.datasets.json import JSONDataSet

from actesdataset import JSONDataSetCollection

logger = logging.getLogger(__name__)

#class FoliumMap(Document):
#    globalmap = StringField(required=True)


class Helpers(Document):
    house_trigram = DictField()
    prince_bigram = DictField()


# Database schemas
class House(Document):
    "_id is the name"
    _id = StringField(required=True, max_length=100)
    name = StringField(required=True, max_length=100)
    trigram = StringField(required=True, max_length=3)
    particle = StringField(required=True, max_length=150)


class Acte(Document):
    """_id is the filename"""
    _id = StringField(required=True, max_length=150)
    house = StringField(required=True, max_length=100)
    prince_name = StringField(required=True, max_length=150)
    prince_code = StringField(required=True, max_length=100)
    analysis = StringField(required=True, max_length=3000)
    date = StringField(required=True, max_length=250)
    transcribers = ListField(required=True)
    # FIXME date_time type: shall it be a **real** date object?
    date_time = StringField(required=True, max_length=15)
    filename = StringField(required=True, max_length=100)
    ref_acte = StringField(required=True, max_length=100)
    xmlcontent = StringField(required=True)  # no max_length
    place = DictField()
    folium = StringField(required=False)  # no max_length


def db_connect(storage_ip, db_name, mongodb_admin, mongodb_password):
    #mongodb://%s:%s@149.202.41.75:27017' % (username, password)
    username = urllib.parse.quote_plus(mongodb_admin)
    password = urllib.parse.quote_plus(mongodb_password)
    mongodb_url = f"mongodb://{username}:{password}@{storage_ip}:27017/"
    #mongodb_url = "mongodb://{}:27017/".format(storage_ip)
    #logger.info("connection to the mongodb server")
    # pymongo settings
    # myclient = pymongo.MongoClient(mongodb_url)
    myclient = connect(db=db_name, host=mongodb_url, authentication_source='admin', alias="default")


# pipeline functions
def populate_mongo(jsondoc: JSONDataSetCollection, storage_ip: str, db_name: str, db_collection_name: str, mongodb_admin: str, mongodb_password: str) -> None:
    "loads the json for an acte"
    jsondatasets = jsondoc.datasets
    housename = jsondoc._housename
    db_connect(storage_ip, db_name, mongodb_admin, mongodb_password)
    #places = []
    for dataset_filenamestem, dataset in jsondatasets.items():
        # a manual load is required here, because
        # the dataset **is not** registered in kedro's catalog
        json_document = dataset._load()
        json_document["_id"] = json_document["filename"]
        acte_entry = Acte(**json_document)
        logger.info("... adding entry: " + json_document["filename"])
        acte_entry.save()
        ## place location in the folium global map
        #if json_document['place'].get('latitude') is not None:
        #    places.append(json_document['place'])
    # folium global map
    # add to mongo db
    #m = folium.Map(location=[46.603354, 1.888334], zoom_start=6)
    #folium.TileLayer(name="Géolocalisation des actes",control=False).add_to(m)
    #for place in places:
    #    folium.Marker(
    #        location=[float(place['latitude']), float(place['longitude'])],
    #        popup=place['name']
    #        #icon=folium.Icon(icon="cloud")
    #        #icon=folium.Icon(color='lightgray', icon='home', prefix='fa')
    #        ).add_to(m)
    #globalmap = m.get_root()._repr_html_()
    ##globalmap = m._repr_html_()
    #folium_map = FoliumMap(globalmap=globalmap)
    #folium_map.save()
    return


def load_houses(yamldoc: YAMLDataSet, storage_ip: str, db_name: str, mongodb_admin: str, mongodb_password: str) -> None:
    db_connect(storage_ip, db_name, mongodb_admin, mongodb_password)
    for house_dict in yamldoc['houses'].values():
        house_dict['_id'] = house_dict['name']
        #houses_col.insert_one(house_dict)
        house_entry = House(**house_dict)
        house_entry.save()
    return


def load_helpers(house_trigram: JSONDataSet, prince_bigram: JSONDataSet,
                 storage_ip: str, db_name: str, mongodb_admin: str, mongodb_password: str) -> None:
    db_connect(storage_ip, db_name, mongodb_admin, mongodb_password)
    helper_entry = Helpers(house_trigram=house_trigram, prince_bigram=prince_bigram)
    helper_entry.save()
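
The `db_connect` helper above percent-encodes the credentials before it builds the connection string. The sketch below isolates that step; the function name `build_mongodb_url` and the sample credentials are hypothetical and only serve the illustration, while port 27017 is the one hard-coded in the diff.

```python
from urllib.parse import quote_plus


def build_mongodb_url(user: str, password: str, host: str, port: int = 27017) -> str:
    """Return a MongoDB connection URL with percent-encoded credentials.

    `build_mongodb_url` is a hypothetical name used for illustration only.
    """
    # quote_plus escapes characters such as '@', ':' and '/' that would
    # otherwise be read as URL delimiters
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"


if __name__ == "__main__":
    print(build_mongodb_url("ad@min", "p:ss/w", "127.0.0.1"))
```

Without the escaping, a password containing `@` or `/` would likely be misread as part of the host or path.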

@ -1,55 +0,0 @@
from kedro.pipeline import Pipeline, node, pipeline

from .nodes import populate_mongo, load_houses, load_helpers


def create_pipeline(**kwargs) -> Pipeline:
    # node names must be unique within a Kedro pipeline
    return pipeline(
        [
            node(
                func=populate_mongo,
                inputs=["bourbon_fulljsonoutput", "params:storage_ip", "params:db_name",
                        "params:db_collection_name", "params:mongodb_admin",
                        "params:mongodb_password"],
                outputs=None,
                name="populate_bourbon_actes",
                tags="populate_database",
            ),
            node(
                func=populate_mongo,
                inputs=["berry_fulljsonoutput", "params:storage_ip", "params:db_name",
                        "params:db_collection_name", "params:mongodb_admin",
                        "params:mongodb_password"],
                outputs=None,
                name="populate_berry_actes",
                tags="populate_database",
            ),
            node(
                func=populate_mongo,
                inputs=["anjou_fulljsonoutput", "params:storage_ip", "params:db_name",
                        "params:db_collection_name", "params:mongodb_admin",
                        "params:mongodb_password"],
                outputs=None,
                name="populate_anjou_actes",
                tags="populate_database",
            ),
            node(
                func=load_houses,
                inputs=["houses", "params:storage_ip", "params:db_name",
                        "params:mongodb_admin", "params:mongodb_password"],
                outputs=None,
                name="populate_houses",
                tags="populate_database",
            ),
            node(
                func=load_helpers,
                inputs=["houses_trigram", "prince_bigram",
                        "params:storage_ip", "params:db_name",
                        "params:mongodb_admin", "params:mongodb_password"],
                outputs=None,
                name="populate_helpers",
                tags="populate_database",
            )
        ]
    )

@ -1,3 +0,0 @@
"Data Processing pipeline"
from .pipeline import create_pipeline # NOQA

@ -1,110 +0,0 @@
import logging
from pathlib import Path
from typing import Dict

import folium
from kedro.framework.session import KedroSession

from actesdataset import EtreeXMLDataSet, BsXMLDataSet, JSONDataSet
from actesdataset import (XMLDataSetCollection, BsXMLDataSetCollection,
                          JSONDataSetCollection, TextDataSetCollection)

logger = logging.getLogger(__name__)

with KedroSession.create() as session:
    context = session.load_context()
# catalog = context.get_catalog()


def parse_xml_collection(datasetcol: XMLDataSetCollection) -> XMLDataSetCollection:
    "node function entry point, performs batch processing"
    datasets = datasetcol.datasets
    housename = datasetcol._housename
    output_datasets = context.catalog.load(housename + '_xmlcontent')
    outputfolderpath = output_datasets._folderpath
    for dataset_filenamestem, dataset in datasets.items():
        # a manual load is required here, because
        # the dataset **is not** registered in kedro's catalog
        dataset._load()
        output_source_doc = dataset.transform()
        # set dataset's output filepath
        output_filepath = outputfolderpath / Path(dataset_filenamestem).with_suffix(".pseudoxml")
        output_xmldataset = EtreeXMLDataSet(str(output_filepath), output_datasets.xsltstylesheet)
        # let's create subfolders, if they don't exist
        output_xmldataset_dir = output_filepath.parent
        output_xmldataset_dir.mkdir(parents=True, exist_ok=True)
        # save on file
        output_xmldataset._save(output_source_doc)
        output_datasets.datasets[dataset_filenamestem] = output_xmldataset
    return output_datasets


def make_json_collection(datasetcol: BsXMLDataSetCollection) -> JSONDataSetCollection:
    "node function entry point, performs batch processing"
    datasets = datasetcol.datasets
    housename = datasetcol._housename
    output_datasets = context.catalog.load(housename + '_jsonoutput')
    outputfolderpath = output_datasets._folderpath
    for dataset_filenamestem, dataset in datasets.items():
        #logger.info("filestem:" + dataset_filenamestem)
        # a manual load is required here, because
        # the dataset **is not** registered in kedro's catalog
        dataset._load()
        output_source_doc = dataset.transform()
        # let's add the house to the JSON document, it could be useful
        output_source_doc['house'] = housename
        # set dataset's output filepath
        output_filepath = outputfolderpath / Path(dataset_filenamestem).with_suffix(".json")
        output_xmldataset = JSONDataSet(str(output_filepath))
        # let's create subfolders, if they don't exist
        output_xmldataset_dir = output_filepath.parent
        output_xmldataset_dir.mkdir(parents=True, exist_ok=True)
        # save on file
        output_xmldataset._save(output_source_doc)
        output_datasets.datasets[dataset_filenamestem] = output_xmldataset
    return output_datasets


def _make_map(latitude, longitude, popup):
    m = folium.Map(location=[latitude, longitude], zoom_start=12,
                   width=800,
                   height=600,
                   )
    folium.Marker(
        location=[latitude, longitude],
        popup=popup,
        icon=folium.Icon(color='lightgray', icon="circle", prefix='fa')
    ).add_to(m)
    return m.get_root()._repr_html_()


def add_xmlcontent_tojson(jsondoc: JSONDataSetCollection, xmlcontent: TextDataSetCollection) -> JSONDataSetCollection:
    "adds xmlcontent to the json"
    jsondatasets = jsondoc.datasets
    housename = jsondoc._housename
    output_datasets = context.catalog.load(housename + '_fulljsonoutput')
    outputfolderpath = output_datasets._folderpath
    xmldatasets = xmlcontent.datasets
    for dataset_filenamestem, dataset in jsondatasets.items():
        document = dataset._load()
        output_filepath = outputfolderpath / Path(dataset_filenamestem).with_suffix(".json")
        output_xmldataset = JSONDataSet(str(output_filepath))
        # json dict update with xmlcontent
        if dataset_filenamestem in xmldatasets:
            xmlds = xmldatasets[dataset_filenamestem]
            # xmlds._load()
            document['xmlcontent'] = xmldatasets[dataset_filenamestem]._load()
            if document['place']['latitude'] is not None:
                document['folium'] = _make_map(document['place']['latitude'],
                                               document['place']['longitude'],
                                               document['place']['name'])
            else:
                document['folium'] = None
        else:
            raise KeyError(f"xmlcontent datasets does not have the key : {dataset_filenamestem}")
        # let's create subfolders, if they don't exist
        output_xmldataset_dir = output_filepath.parent
        output_xmldataset_dir.mkdir(parents=True, exist_ok=True)
        # save on file
        output_xmldataset._save(document)
        output_datasets.datasets[dataset_filenamestem] = output_xmldataset
    return output_datasets
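
The per-document logic of `add_xmlcontent_tojson` can be read independently of Kedro and folium. The sketch below is a simplified stand-in, not the project's code: `merge_xmlcontent` is a hypothetical name, the XML contents are passed as plain strings, and the map renderer is injected as a callable so the example runs without folium.

```python
from typing import Callable, Dict


def merge_xmlcontent(
    document: Dict,
    xmlcontents: Dict[str, str],
    stem: str,
    make_map: Callable[[str, str, str], str],
) -> Dict:
    """Return a copy of `document` enriched with `xmlcontent` and `folium`.

    Hypothetical helper: mirrors the branch structure of the node function.
    """
    if stem not in xmlcontents:
        # same failure mode as the node: a JSON file without its pseudo-XML
        raise KeyError(f"xmlcontent datasets does not have the key : {stem}")
    merged = dict(document)
    merged["xmlcontent"] = xmlcontents[stem]
    place = merged["place"]
    if place["latitude"] is not None:
        merged["folium"] = make_map(place["latitude"], place["longitude"], place["name"])
    else:
        # no coordinates: no map is rendered
        merged["folium"] = None
    return merged
```

Injecting the renderer keeps the merge step testable on its own; in the pipeline the role of `make_map` is played by `_make_map`.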

@ -1,80 +0,0 @@
from kedro.pipeline import Pipeline, node, pipeline

from .nodes import (parse_xml_collection, make_json_collection,
                    add_xmlcontent_tojson)


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            # bourbon
            node(
                func=parse_xml_collection,
                inputs=["bourbon"],
                outputs="bourbon_xmlcontent",
                name="bourbon_ds_collection",
                tags="etl_transform"
            ),
            node(
                func=make_json_collection,
                inputs="bourbon_json",
                outputs="bourbon_jsonoutput",
                name="bourbon_json_ds_collection",
                tags="etl_transform"
            ),
            node(
                func=add_xmlcontent_tojson,
                inputs=["bourbon_jsonoutput", "bourbon_pseudoxmlcontent"],
                outputs="bourbon_fulljsonoutput",
                name="bourbon_fulljson_ds_collection",
                tags="etl_transform"
            ),
            # berry
            node(
                func=parse_xml_collection,
                inputs=["berry"],
                outputs="berry_xmlcontent",
                name="berry_ds_collection",
                tags="etl_transform"
            ),
            node(
                func=make_json_collection,
                inputs="berry_json",
                outputs="berry_jsonoutput",
                name="berry_json_ds_collection",
                tags="etl_transform"
            ),
            node(
                func=add_xmlcontent_tojson,
                inputs=["berry_jsonoutput", "berry_pseudoxmlcontent"],
                outputs="berry_fulljsonoutput",
                name="berry_fulljson_ds_collection",
                tags="etl_transform"
            ),
            # anjou
            node(
                func=parse_xml_collection,
                inputs=["anjou"],
                outputs="anjou_xmlcontent",
                name="anjou_ds_collection",
                tags="etl_transform"
            ),
            node(
                func=make_json_collection,
                inputs="anjou_json",
                outputs="anjou_jsonoutput",
                name="anjou_json_ds_collection",
                tags="etl_transform"
            ),
            node(
                func=add_xmlcontent_tojson,
                inputs=["anjou_jsonoutput", "anjou_pseudoxmlcontent"],
                outputs="anjou_fulljsonoutput",
                name="anjou_fulljson_ds_collection",
                tags="etl_transform"
            ),
        ]
    )
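
The three houses above share the same three-node pattern, differing only in the house prefix. One possible way to express that pattern once is sketched below; `HOUSES` and `house_node_specs` are hypothetical names, and function names are kept as strings so the sketch runs without Kedro installed.

```python
from typing import Dict, List

# house prefixes taken from the pipeline above
HOUSES = ["bourbon", "berry", "anjou"]


def house_node_specs(house: str) -> List[Dict]:
    """Keyword arguments for the three ETL nodes of one house.

    Hypothetical helper: `func` holds the function *name* as a string here;
    a real pipeline would reference the function objects instead.
    """
    return [
        dict(func="parse_xml_collection", inputs=[house],
             outputs=f"{house}_xmlcontent", name=f"{house}_ds_collection",
             tags="etl_transform"),
        dict(func="make_json_collection", inputs=f"{house}_json",
             outputs=f"{house}_jsonoutput", name=f"{house}_json_ds_collection",
             tags="etl_transform"),
        dict(func="add_xmlcontent_tojson",
             inputs=[f"{house}_jsonoutput", f"{house}_pseudoxmlcontent"],
             outputs=f"{house}_fulljsonoutput",
             name=f"{house}_fulljson_ds_collection", tags="etl_transform"),
    ]


# flatten the per-house specs into a single list, house by house
specs = [spec for house in HOUSES for spec in house_node_specs(house)]
names = [spec["name"] for spec in specs]
```

Under these assumptions, each spec could be passed to `node(**spec)` once `func` is replaced by the actual function. Deriving the names from the house prefix also keeps them unique, which Kedro requires within a pipeline.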

@ -1,42 +0,0 @@
"""Project settings. There is no need to edit this file unless you want to change values
from the Kedro defaults. For further information, including these default values, see
https://kedro.readthedocs.io/en/stable/kedro_project_setup/settings.html."""
# Instantiated project hooks.
# For example, after creating a hooks.py and defining a ProjectHooks class there, do
# from actes_princiers.hooks import ProjectHooks
# HOOKS = (ProjectHooks(),)
# Installed plugins for which to disable hook auto-registration.
# DISABLE_HOOKS_FOR_PLUGINS = ("kedro-viz",)
# Class that manages storing KedroSession data.
# from kedro.framework.session.store import BaseSessionStore
# SESSION_STORE_CLASS = BaseSessionStore
# Keyword arguments to pass to the `SESSION_STORE_CLASS` constructor.
# SESSION_STORE_ARGS = {
# "path": "./sessions"
# }
# Directory that holds configuration.
# CONF_SOURCE = "conf"
# Class that manages how configuration is loaded.
from kedro.config import OmegaConfigLoader
CONFIG_LOADER_CLASS = OmegaConfigLoader
# Keyword arguments to pass to the `CONFIG_LOADER_CLASS` constructor.
# CONFIG_LOADER_ARGS = {
# "config_patterns": {
# "spark" : ["spark*/"],
# "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
# }
# }
# Class that manages Kedro's library components.
# from kedro.framework.context import KedroContext
#from .customcontext import ProjectContext
#CONTEXT_CLASS = ProjectContext
# Class that manages the Data Catalog.
# from kedro.io import DataCatalog
# DATA_CATALOG_CLASS = DataCatalog

@ -1,315 +0,0 @@
import logging
import json
from typing import Dict, Any
from pathlib import Path
from abc import ABC, abstractmethod

from lxml import etree
from bs4 import BeautifulSoup
from kedro.io import AbstractDataSet, DataSetError
from kedro.framework.session import KedroSession

logger = logging.getLogger(__name__)


class XMLDataSet(ABC):
    "Abstract base class for an XML dataset loader"

    def __init__(self, filepath: str) -> None:
        self._filepath = filepath

    @property
    def filepath(self) -> str:
        "getter for the xml file's path"
        return self._filepath

    def _describe(self) -> Dict[str, Any]:
        "kedro's API-like repr()"
        return dict(filepath=self._filepath)

    @abstractmethod
    def _load(self):
        pass

    def _save(self, data: str) -> None:
        pass


class EtreeXMLDataSet(XMLDataSet):
    "XMLDataSet loader with lxml.etree (lxml.etree._ElementTree)"

    def __init__(self, filepath, params):
        self._filepath = filepath
        self.xsltstylesheet = params

    def _load(self):
        "from the xml file, loads an internal xml repr (with element tree)"
        # self.source_doc is an etree internal xml repr document
        self.source_doc = etree.parse(self._filepath)
        # removing namespace
        query = "descendant-or-self::*[namespace-uri()!='']"
        for element in self.source_doc.xpath(query):
            # replacing element name with its local name
            element.tag = etree.QName(element).localname
        etree.cleanup_namespaces(self.source_doc)

    def _save(self, data: str) -> None:
        "kedro's API-like saver"
        with open(self._filepath, 'w') as fhandle:
            fhandle.write(data)

    @staticmethod
    def _xslt(xsltstylesheet):
        "builds the XSLT transformer from the stylesheet"
        xslt_doc = etree.parse(xsltstylesheet)
        xslt_transformer = etree.XSLT(xslt_doc)
        return xslt_transformer

    def transform(self):
        "applies the XSLT transformation to the loaded document"
        xslt_transformer = self._xslt(self.xsltstylesheet)
        return str(xslt_transformer(self.source_doc))


class BsXMLDataSet(XMLDataSet):
    "XMLDataSet loader with BeautifulSoup"

    def _load(self):
        "from the xml file, loads an internal xml repr (with bsoup)"
        with open(self._filepath, 'r', encoding="utf-8") as fhandle:
            self.soup = BeautifulSoup(fhandle, 'xml')
        ## xml.prettify() is the bsoup str(source_doc)

    def _save(self, data: Dict) -> None:
        "kedro's API-like saver"
        with open(self._filepath, 'w') as fp:
            json.dump(data, fp, sort_keys=True, indent=4)

    def find_transcribers(self):
        "find transcriber xml bs4 helper"
        transcribers = self.soup.find_all('respStmt')
        trs = []
        for pers in transcribers:
            trs_name = pers.find('name')
            if trs_name:
                trs.append(trs_name.get_text())
        return trs

    def find_prince_name(self):
        """find prince_name xml bs4 helper

        prince_name = tree.xpath('//listPerson[@type="prince"]/person/name/text()')
        """
        person = self.soup.find("listPerson", {'type': "prince"})
        ps = person.find('name')
        prince_name = ps.get_text()
        return prince_name

    def make_prince_code_from_filestem(self, filestem):
        """
        builds prince code

        :param: filestem
            sample: "anj_isa_i_1441_08_05a"
        :return: prince code, sample: "isa_i"
        """
        # cut with the underscores
        cut = filestem.split('_')
        # keep the prince fields only (drop house and date)
        prince_code = "_".join(cut[1:3])
        return prince_code

    def transform(self):
        #soup = make_soup(os.path.join(folder, acte))
        # 1.1/ Get all data from XML (9). counter is the id (= numb_acte)
        numb = self.soup.TEI["xml:id"]  # /TEI[@xml:id] is always the acte's ID
        date_time = self.soup.msItem.docDate["when"]  # YYYY-MM-DD or YYYY-MM date
        date = self.soup.msItem.docDate.text  # verbose date
        analyse = self.soup.abstract.p.text  # acte's short analysis
        ref = self.soup.msIdentifier.find_all("idno", {"n": "2"})
        if len(ref) > 0:  # there is an analysis
            ref_acte = ref[0].text
        else:  # there is no analysis
            ref_acte = "NS"
        # //sourceDesc//msIdentifier/idno[@n='2'] is the doc id inside the
        # archive box or the page number inside a manuscript (see _create_doc)
        # warning: the analysis may not have been written yet,
        # which would result in List Index Out of Range Error. Hence:
        # //sourceDesc//msIdentifier/idno[@n='1'] is always the
        # archive box or manuscript collection id
        #doc = self.soup.msIdentifier.find_all("idno", {"n": "1"})[0]
        #type_diplo = self.soup.body.div["subtype"]
        #diplo_state = self.soup.body.div["type"]
        # geolocalisation
        place = self.soup.find("place")
        place_name = place.find("placeName")
        if place_name.get_text() != "NS":
            pl_name = place_name.get_text()
        else:
            pl_name = "Non spécifié"
        region_balise = place.find("region")
        if region_balise is not None:
            region = region_balise.get_text()
        else:
            region = "Non spécifié"
        settlement = place.find("settlement")
        if settlement is not None:
            settlement = settlement.get_text()
        else:
            settlement = "Non spécifié"
        geolocalisation = place.find("geo")
        if geolocalisation is not None:
            geolocalisation = geolocalisation.get_text()
            latitude, longitude = geolocalisation.split(" ")
        else:
            latitude = None
            longitude = None
        place = dict(name=pl_name,
                     region=region,
                     settlement=settlement,
                     latitude=latitude,
                     longitude=longitude
                     )
        return {
            # "num_acte": counter,
            "prince_name": self.find_prince_name(),
            "prince_code": self.make_prince_code_from_filestem(numb),
            "filename": numb,
            "date_time": date_time,
            "date": date,
            # "prod_place_acte": place_query[0],
            "analysis": analyse,
            # "doc_acte": doc_query[0],
            "ref_acte": ref_acte,
            "transcribers": self.find_transcribers(),
            "place": place
            # "state_doc": state_query[0],
            # "diplo_type_acte": diplo_query[0]
        }


class DataSetCollection(AbstractDataSet):
    """Stores instances of ``DataSetCollection``
    implementations to provide ``_load`` and ``_save`` capabilities.
    """

    def __init__(self,
                 housename: str,
                 folderpath: str) -> None:
        self._housename = housename
        self._folderpath = Path(folderpath)
        # the collections key: file name, value: dataset object
        self.datasets = dict()

    def _save(self, data) -> None:
        """kedro's API saver method

        There is **nothing to save**, because
        this dataset collection is a *container* dataset.
        This method is here only because kedro requires it.
        """
        pass

    def _describe(self) -> dict[str, Any]:
        "kedro's API repr()"
        return dict(name=self._housename,
                    folderpath=str(self._folderpath))


class XMLDataSetCollection(DataSetCollection):
    def __init__(self, housename: str,
                 folderpath: str, xsltstylesheet: str) -> None:
        super().__init__(housename, folderpath)
        self.xsltstylesheet = xsltstylesheet

    def _load(self) -> "XMLDataSetCollection":
        "kedro's API loader method; returns the collection itself"
        for filepath in sorted(self._folderpath.glob("*.xml")):
            self.datasets[filepath.stem] = EtreeXMLDataSet(str(filepath), self.xsltstylesheet)
        return self


class BsXMLDataSetCollection(DataSetCollection):
    def _load(self) -> "BsXMLDataSetCollection":
        "kedro's API loader method; returns the collection itself"
        self.datasets = dict()
        for filepath in sorted(self._folderpath.glob("*.xml")):
            self.datasets[filepath.stem] = BsXMLDataSet(
                filepath=str(filepath))
        return self


class JSONDataSet:
    def __init__(self, filepath: str):
        self._filepath = filepath

    def _load(self) -> Dict:
        with open(self._filepath, 'r') as fp:
            return json.load(fp)

    def _save(self, data: Dict) -> None:
        with open(self._filepath, 'w') as fp:
            json.dump(data, fp, sort_keys=True, indent=4)

    def _describe(self) -> Dict[str, Any]:
        return dict(filepath=self._filepath)


class JSONDataSetCollection(DataSetCollection):
    def _load(self) -> "JSONDataSetCollection":
        "kedro's API loader method; returns the collection itself"
        self.datasets = dict()
        for filepath in sorted(self._folderpath.glob("*.json")):
            self.datasets[filepath.stem] = JSONDataSet(
                filepath=str(filepath))
        return self


class TextDataSet:
    """loads/saves data from/to a text file using an underlying filesystem

    example usage

    >>> string_to_write = "This will go in a file."
    >>>
    >>> data_set = TextDataSet(filepath="test.md")
    >>> data_set._save(string_to_write)
    >>> reloaded = data_set._load()
    >>> assert string_to_write == reloaded
    """

    def __init__(self, filepath: str):
        self._filepath = filepath

    def _load(self) -> str:
        with open(self._filepath, 'r') as fhandle:
            return fhandle.read()

    def _save(self, data: str) -> None:
        with open(self._filepath, 'w') as fhandle:
            fhandle.write(data)

    def _describe(self) -> Dict[str, Any]:
        return dict(filepath=self._filepath)


class TextDataSetCollection(DataSetCollection):
    def _load(self) -> "TextDataSetCollection":
        "kedro's API loader method; returns the collection itself"
        self.datasets = dict()
        for filepath in sorted(self._folderpath.glob("*.pseudoxml")):
            self.datasets[filepath.stem] = TextDataSet(
                filepath=str(filepath))
        return self
#class FoliumHTMLDataSet(AbstractDataSet):
# def __init__(self, filepath: str):
# self._filepath = filepath
#
# def _load(self) -> None:
# raise DataSetError('This dataset is WriteOnly')
#
# def _describe(self) -> Dict[str, Any]:
# return dict(filepath=self._filepath)
#
# def _save(self, data: Map) -> None:
# data.save(self._filepath)
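
Two small parsing steps in `BsXMLDataSet` carry most of the naming and geolocation logic: deriving the prince code from the file stem and splitting the `geo` text into coordinates. The sketch below restates them as standalone functions. The names `prince_code_from_filestem` and `parse_geo` are hypothetical, and `parse_geo` deliberately uses `str.split()` without an argument, which tolerates repeated whitespace, unlike the `split(" ")` call in the diff.

```python
from typing import Optional, Tuple


def prince_code_from_filestem(filestem: str) -> str:
    """Keep the second and third underscore-separated fields.

    Hypothetical standalone version of `make_prince_code_from_filestem`.
    """
    return "_".join(filestem.split("_")[1:3])


def parse_geo(text: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Split a '<latitude> <longitude>' string; (None, None) when absent.

    Assumption: exactly two whitespace-separated values are present.
    """
    if text is None:
        return None, None
    latitude, longitude = text.split()
    return latitude, longitude


if __name__ == "__main__":
    # file stem taken from the notebook output; coordinates are made up
    print(prince_code_from_filestem("anj_is_i_1441_08_05a"))
    print(parse_geo("47.47 -0.55"))
```

The coordinates stay strings, as in the original `transform`; a caller that needs numbers would have to convert them.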

@ -1,13 +0,0 @@
beautifulsoup4==4.12.2
python-slugify>=8.0.1
ipython>=7.31.1, <8.0; python_version < '3.8'
ipython~=8.10; python_version >= '3.8'
isort~=5.0
kedro~=0.18.12
kedro-datasets~=1.7.0
kedro-telemetry~=0.2.5
lxml~=4.9.3
nbstripout~=0.4
pymongo~=4.5.0
mongoengine~=0.27.0
folium~=0.14.0

@ -1,39 +0,0 @@
from setuptools import find_packages, setup

entry_point = (
    "actes-princiers = actes_princiers.__main__:main"
)

# get the dependencies and installs
with open("requirements.txt", encoding="utf-8") as f:
    # Make sure we strip all comments and options (e.g "--extra-index-url")
    # that arise from a modified pip.conf file that configure global options
    # when running kedro build-reqs
    requires = []
    for line in f:
        req = line.split("#", 1)[0].strip()
        if req and not req.startswith("--"):
            requires.append(req)

setup(
    name="actes_princiers",
    version="0.1",
    packages=find_packages(exclude=["tests"]),
    entry_points={"console_scripts": [entry_point]},
    install_requires=requires,
    extras_require={
        "docs": [
            "docutils<0.18.0",
            "sphinx~=3.4.3",
            "sphinx_rtd_theme==0.5.1",
            "nbsphinx==0.8.1",
            "nbstripout~=0.4",
            "sphinx-autodoc-typehints==1.11.1",
            "sphinx_copybutton==0.3.1",
            "ipykernel>=5.3, <7.0",
            "Jinja2<3.1.0",
            "myst-parser~=0.17.2",
        ]
    },
)
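
The requirements-parsing loop in this `setup.py` can be sketched in isolation as follows. `parse_requirements` and the `sample` lines are hypothetical names and data introduced for illustration; the filtering logic mirrors the loop shown in the file.

```python
from typing import Iterable, List


def parse_requirements(lines: Iterable[str]) -> List[str]:
    """Return requirement specifiers, skipping comments and pip options."""
    requires = []
    for line in lines:
        # Drop anything after the first "#", then trim whitespace.
        req = line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "--extra-index-url".
        if req and not req.startswith("--"):
            requires.append(req)
    return requires


# Illustrative input only; the URL is a placeholder.
sample = [
    "kedro~=0.18.12\n",
    "# a full-line comment\n",
    "--extra-index-url https://example.invalid/simple\n",
    "lxml~=4.9.3  # inline comment\n",
    "\n",
]
print(parse_requirements(sample))
```

Because the split happens on the first `#`, a requirement written as a URL with a `#egg=` fragment would be truncated; none of the entries in the `requirements.txt` shown here use that form.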

@ -1,41 +0,0 @@
"""
This module contains an example test.
Tests should be placed in ``src/tests``, in modules that mirror your
project's structure, and in files named test_*.py. They are simply functions
named ``test_*`` which test a unit of logic.
To run the tests, run ``kedro test`` from the project root directory.
"""
from pathlib import Path

import pytest
from kedro.framework.project import settings
from kedro.config import ConfigLoader
from kedro.framework.context import KedroContext
from kedro.framework.hooks import _create_hook_manager


@pytest.fixture
def config_loader():
    return ConfigLoader(conf_source=str(Path.cwd() / settings.CONF_SOURCE))


@pytest.fixture
def project_context(config_loader):
    return KedroContext(
        package_name="actes_princiers",
        project_path=Path.cwd(),
        config_loader=config_loader,
        hook_manager=_create_hook_manager(),
    )


# The tests below are here for the demonstration purpose
# and should be replaced with the ones testing the project
# functionality
class TestProjectContext:
    def test_project_path(self, project_context):
        assert project_context.project_path == Path.cwd()

@ -1,483 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" indent="yes"/>
<!-- parameter for the act id -->
<xsl:param name="numero"/>
<!-- OVERALL STRUCTURE -->
<xsl:template match="/" >
<!-- paratext and text block
<xsl:apply-templates select="//sourceDesc/listWit"/>-->
<xsl:apply-templates select="//text/body/div"/>
<!-- notes block -->
<div>
<!-- critical notes -->
<div class="note-global">
<xsl:apply-templates select="//note[@type='n1']/p"/>
</div>
</div>
<!-- palaeographic notes -->
<div class="footnote">
<ol>
<xsl:apply-templates select="//text/body/div//note[@type='na']/p"/>
</ol>
</div>
</xsl:template>
<!-- GLOBAL FORMATTING RULES -->
<xsl:template match="hi[@rend='sup']">
<!-- superscript -->
<xsl:element name="sup">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="hi[@rend='i']">
<!-- italics -->
<xsl:element name="em">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="hi[@rend='smallcaps']">
<!-- small capitals -->
<xsl:element name="mark">
<xsl:attribute name="style">
<xsl:text>font-variant: small-caps; background-color: inherit;</xsl:text>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="ref">
<!-- links -->
<xsl:element name="a">
<xsl:attribute name="href">
<xsl:value-of select="@target"/>
</xsl:attribute>
<xsl:attribute name="target">
<xsl:text>_blank</xsl:text>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="head">
<!-- act heading -->
<xsl:element name="p">
<xsl:attribute name="class">
<xsl:text>text_etabli</xsl:text>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="biblStruct">
<!-- structured bibliographic reference -->
<xsl:if test="parent::witness/@n='a'">
<!-- if this is edition a. -->
<xsl:value-of select="parent::witness/@n"/>
<xsl:text>. </xsl:text>
</xsl:if>
<xsl:if test="parent::witness/@n='b'">
<!-- if this is edition b. -->
<xsl:value-of select="parent::witness/@n"/>
<xsl:text>. </xsl:text>
</xsl:if>
<xsl:for-each select=".//author/persName">
<!-- identity of the author(s) -->
<xsl:if test="./addName">
<xsl:apply-templates select="./addName"/>
<xsl:text> </xsl:text>
</xsl:if>
<xsl:apply-templates select="./forename"/>
<xsl:if test="./surname">
<xsl:text> </xsl:text>
<xsl:apply-templates select="./surname"/>
</xsl:if>
<xsl:text>, </xsl:text>
</xsl:for-each>
<xsl:choose>
<!-- title -->
<xsl:when test=".//monogr/title[@level='a']">
<!-- 1/ if it is an article -->
<xsl:text>&#171; </xsl:text>
<!-- article title in French quotation marks (guillemets) -->
<xsl:apply-templates select=".//monogr/title[@level='a']"/>
<xsl:text> &#187;, dans </xsl:text>
<!-- journal title in italics -->
<xsl:element name="em">
<xsl:apply-templates select=".//monogr/title[@level='j']"/>
</xsl:element>
<xsl:if test=".//biblScope[@unit='part']">
<!-- if the journal is organised in series (@part) -->
<xsl:text>, </xsl:text>
<xsl:value-of select=".//biblScope[@unit='part']/@n"/>
</xsl:if>
<!-- if there is a journal issue number -->
<xsl:if test=".//biblScope[@unit='issue']">
<xsl:text>, n°</xsl:text>
<xsl:choose>
<!-- a single issue number -->
<xsl:when test=".//biblScope[@unit='issue']/@n">
<xsl:value-of select=".//biblScope[@unit='issue']/@n"/>
</xsl:when>
<xsl:otherwise>
<!-- a range of issue numbers -->
<xsl:value-of select=".//biblScope[@unit='issue']/@from"/>
<xsl:text>-</xsl:text>
<xsl:value-of select=".//biblScope[@unit='issue']/@to"/>
</xsl:otherwise>
</xsl:choose>
</xsl:if>
</xsl:when>
<xsl:otherwise>
<!-- 2/ not an article: this is a monograph title -->
<xsl:element name="em">
<xsl:apply-templates select=".//monogr/title[@level='m']"/>
</xsl:element>
</xsl:otherwise>
</xsl:choose>
<xsl:text>, </xsl:text>
<xsl:if test=".//imprint/biblScope[@unit='volume']">
<xsl:value-of select=".//imprint/biblScope[@unit='volume']/@n"/>
<xsl:text>, </xsl:text>
</xsl:if>
<xsl:if test=".//respStmt/persName">
<!-- scholarly editor (responsibility statement) -->
<xsl:for-each select=".//respStmt/persName">
<xsl:apply-templates select="./forename"/>
<xsl:text> </xsl:text>
<xsl:apply-templates select="./surname"/>
<xsl:if test="position()!= last()">, </xsl:if>
</xsl:for-each>
<xsl:text> (</xsl:text>
<xsl:apply-templates select=".//imprint/respStmt/resp"/>
<xsl:text>), </xsl:text>
</xsl:if>
<xsl:if test=".//pubPlace">
<!-- place of publication -->
<xsl:apply-templates select=".//pubPlace"/>
<xsl:text>, </xsl:text>
</xsl:if>
<xsl:if test=".//publisher">
<!-- publisher -->
<xsl:apply-templates select=".//publisher"/>
<xsl:text>, </xsl:text>
</xsl:if>
<!-- date -->
<xsl:value-of select=".//date/@when"/>
<xsl:text>, </xsl:text>
<xsl:if test=".//biblScope[@unit='page']">
<!-- pagination -->
<xsl:choose>
<xsl:when test=".//biblScope[@unit='page']/@n">
<!-- a single page -->
<xsl:text>p. </xsl:text>
<xsl:value-of select=".//biblScope[@unit='page']/@n"/>
</xsl:when>
<xsl:otherwise>
<!-- several pages -->
<xsl:text>pp. </xsl:text>
<xsl:value-of select=".//biblScope[@unit='page']/@from"/>
<xsl:text>-</xsl:text>
<xsl:value-of select=".//biblScope[@unit='page']/@to"/>
</xsl:otherwise>
</xsl:choose>
</xsl:if>
<xsl:if test=".//biblScope[@unit='entry']">
<!-- the location within the page is specified (index number, supporting document, appendix, etc.) -->
<xsl:text>, n°</xsl:text>
<xsl:value-of select=".//biblScope[@unit='entry']/@n"/>
<xsl:if test="./text()">
<xsl:text> </xsl:text>
<xsl:apply-templates select=".//biblScope[@unit='entry']"/>
</xsl:if>
</xsl:if>
<xsl:if test=".//ref">
<!-- there is a link to a digitisation or other resource -->
<xsl:text> </xsl:text>
<xsl:apply-templates select=".//ref"/>
</xsl:if>
<xsl:if test="parent::witness">
<xsl:text>.</xsl:text>
</xsl:if>
</xsl:template>
<xsl:template match="bibl">
<!-- unstructured bibliographic reference -->
<xsl:element name="em">
<!-- title -->
<xsl:value-of select="./title"/>
</xsl:element>
<xsl:if test="./biblScope[@unit='volume']">
<!-- volume number, if any -->
<xsl:text>, </xsl:text>
<xsl:value-of select="./biblScope[@unit='volume']/@n"/>
</xsl:if>
<xsl:if test="./biblScope[@unit='page']">
<!-- pagination, if specified -->
<xsl:choose>
<!-- a single page -->
<xsl:when test=".//biblScope[@unit='page']/@n">
<xsl:text>, p. </xsl:text>
<xsl:value-of select=".//biblScope[@unit='page']/@n"/>
</xsl:when>
<xsl:otherwise>
<!-- several pages -->
<xsl:text>, pp. </xsl:text>
<xsl:value-of select=".//biblScope[@unit='page']/@from"/>
<xsl:text>-</xsl:text>
<xsl:value-of select=".//biblScope[@unit='page']/@to"/>
</xsl:otherwise>
</xsl:choose>
</xsl:if>
<xsl:if test="./biblScope[@unit='entry']">
<xsl:text>, n°</xsl:text>
<xsl:value-of select="./biblScope[@unit='entry']/@n"/>
</xsl:if>
<xsl:if test="parent::witness">
<xsl:text>.</xsl:text>
</xsl:if>
</xsl:template>
<!-- DATING -->
<xsl:template match="docDate">
<xsl:element name="h1">
<xsl:attribute name="class">
<xsl:text>text-center</xsl:text>
</xsl:attribute>
<xsl:choose>
<!-- date and place -->
<xsl:when test="not(contains(placeName,'NS'))">
<xsl:apply-templates select="date"/>
<xsl:text>. — </xsl:text>
<xsl:apply-templates select="placeName"/>
<xsl:text>.</xsl:text>
</xsl:when>
<!-- date only -->
<xsl:otherwise>
<xsl:apply-templates select="date"/>
<xsl:text>.</xsl:text>
</xsl:otherwise>
</xsl:choose>
</xsl:element>
</xsl:template>
<xsl:template match="date">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="docDate/placeName">
<xsl:apply-templates/>
</xsl:template>
<!-- no template for the first argument, which is used for the corpus presentation page -->
<!-- REGEST -->
<xsl:template match="argument">
<xsl:element name="div">
<xsl:attribute name="class">
<xsl:text>analyse</xsl:text>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
<xsl:text>
</xsl:text>
<xsl:apply-templates select="//sourceDesc/listWit"/>
</xsl:template>
<xsl:template match="argument/p">
<xsl:element name="p">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<!-- IMAGES -->
<xsl:template match="listWit">
<xsl:element name="div">
<xsl:attribute name="class">
<xsl:text>tradition</xsl:text>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
<xsl:apply-templates select="facsimile"/>
<!-- image of the act if there is a <facsimile> with @n containing the act id -->
<xsl:if test="ancestor::TEI//facsimile">
<xsl:element name="div">
<xsl:element name="details">
<xsl:element name="summary">
<xsl:text>Cliquer pour afficher une image de l'acte.</xsl:text>
</xsl:element>
<!-- if a path is given in the @url of <graphic> -->
<xsl:if test="//facsimile/graphic/@url">
<xsl:element name="img">
<xsl:attribute name="src"><xsl:value-of select="//facsimile/graphic/@url"/></xsl:attribute>
<xsl:attribute name="width">100%</xsl:attribute>
<xsl:attribute name="height">auto</xsl:attribute>
</xsl:element>
</xsl:if>
<!-- if there is a description of the image -->
<xsl:if test="//facsimile/graphic/desc">
<xsl:element name="p">
<xsl:value-of select="//facsimile/graphic/desc"/>
</xsl:element>
</xsl:if>
</xsl:element>
</xsl:element>
</xsl:if>
</xsl:template>
<!-- TABLE OF THE TRADITION -->
<xsl:template match="//sourceDesc/listWit/witness">
<!-- Several cases: -->
<xsl:choose>
<!-- When there is an analysis -->
<xsl:when test="@n='analyse'">
<xsl:element name="p">
<xsl:element name="mark">
<xsl:attribute name="style">
<xsl:text>font-variant: small-caps; background-color: inherit;</xsl:text>
</xsl:attribute>
<xsl:text>Analyse : </xsl:text>
</xsl:element>
<xsl:apply-templates/>
</xsl:element>
</xsl:when>
<!-- When there is a mention -->
<xsl:when test="@n='mention'">
<xsl:element name="p">
<xsl:element name="mark">
<xsl:attribute name="style">
<xsl:text>font-variant: small-caps; background-color: inherit;</xsl:text>
</xsl:attribute>
<xsl:text>Mention : </xsl:text>
</xsl:element>
<xsl:apply-templates/>
</xsl:element>
</xsl:when>
<!-- When there is an "indiqué" (indicated) entry -->
<xsl:when test="@n='indique'">
<xsl:element name="p">
<xsl:element name="mark">
<xsl:attribute name="style">
<xsl:text>font-variant: small-caps; background-color: inherit;</xsl:text>
</xsl:attribute>
<xsl:text>Indiqué : </xsl:text>
</xsl:element>
<xsl:apply-templates/>
</xsl:element>
</xsl:when>
<!-- In all other cases -->
<xsl:otherwise>
<xsl:element name="p">
<xsl:apply-templates/>
</xsl:element>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<!-- NOTE CALLS IN THE TEXT -->
<xsl:template match="//note[@type='n1']">
<!-- critical notes -->
<xsl:element name="sup">
<xsl:element name="a">
<!-- @href linking the call to the note's id, based on its number -->
<xsl:attribute name="href">
<xsl:text>#</xsl:text>
<xsl:number count="//note[@type='n1']" level="any" format="1"/>
</xsl:attribute>
<!-- note number -->
<xsl:number count="//note[@type='n1']" level="any" format="1"/>
</xsl:element>
</xsl:element>
</xsl:template>
<xsl:template match="note[@type='na']">
<!-- palaeographic notes -->
<xsl:element name="sup">
<xsl:attribute name="id">
<xsl:text>fnref:</xsl:text>
<xsl:number count="//text/body/div//note[@type='na']" level="any" format="a"/>
</xsl:attribute>
<xsl:element name="a">
<xsl:attribute name="href">
<xsl:text>#fn:</xsl:text>
<xsl:number count="//text/body/div//note[@type='na']" level="any" format="a"/>
</xsl:attribute>
<xsl:attribute name="rel">
<xsl:text>footnote</xsl:text>
</xsl:attribute>
<xsl:number count="//text/body/div//note[@type='na']" level="any" format="a"/>
</xsl:element>
</xsl:element>
</xsl:template>
<!-- TEXT OF THE ACT -->
<xsl:template match="div[@type='acte']">
<!-- body of the act -->
<xsl:element name="div">
<xsl:attribute name="class">
<xsl:text>act</xsl:text>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="div[@type='acte']/p">
<xsl:element name="p">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="div[@type='MHT']">
<!-- mention hors teneur (mht): a note written outside the main text of the act -->
<xsl:element name="p">
<xsl:attribute name="class">
<xsl:text>mht</xsl:text>
</xsl:attribute>
<xsl:choose>
<!-- indication of where the mht is positioned -->
<xsl:when test="@subtype='gauche'">
<i style="font-size: small;">(À gauche :) </i><xsl:apply-templates/>
</xsl:when>
<xsl:when test="@subtype='droite'">
<i style="font-size: small;">(À droite :) </i><xsl:apply-templates/>
</xsl:when>
<xsl:when test="@subtype='replidroit'">
<i style="font-size: small;">(Sur le repli, à droite :) </i><xsl:apply-templates/>
</xsl:when>
<xsl:when test="@subtype='repligauche'">
<i style="font-size: small;">(Sur le repli, à gauche :) </i><xsl:apply-templates/>
</xsl:when>
</xsl:choose>
</xsl:element>
</xsl:template>
<xsl:template match="div[@type='sign']">
<!-- signature -->
<xsl:element name="div">
<xsl:attribute name="class">
<xsl:text>signature</xsl:text>
</xsl:attribute>
<xsl:for-each select="child::p">
<xsl:element name="p">
<i style="font-size: small;">(Signé :) </i><xsl:apply-templates/>
</xsl:element>
</xsl:for-each>
</xsl:element>
</xsl:template>
<!-- CRITICAL NOTES -->
<xsl:template match="//note[@type='n1']/p">
<xsl:element name="p">
<!-- @id, target of the @href of the note call in the text -->
<xsl:attribute name="id">
<xsl:number count="//note[@type='n1']" level="any" format="1"/>
</xsl:attribute>
<!-- note number -->
<xsl:number count="//note[@type='n1']" level="any" format="1"/>
<xsl:text>. </xsl:text>
<!-- note text -->
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<!-- PALAEOGRAPHIC NOTES -->
<xsl:template match="note[@type='na']/p">
<xsl:element name="li">
<xsl:attribute name="id">
<xsl:text>fn:</xsl:text>
<xsl:number count="//text/body/div//note[@type='na']" level="any" format="a"/>
</xsl:attribute>
<xsl:attribute name="class">
<xsl:text>footnote</xsl:text>
</xsl:attribute>
<xsl:element name="p">
<!-- note number -->
<xsl:number count="//text/body/div//note[@type='na']" level="any" format="a"/>
<xsl:text>. </xsl:text>
<!-- note text -->
<xsl:apply-templates/>
</xsl:element>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
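
The stylesheet numbers the `na` notes with `xsl:number level="any" format="a"`, producing matching `fnref:` and `fn:` anchors. The sketch below reproduces that lettering in Python with the standard library only. It assumes a namespace-free document; `SAMPLE`, `alpha_label` and `footnote_anchors` are hypothetical names introduced for illustration, and the sketch does not run the XSLT itself.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical, namespace-free sample shaped like the elements the stylesheet matches.
SAMPLE = (
    "<TEI><text><body><div type='acte'><p>Lorem"
    "<note type='na'><p>first</p></note> ipsum"
    "<note type='n1'><p>critical</p></note> dolor"
    "<note type='na'><p>second</p></note>"
    "</p></div></body></text></TEI>"
)


def alpha_label(n: int) -> str:
    """Render n >= 1 as a, b, ..., z, aa, ab, ... like xsl:number format='a'."""
    label = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = chr(ord("a") + rem) + label
    return label


def footnote_anchors(xml: str) -> List[Tuple[str, str]]:
    """Return (id of the note call, href to the note) per 'na' note, in document order."""
    root = ET.fromstring(xml)
    # Same selection as the count pattern used in the stylesheet.
    notes = root.findall(".//text/body/div//note[@type='na']")
    return [
        (f"fnref:{alpha_label(i)}", f"#fn:{alpha_label(i)}")
        for i, _ in enumerate(notes, start=1)
    ]


print(footnote_anchors(SAMPLE))
```

Critical notes (`@type='n1'`) are skipped here because the stylesheet numbers them separately, with `format="1"`.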

@ -1 +0,0 @@
rm -rf ./actes-princiers/data/02_intermediate/*

@ -1,4 +0,0 @@
output_dir: ~/actes-princiers
project_name: Actes Princiers
repo_name: actes-princiers
python_package: actes_princiers

@ -1,2 +0,0 @@
git remote add data git@gitlab.huma-num.fr:medieval-acts/princely-acts/data.git
git subtree add --prefix actes-princiers/data/01_raw data main --squash

@ -1 +0,0 @@
git subtree pull --prefix actes-princiers/data/01_raw data main --squash

@ -1 +0,0 @@
git remote add data git@gitlab.huma-num.fr:medieval-acts/princely-acts/data.git
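
The `git subtree add` and `git subtree pull` scripts above differ only in the action. A minimal sketch of how that command line is assembled, assuming a hypothetical helper `subtree_command` that only builds the argument list and does not execute git:

```python
from typing import List


def subtree_command(action: str, prefix: str, remote: str, branch: str) -> List[str]:
    """Build the argv for `git subtree add` or `git subtree pull` with --squash."""
    # Only the two actions used by the scripts in this diff are accepted here.
    if action not in ("add", "pull"):
        raise ValueError(f"unsupported subtree action: {action!r}")
    return ["git", "subtree", action, "--prefix", prefix, remote, branch, "--squash"]


# Values copied from the scripts in this diff.
PREFIX = "actes-princiers/data/01_raw"
print(" ".join(subtree_command("pull", PREFIX, "data", "main")))
```

Passing the resulting list to `subprocess.run` would run the command; that step is left out so the sketch has no side effects.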

Some files were not shown because too many files have changed in this diff
