add xml pipeline

develop
gwen 3 years ago
parent 97fe81cd5d
commit ed01c1c5a4

@ -29,6 +29,14 @@ Open a terminal in the `actes-princiers`'s folder and launch kedro
`kedro run`
or launch a specific node in the pipeline with:
`kedro run --nodes=preprocess_html`
or run only the nodes that carry a given tag with:
`kedro run --tags=xsl`
## Visualizing the pipelines
`kedro viz`
@ -39,21 +47,6 @@ Declare any dependencies in `src/requirements.txt` for `pip` installation.
To install them, run: `pip install -r src/requirements.txt`
## Project dependencies
To generate or update the dependency requirements for your project:
```
kedro build-reqs
```
This will `pip-compile` the contents of `src/requirements.txt` into a new file `src/requirements.lock`. You can see the output of the resolution by opening `src/requirements.lock`.
After this, if you'd like to update your project requirements, please update `src/requirements.txt` and re-run `kedro build-reqs`.
[Further information about project dependencies](https://kedro.readthedocs.io/en/stable/kedro_project_setup/dependencies.html#project-specific-dependencies)
## tips
You need to reload the Kedro variables by calling `%reload_kedro` in your notebook, then re-run the code snippet.
@ -73,3 +66,16 @@ In `actes-princiers/.gitignore`,
# ignore everything in the following folders
# data/**
## make a package for deployment
[package based deployment](https://docs.kedro.org/en/stable/deployment/single_machine.html#package-based)
If you prefer not to use containerisation, you can instead package your Kedro project with `kedro package`.
Run the following in your project's root directory:
`kedro package`
Kedro builds the package into the `dist/` folder of your project and creates a `.whl` file, a Python packaging format for binary distribution.
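Once `kedro package` has run, the wheel in `dist/` is the file to install on the target machine. The following is a minimal sketch, assuming the default `dist/` location, of picking the most recent wheel to pass to `pip install`. The helper name `latest_wheel` is hypothetical and not part of Kedro.

```python
from pathlib import Path
from typing import Optional


def latest_wheel(dist_dir: Path) -> Optional[Path]:
    """Return the most recently modified .whl file in dist_dir, or None if there is none."""
    # Hypothetical helper: sort wheels by modification time, oldest first.
    wheels = sorted(dist_dir.glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


if __name__ == "__main__":
    wheel = latest_wheel(Path("dist"))
    if wheel is None:
        print("No wheel found; run `kedro package` first.")
    else:
        # Print the install command rather than running it.
        print(f"pip install {wheel}")
```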

@ -1,122 +0,0 @@
# Actes Princiers
## Overview
This is your new Kedro project, which was generated using `Kedro 0.18.10`.
Take a look at the [Kedro documentation](https://docs.kedro.org) to get started.
## Rules and guidelines
In order to get the best out of the template:
* Don't remove any lines from the `.gitignore` file we provide
* Make sure your results can be reproduced by following a data engineering convention
* Don't commit data to your repository
* Don't commit any credentials or your local configuration to your repository. Keep all your credentials and local configuration in `conf/local/`
## How to install dependencies
Declare any dependencies in `src/requirements.txt` for `pip` installation and `src/environment.yml` for `conda` installation.
To install them, run:
```
pip install -r src/requirements.txt
```
## How to run your Kedro pipeline
You can run your Kedro project with:
```
kedro run
```
## How to test your Kedro project
Have a look at the file `src/tests/test_run.py` for instructions on how to write your tests. You can run your tests as follows:
```
kedro test
```
To configure the coverage threshold, go to the `.coveragerc` file.
## Project dependencies
To generate or update the dependency requirements for your project:
```
kedro build-reqs
```
This will `pip-compile` the contents of `src/requirements.txt` into a new file `src/requirements.lock`. You can see the output of the resolution by opening `src/requirements.lock`.
After this, if you'd like to update your project requirements, please update `src/requirements.txt` and re-run `kedro build-reqs`.
[Further information about project dependencies](https://docs.kedro.org/en/stable/kedro_project_setup/dependencies.html#project-specific-dependencies)
## How to work with Kedro and notebooks
> Note: Using `kedro jupyter` or `kedro ipython` to run your notebook provides these variables in scope: `context`, `catalog`, and `startup_error`.
>
> Jupyter, JupyterLab, and IPython are already included in the project requirements by default, so once you have run `pip install -r src/requirements.txt` you will not need to take any extra steps before you use them.
### Jupyter
To use Jupyter notebooks in your Kedro project, you need to install Jupyter:
```
pip install jupyter
```
After installing Jupyter, you can start a local notebook server:
```
kedro jupyter notebook
```
### JupyterLab
To use JupyterLab, you need to install it:
```
pip install jupyterlab
```
You can also start JupyterLab:
```
kedro jupyter lab
```
### IPython
And if you want to run an IPython session:
```
kedro ipython
```
### How to convert notebook cells to nodes in a Kedro project
You can move notebook code over into a Kedro project structure using a mixture of [cell tagging](https://jupyter-notebook.readthedocs.io/en/stable/changelog.html#release-5-0-0) and Kedro CLI commands.
By adding the `node` tag to a cell and running the command below, the cell's source code will be copied over to a Python file within `src/<package_name>/nodes/`:
```
kedro jupyter convert <filepath_to_my_notebook>
```
> *Note:* The name of the Python file matches the name of the original notebook.
Alternatively, you may want to transform all your notebooks in one go. Run the following command to convert all notebook files found in the project root directory and under any of its sub-folders:
```
kedro jupyter convert --all
```
### How to ignore notebook output cells in `git`
To automatically strip out all output cell contents before committing to `git`, you can run `kedro activate-nbstripout`. This will add a hook in `.git/config` which will run `nbstripout` before anything is committed to `git`.
> *Note:* Your output cells will be retained locally.
## Package your Kedro project
[Further information about building project documentation and packaging your project](https://docs.kedro.org/en/stable/tutorial/package_a_project.html)

@ -25,3 +25,11 @@ preprocessed_actors:
  save_args:
    sep: ";"

parse_xsl:
  type: pandas.XMLDataSet
  filepath: data/01_raw/xml/Anjou/anj_is_i_1441_08_05a.xml

preprocess_html:
  type: pandas.XMLDataSet
  filepath: data/02_intermediate/xml/Anjou/anj_is_i_1441_08_05a.html

@ -0,0 +1,13 @@
<?xml version='1.0' encoding='utf-8'?>
<data>
<row>
<fileDesc/>
<profileDesc/>
<body/>
</row>
<row>
<fileDesc/>
<profileDesc/>
<body/>
</row>
</data>

@ -0,0 +1,3 @@
"Data Processing pipeline"
from .pipeline import create_pipeline # NOQA

@ -0,0 +1,483 @@
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" indent="yes"/>
<!-- parameter for the id of the act -->
<xsl:param name="numero"/>
<!-- OVERALL STRUCTURE -->
<xsl:template match="/" >
<!-- paratext and text block
<xsl:apply-templates select="//sourceDesc/listWit"/>-->
<xsl:apply-templates select="//text/body/div"/>
<!-- notes block -->
<div>
<!-- critical notes -->
<div class="note-global">
<xsl:apply-templates select="//note[@type='n1']/p"/>
</div>
</div>
<!-- palaeographic notes -->
<div class="footnote">
<ol>
<xsl:apply-templates select="//text/body/div//note[@type='na']/p"/>
</ol>
</div>
</xsl:template>
<!-- GLOBAL FORMATTING RULES -->
<xsl:template match="hi[@rend='sup']">
<!-- superscript -->
<xsl:element name="sup">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="hi[@rend='i']">
<!-- italics -->
<xsl:element name="em">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="hi[@rend='smallcaps']">
<!-- small capitals -->
<xsl:element name="mark">
<xsl:attribute name="style">
<xsl:text>font-variant: small-caps; background-color: inherit;</xsl:text>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="ref">
<!-- links -->
<xsl:element name="a">
<xsl:attribute name="href">
<xsl:value-of select="@target"/>
</xsl:attribute>
<xsl:attribute name="target">
<xsl:text>_blank</xsl:text>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="head">
<!-- title of the acts -->
<xsl:element name="p">
<xsl:attribute name="class">
<xsl:text>text_etabli</xsl:text>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="biblStruct">
<!-- structured bibliographic reference -->
<xsl:if test="parent::witness/@n='a'">
<!-- if this is edition a. -->
<xsl:value-of select="parent::witness/@n"/>
<xsl:text>. </xsl:text>
</xsl:if>
<xsl:if test="parent::witness/@n='b'">
<!-- if this is edition b. -->
<xsl:value-of select="parent::witness/@n"/>
<xsl:text>. </xsl:text>
</xsl:if>
<xsl:for-each select=".//author/persName">
<!-- name(s) of the author(s) -->
<xsl:if test="./addName">
<xsl:apply-templates select="./addName"/>
<xsl:text> </xsl:text>
</xsl:if>
<xsl:apply-templates select="./forename"/>
<xsl:if test="./surname">
<xsl:text> </xsl:text>
<xsl:apply-templates select="./surname"/>
</xsl:if>
<xsl:text>, </xsl:text>
</xsl:for-each>
<xsl:choose>
<!-- title -->
<xsl:when test=".//monogr/title[@level='a']">
<!-- 1/ the reference is an article -->
<xsl:text>&#171; </xsl:text>
<!-- article title in French quotation marks (guillemets) -->
<xsl:apply-templates select=".//monogr/title[@level='a']"/>
<xsl:text> &#187;, dans </xsl:text>
<!-- journal title in italics -->
<xsl:element name="em">
<xsl:apply-templates select=".//monogr/title[@level='j']"/>
</xsl:element>
<xsl:if test=".//biblScope[@unit='part']">
<!-- if the journal is organised in series (@part) -->
<xsl:text>, </xsl:text>
<xsl:value-of select=".//biblScope[@unit='part']/@n"/>
</xsl:if>
<!-- if the journal has an issue number -->
<xsl:if test=".//biblScope[@unit='issue']">
<xsl:text>, n°</xsl:text>
<xsl:choose>
<!-- a single issue number -->
<xsl:when test=".//biblScope[@unit='issue']/@n">
<xsl:value-of select=".//biblScope[@unit='issue']/@n"/>
</xsl:when>
<xsl:otherwise>
<!-- a range of issue numbers -->
<xsl:value-of select=".//biblScope[@unit='issue']/@from"/>
<xsl:text>-</xsl:text>
<xsl:value-of select=".//biblScope[@unit='issue']/@to"/>
</xsl:otherwise>
</xsl:choose>
</xsl:if>
</xsl:when>
<xsl:otherwise>
<!-- 2/ not an article: this is a monograph title -->
<xsl:element name="em">
<xsl:apply-templates select=".//monogr/title[@level='m']"/>
</xsl:element>
</xsl:otherwise>
</xsl:choose>
<xsl:text>, </xsl:text>
<xsl:if test=".//imprint/biblScope[@unit='volume']">
<xsl:value-of select=".//imprint/biblScope[@unit='volume']/@n"/>
<xsl:text>, </xsl:text>
</xsl:if>
<xsl:if test=".//respStmt/persName">
<!-- editor -->
<xsl:for-each select=".//respStmt/persName">
<xsl:apply-templates select="./forename"/>
<xsl:text> </xsl:text>
<xsl:apply-templates select="./surname"/>
<xsl:if test="position()!= last()">, </xsl:if>
</xsl:for-each>
<xsl:text> (</xsl:text>
<xsl:apply-templates select=".//imprint/respStmt/resp"/>
<xsl:text>), </xsl:text>
</xsl:if>
<xsl:if test=".//pubPlace">
<!-- place of publication -->
<xsl:apply-templates select=".//pubPlace"/>
<xsl:text>, </xsl:text>
</xsl:if>
<xsl:if test=".//publisher">
<!-- publisher -->
<xsl:apply-templates select=".//publisher"/>
<xsl:text>, </xsl:text>
</xsl:if>
<!-- date -->
<xsl:value-of select=".//date/@when"/>
<xsl:text>, </xsl:text>
<xsl:if test=".//biblScope[@unit='page']">
<!-- pagination -->
<xsl:choose>
<xsl:when test=".//biblScope[@unit='page']/@n">
<!-- a single page -->
<xsl:text>p. </xsl:text>
<xsl:value-of select=".//biblScope[@unit='page']/@n"/>
</xsl:when>
<xsl:otherwise>
<!-- several pages -->
<xsl:text>pp. </xsl:text>
<xsl:value-of select=".//biblScope[@unit='page']/@from"/>
<xsl:text>-</xsl:text>
<xsl:value-of select=".//biblScope[@unit='page']/@to"/>
</xsl:otherwise>
</xsl:choose>
</xsl:if>
<xsl:if test=".//biblScope[@unit='entry']">
<!-- the location within the page is specified (index number, proof, appendix, etc.) -->
<xsl:text> , n°</xsl:text>
<xsl:value-of select=".//biblScope[@unit='entry']/@n"/>
<xsl:if test="./text()">
<xsl:text> </xsl:text>
<xsl:apply-templates select=".//biblScope[@unit='entry']"/>
</xsl:if>
</xsl:if>
<xsl:if test=".//ref">
<!-- there is a link to a digitisation or other resource -->
<xsl:text> </xsl:text>
<xsl:apply-templates select=".//ref"/>
</xsl:if>
<xsl:if test="parent::witness">
<xsl:text>.</xsl:text>
</xsl:if>
</xsl:template>
<xsl:template match="bibl">
<!-- unstructured bibliographic reference -->
<xsl:element name="em">
<!-- title -->
<xsl:value-of select="./title"/>
</xsl:element>
<xsl:if test="./biblScope[@unit='volume']">
<!-- volume number, if any -->
<xsl:text>, </xsl:text>
<xsl:value-of select="./biblScope[@unit='volume']/@n"/>
</xsl:if>
<xsl:if test="./biblScope[@unit='page']">
<!-- pagination, if specified -->
<xsl:choose>
<!-- a single page -->
<xsl:when test=".//biblScope[@unit='page']/@n">
<xsl:text>, p. </xsl:text>
<xsl:value-of select=".//biblScope[@unit='page']/@n"/>
</xsl:when>
<xsl:otherwise>
<!-- several pages -->
<xsl:text>, pp. </xsl:text>
<xsl:value-of select=".//biblScope[@unit='page']/@from"/>
<xsl:text>-</xsl:text>
<xsl:value-of select=".//biblScope[@unit='page']/@to"/>
</xsl:otherwise>
</xsl:choose>
</xsl:if>
<xsl:if test="./biblScope[@unit='entry']">
<xsl:text>, n°</xsl:text>
<xsl:value-of select="./biblScope[@unit='entry']/@n"/>
</xsl:if>
<xsl:if test="parent::witness">
<xsl:text>.</xsl:text>
</xsl:if>
</xsl:template>
<!-- DATING -->
<xsl:template match="docDate">
<xsl:element name="h1">
<xsl:attribute name="class">
<xsl:text>text-center</xsl:text>
</xsl:attribute>
<xsl:choose>
<!-- date and place -->
<xsl:when test="not(contains(placeName,'NS'))">
<xsl:apply-templates select="date"/>
<xsl:text>. — </xsl:text>
<xsl:apply-templates select="placeName"/>
<xsl:text>.</xsl:text>
</xsl:when>
<!-- date only -->
<xsl:otherwise>
<xsl:apply-templates select="date"/>
<xsl:text>.</xsl:text>
</xsl:otherwise>
</xsl:choose>
</xsl:element>
</xsl:template>
<xsl:template match="date">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="docDate/placeName">
<xsl:apply-templates/>
</xsl:template>
<!-- no template for the first argument, which is used for the corpus presentation page -->
<!-- REGEST -->
<xsl:template match="argument">
<xsl:element name="div">
<xsl:attribute name="class">
<xsl:text>analyse</xsl:text>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
<xsl:text>
</xsl:text>
<xsl:apply-templates select="//sourceDesc/listWit"/>
</xsl:template>
<xsl:template match="argument/p">
<xsl:element name="p">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<!-- IMAGES -->
<xsl:template match="listWit">
<xsl:element name="div">
<xsl:attribute name="class">
<xsl:text>tradition</xsl:text>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
<xsl:apply-templates select="facsimile"/>
<!-- image of the act, if there is a <facsimile> whose @n contains the id of the act -->
<xsl:if test="ancestor::TEI//facsimile">
<xsl:element name="div">
<xsl:element name="details">
<xsl:element name="summary">
<xsl:text>Cliquer pour afficher une image de l'acte.</xsl:text>
</xsl:element>
<!-- if a path is given in the @url of <graphic> -->
<xsl:if test="//facsimile/graphic/@url">
<xsl:element name="img">
<xsl:attribute name="src"><xsl:value-of select="//facsimile/graphic/@url"/></xsl:attribute>
<xsl:attribute name="width">100%</xsl:attribute>
<xsl:attribute name="height">auto</xsl:attribute>
</xsl:element>
</xsl:if>
<!-- if the image has a description -->
<xsl:if test="//facsimile/graphic/desc">
<xsl:element name="p">
<xsl:value-of select="//facsimile/graphic/desc"/>
</xsl:element>
</xsl:if>
</xsl:element>
</xsl:element>
</xsl:if>
</xsl:template>
<!-- TABLE OF THE TRADITION -->
<xsl:template match="//sourceDesc/listWit/witness">
<!-- Several cases: -->
<xsl:choose>
<!-- When there is an analysis -->
<xsl:when test="@n='analyse'">
<xsl:element name="p">
<xsl:element name="mark">
<xsl:attribute name="style">
<xsl:text>font-variant: small-caps; background-color: inherit;</xsl:text>
</xsl:attribute>
<xsl:text>Analyse : </xsl:text>
</xsl:element>
<xsl:apply-templates/>
</xsl:element>
</xsl:when>
<!-- When there is a mention -->
<xsl:when test="@n='mention'">
<xsl:element name="p">
<xsl:element name="mark">
<xsl:attribute name="style">
<xsl:text>font-variant: small-caps; background-color: inherit;</xsl:text>
</xsl:attribute>
<xsl:text>Mention : </xsl:text>
</xsl:element>
<xsl:apply-templates/>
</xsl:element>
</xsl:when>
<!-- When there is an "indiqué" -->
<xsl:when test="@n='indique'">
<xsl:element name="p">
<xsl:element name="mark">
<xsl:attribute name="style">
<xsl:text>font-variant: small-caps; background-color: inherit;</xsl:text>
</xsl:attribute>
<xsl:text>Indiqué : </xsl:text>
</xsl:element>
<xsl:apply-templates/>
</xsl:element>
</xsl:when>
<!-- In all other cases -->
<xsl:otherwise>
<xsl:element name="p">
<xsl:apply-templates/>
</xsl:element>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<!-- NOTE CALLS IN THE TEXT -->
<xsl:template match="//note[@type='n1']">
<!-- critical notes -->
<xsl:element name="sup">
<xsl:element name="a">
<!-- @href linking the note call to the note's id, based on its number -->
<xsl:attribute name="href">
<xsl:text>#</xsl:text>
<xsl:number count="//note[@type='n1']" level="any" format="1"/>
</xsl:attribute>
<!-- note number -->
<xsl:number count="//note[@type='n1']" level="any" format="1"/>
</xsl:element>
</xsl:element>
</xsl:template>
<xsl:template match="note[@type='na']">
<!-- palaeographic notes -->
<xsl:element name="sup">
<xsl:attribute name="id">
<xsl:text>fnref:</xsl:text>
<xsl:number count="//text/body/div//note[@type='na']" level="any" format="a"/>
</xsl:attribute>
<xsl:element name="a">
<xsl:attribute name="href">
<xsl:text>#fn:</xsl:text>
<xsl:number count="//text/body/div//note[@type='na']" level="any" format="a"/>
</xsl:attribute>
<xsl:attribute name="rel">
<xsl:text>footnote</xsl:text>
</xsl:attribute>
<xsl:number count="//text/body/div//note[@type='na']" level="any" format="a"/>
</xsl:element>
</xsl:element>
</xsl:template>
<!-- TEXT OF THE ACT -->
<xsl:template match="div[@type='acte']">
<!-- body of the act -->
<xsl:element name="div">
<xsl:attribute name="class">
<xsl:text>act</xsl:text>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="div[@type='acte']/p">
<xsl:element name="p">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="div[@type='MHT']">
<!-- annotation outside the tenor of the act ("mention hors teneur", mht) -->
<xsl:element name="p">
<xsl:attribute name="class">
<xsl:text>mht</xsl:text>
</xsl:attribute>
<xsl:choose>
<!-- position of the mht -->
<xsl:when test="@subtype='gauche'">
<i style="font-size: small;">(À gauche :) </i><xsl:apply-templates/>
</xsl:when>
<xsl:when test="@subtype='droite'">
<i style="font-size: small;">(À droite :) </i><xsl:apply-templates/>
</xsl:when>
<xsl:when test="@subtype='replidroit'">
<i style="font-size: small;">(Sur le repli, à droite :) </i><xsl:apply-templates/>
</xsl:when>
<xsl:when test="@subtype='repligauche'">
<i style="font-size: small;">(Sur le repli, à gauche :) </i><xsl:apply-templates/>
</xsl:when>
</xsl:choose>
</xsl:element>
</xsl:template>
<xsl:template match="div[@type='sign']">
<!-- signature -->
<xsl:element name="div">
<xsl:attribute name="class">
<xsl:text>signature</xsl:text>
</xsl:attribute>
<xsl:for-each select="child::p">
<xsl:element name="p">
<i style="font-size: small;">(Signé :) </i><xsl:apply-templates/>
</xsl:element>
</xsl:for-each>
</xsl:element>
</xsl:template>
<!-- CRITICAL NOTES -->
<xsl:template match="//note[@type='n1']/p">
<xsl:element name="p">
<!-- @id, target of the @href of the note call in the text -->
<xsl:attribute name="id">
<xsl:number count="//note[@type='n1']" level="any" format="1"/>
</xsl:attribute>
<!-- note number -->
<xsl:number count="//note[@type='n1']" level="any" format="1"/>
<xsl:text>. </xsl:text>
<!-- note text -->
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<!-- PALAEOGRAPHIC NOTES -->
<xsl:template match="note[@type='na']/p">
<xsl:element name="li">
<xsl:attribute name="id">
<xsl:text>fn:</xsl:text>
<xsl:number count="//text/body/div//note[@type='na']" level="any" format="a"/>
</xsl:attribute>
<xsl:attribute name="class">
<xsl:text>footnote</xsl:text>
</xsl:attribute>
<xsl:element name="p">
<!-- note number -->
<xsl:number count="//text/body/div//note[@type='na']" level="any" format="a"/>
<xsl:text>. </xsl:text>
<!-- note text -->
<xsl:apply-templates/>
</xsl:element>
</xsl:element>
</xsl:template>
</xsl:stylesheet>

@ -0,0 +1,26 @@
import pandas as pd
from lxml import etree
from pathlib import Path

# path and file configuration
_here = Path(__file__).resolve().parent
xsl_stylesheet = _here / "actes_princiers.xsl"


def parse_xsl(xmldoc: pd.DataFrame) -> pd.DataFrame:
    """Placeholder node: return the input unchanged (the XSLT step is still commented out)."""
    # source_doc = etree.fromstring(xmldoc.to_xml())
    ## xmlstring = xmldoc.to_xml()
    ## source_doc = ET.fromstring(xmlstring)
    ## source_doc = etree.parse(to_xml)
    # # removing namespace :
    # query = "descendant-or-self::*[namespace-uri()!='']"
    # for element in source_doc.xpath(query):
    #     #replace element name with its local name
    #     element.tag = etree.QName(element).localname
    # etree.cleanup_namespaces(source_doc)
    # xslt_doc = etree.parse(str(xsl_stylesheet))
    # xslt_transformer = etree.XSLT(xslt_doc)
    # output_doc = xslt_transformer(source_doc)
    # return pd.read_html(output_doc)
    return xmldoc
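
The commented-out namespace-removal step matters because the stylesheet matches un-prefixed element names (for example `//text/body/div`), while TEI files usually declare the `http://www.tei-c.org/ns/1.0` default namespace, so the templates would otherwise match nothing. The following is a minimal sketch of that one step using only the standard library; it is not the lxml code from the node. The helper name `strip_namespaces` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root: ET.Element) -> ET.Element:
    """Replace every namespaced element name with its local name, in place."""
    for element in root.iter():
        # ElementTree stores qualified names as "{namespace-uri}local".
        if isinstance(element.tag, str) and element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]
    return root


# Hypothetical sample: a tiny TEI-like document with a default namespace.
sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<text><body><div type='acte'><p>Texte</p></div></body></text>"
    "</TEI>"
)
root = strip_namespaces(ET.fromstring(sample))

# After stripping, un-prefixed paths like the ones in the stylesheet resolve.
print(root.tag)
print(root.find("text/body/div").get("type"))
```

Only element names are rewritten here, mirroring the commented-out loop; attribute names are left untouched.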

@ -0,0 +1,17 @@
from kedro.pipeline import Pipeline, node, pipeline

from .nodes import parse_xsl


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=parse_xsl,
                inputs="parse_xsl",
                outputs="preprocess_html",
                name="preprocess_html",
                tags="xsl",
            ),
        ]
    )

@ -1,3 +1,5 @@
lxml>=4.6.3
python-slugify>=8.0.1
black~=22.0
flake8>=3.7.9, <5.0
ipython>=7.31.1, <8.0; python_version < '3.8'
