Virtual Dhamma-Vinaya Vihara

Topic started by: Dhammañāṇa on June 24, 2018, 10:22:48 PM

Title: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on June 24, 2018, 10:22:48 PM
This topic is dedicated to issues of converting and editing the CSCD XML files into a format suitable for rendering on ati.eu.

Title: Re: [Dokuwiki] converting, editing CSCD xml to ati.eu format
Post by: Moritz on June 25, 2018, 10:40:24 PM
I am currently converting the original CSCD Tipitaka XML files (from tipitaka.org/romn (http://tipitaka.org/romn), tipitaka.org/khmr (http://tipitaka.org/romn), etc.) to UTF-8.

This takes some time. The original files use different encodings: some are UTF-16LE, others UTF-16BE.
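A minimal sketch of such a conversion in Python. It assumes that every file starts with a byte-order mark (BOM) that distinguishes UTF-16LE from UTF-16BE, and that an XML declaration, if present, names the old encoding and has to be changed to UTF-8. The function names and folder layout are illustrative only; this is not the script actually used for the uploads.

```python
import codecs
import re
from pathlib import Path

# encoding="..." inside the XML declaration (assumed to be the first one in the file)
XML_DECL_ENCODING = re.compile(r"""(<\?xml[^>]*?encoding=)(["'])[^"']*\2""")


def decode_by_bom(raw: bytes) -> str:
    """Decode bytes according to their byte-order mark (BOM)."""
    for bom, encoding in ((codecs.BOM_UTF8, "utf-8"),
                          (codecs.BOM_UTF16_LE, "utf-16-le"),
                          (codecs.BOM_UTF16_BE, "utf-16-be")):
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding)
    return raw.decode("utf-8")  # no BOM: assume the file is already UTF-8


def to_utf8_xml(raw: bytes) -> str:
    """Decode the file and let its XML declaration (if any) say UTF-8."""
    return XML_DECL_ENCODING.sub(r"\1\2UTF-8\2", decode_by_bom(raw), count=1)


def convert_dir(src: Path, dst: Path) -> None:
    """Write UTF-8 copies (without BOM) of all *.xml files from src into dst."""
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(src.glob("*.xml")):
        # write bytes, so line endings are kept exactly as in the original
        (dst / path.name).write_bytes(to_utf8_xml(path.read_bytes()).encode("utf-8"))
```

Usage with hypothetical folder names: `convert_dir(Path("romn"), Path("romn-utf8"))`.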

Finished files for now:

All index (TOC) files for Roman, Khmer, Thai, Cyrillic: tipitaka-indexes-utf8.zip (452 KB) (http://forum.sangham.net/index.php?action=tpmod;dl=get71)

All Roman files (including indexes / TOC files): tipitaka-romn-utf8.zip (35.6 MB) (http://forum.sangham.net/index.php?action=tpmod;dl=get698)

I will add them as additional files to the existing CSCD Tipitaka (http://forum.sangham.net/index.php?action=tpmod;dl=item71) upload.

(Forum topic: [Pali] Chaṭṭha Saṅgāyana CD published by the Vipassana Research Institute (http://forum.sangham.net/index.php/topic,545.msg1474.html#msg1474))

Khmer, Thai, Cyrillic are being processed.

 *sgift*



... oops   :-|

I accidentally overwrote the "main" download (the huge CSCD Tipitaka CD image), instead of adding the XML index archive as an additional file.

It could surely be restored from a backup. But I actually think this is good, because it frees up a lot of space, and I don't think the CD is useful to anybody. (The data on it is not directly accessible in any editable form like simple XML/HTML/plain text, and I think it can only be used on a Windows 95/98 computer.)

So... either we could try to restore this old CD upload, or it might be better to simply adjust the description to say that it now contains the Tipitaka XML files instead of the CD.

 :-|  ^-^
Title: Re: [Dokuwiki] converting, editing CSCD xml to ati.eu format
Post by: Dhammañāṇa on June 25, 2018, 10:46:18 PM
Sadhu

My person does not think that restoring the older upload is necessary, Nyom Moritz.
Title: Re: [Dokuwiki] converting, editing CSCD xml to ati.eu format
Post by: Moritz on June 25, 2018, 11:02:10 PM
Good.  :)

Khmer and Thai are finished; Khmer is being uploaded now. I will add the others later when they are finished.

/me So now I really have to keep my fingers off all of this for a few days and continue with my worldly bread-earning work.

_/\_
Title: Re: [Dokuwiki] converting, editing CSCD xml to ati.eu format
Post by: Dhammañāṇa on June 25, 2018, 11:17:02 PM
Sadhu

/me: Sure. It's a beautiful way of spending / giving time, and best to leave it while the mind is greatly uplifted.
Title: Re: [ATI.eu] converting, editing CSCD xml to ati.eu format
Post by: Dhammañāṇa on June 26, 2018, 06:56:29 PM
/me: Nyom Moritz, Atma has separated the files and moved all three to "Sanghadana", since the XML files and the tipitaka.org pages are not free of restrictions and are given "just" as a Sangha gift. As for the old CSCD file: that was the public CD software, not the Tipitaka XML files, and it may be gone from here for now. Atma kept the text of the previous download and its description in the download now called CSCD tipitaka-indexes-utf8 (http://forum.sangham.net/index.php?action=tpmod;dl=item71). The "lost" CD image can be downloaded at http://www.tipitaka.org/cscd3.iso.zip ; for now it is not offered here for the Sangha.
Title: Re: [ATI.eu] converting, editing CSCD xml to ati.eu format
Post by: Dhammañāṇa on June 28, 2018, 02:18:09 PM
Atma just tried to unpack "tipitaka-romn-utf8.zip" after downloading it, but it does not seem to be possible. Maybe it is due to the zip format, Nyom Moritz. Just to let you know.
Title: Re: [ATI.eu] converting, editing CSCD xml to ati.eu format
Post by: Moritz on June 28, 2018, 07:35:35 PM
Quote
Atma just tried to unpack "tipitaka-romn-utf8.zip" after downloading it, but it does not seem to be possible. Maybe it is due to the zip format, Nyom Moritz. Just to let you know.
No idea what is causing this. I have no problems with it here; I just downloaded and unpacked it.

Maybe a transmission error during the download corrupted the file. Perhaps try again. What error message do you get?
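One way to tell a damaged download from a broken archive, sketched with Python's standard `zipfile` and `hashlib` modules: `ZipFile.testzip()` reads every member and returns the name of the first one whose CRC does not match, and comparing a SHA-256 checksum of the uploaded and the downloaded file shows whether the transfer changed anything. The helper names are hypothetical.

```python
import hashlib
import io
import zipfile


def check_zip(source) -> str:
    """Return "ok" or a short description of the problem.

    `source` may be a path or a binary file object.
    """
    if not zipfile.is_zipfile(source):
        return "not a valid zip archive (truncated or damaged download?)"
    try:
        with zipfile.ZipFile(source) as zf:
            bad_member = zf.testzip()  # reads every member and checks its CRC
    except zipfile.BadZipFile as exc:
        return f"damaged archive: {exc}"
    return "ok" if bad_member is None else f"CRC error in {bad_member}"


def check_zip_bytes(data: bytes) -> str:
    """Same check for an archive that is already in memory."""
    return check_zip(io.BytesIO(data))


def sha256_hex(data: bytes) -> str:
    """Checksum for comparing the uploaded and the downloaded copy."""
    return hashlib.sha256(data).hexdigest()
```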

Since the file is relatively large, it may be better to split it into smaller parts. I will try that later when I find time.
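If the archive is split, a simple greedy grouping by uncompressed size would be enough. A sketch, assuming the XML files of one script sit in one folder; the part naming and the size limit are made up for illustration. Because files are taken in sorted name order, files with the same CSCD name prefix tend to end up in the same part.

```python
import zipfile
from pathlib import Path


def plan_parts(sizes: dict[str, int], limit: int) -> list[list[str]]:
    """Group file names (sorted) into parts of at most `limit` uncompressed bytes.

    A single file larger than `limit` gets a part of its own.
    """
    parts: list[list[str]] = []
    current: list[str] = []
    total = 0
    for name in sorted(sizes):
        if current and total + sizes[name] > limit:
            parts.append(current)
            current, total = [], 0
        current.append(name)
        total += sizes[name]
    if current:
        parts.append(current)
    return parts


def write_parts(src: Path, dst: Path, stem: str, limit: int) -> list[Path]:
    """Write <stem>-part01.zip, <stem>-part02.zip, ... into dst."""
    dst.mkdir(parents=True, exist_ok=True)
    sizes = {p.name: p.stat().st_size for p in src.glob("*.xml")}
    written = []
    for i, names in enumerate(plan_parts(sizes, limit), start=1):
        target = dst / f"{stem}-part{i:02d}.zip"
        with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for name in names:
                zf.write(src / name, arcname=name)
        written.append(target)
    return written
```

Usage with hypothetical paths: `write_parts(Path("romn"), Path("out"), "tipitaka-romn-utf8", 10_000_000)`.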

_/\_
Title: Re: [ATI.eu] converting, editing CSCD xml to ati.eu format
Post by: Dhammañāṇa on June 28, 2018, 09:12:13 PM
It worked now (strange that the first upload showed 50 MB... anyway, it must have been a download error), Nyom Moritz.
Title: Re: [ATI.eu] converting, editing CSCD xml to ati.eu format
Post by: Moritz on July 01, 2018, 10:26:52 AM
Thai and Cyrillic are now uploaded as well and added to the download list (http://forum.sangham.net/index.php/topic,8675.msg15105.html#msg15105).

I could probably also upload all the individual files to ATI, but I am not yet clear about the intended structure.

I see that you have, for example, created the namespaces cs:rm (http://www.accesstoinsight.eu/doku.php?id=index&idx=cs-rm) ("Chaṭṭha Saṅgāyana Roman"?) and pi:rm (http://www.accesstoinsight.eu/doku.php?id=index&idx=pi-rm) ("Pali Roman"?) and already populated them with the XML files.

It looks like cs:rm (http://www.accesstoinsight.eu/doku.php?id=index&idx=cs-rm) is the complete variant. Should this structure be adopted and kept for all scripts?

If a few regex replacements are already clear in advance and can be done before the upload, I could do them right away.

_/\_
Title: Re: [ATI.eu] converting, editing CSCD xml to ati.eu format
Post by: Dhammañāṇa on July 01, 2018, 02:00:58 PM
Sadhu, Nyom Moritz.

The cs-rm structure and tables of contents should by and large fit as they are. Atma has not yet finished the naming in the directories.

Since everything is very complex, Atma thinks it is good if he finishes rm as far as possible and tries to write down a standard.

The conversion of the XML texts will be essentially the same as before for HTML: classes, divisions and spans suited to WRAP. The challenge, as always, will be the anchors for the individual suttas.

That is why Atma had planned to complete the directories first.

Roughly like http://accesstoinsight.eu/doku.php?id=cs-rm:tipitaka:sut:mn:index
Under each vagga there will then also be a list of the individual suttas.

- Mūlapariyāyasuttaṃ [[:sut.mn.v1#sut.mn.001|sut.mn.001]]
- Sabbāsavasuttaṃ [[:sut.mn.v1#sut.mn.002|sut.mn.002]]
- ...
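Such sutta lists could be generated rather than typed. A minimal sketch that reproduces the two example lines above, assuming sutta numbers run consecutively within a vagga (true for the MN example, not necessarily for every nikaya); `vagga_index` is a hypothetical helper.

```python
def vagga_index(nikaya: str, vagga: int, titles: list[str], first_sutta: int = 1) -> list[str]:
    """Sutta list for one vagga page, one '- Title [[:page#code|code]]' line per sutta."""
    page = f"sut.{nikaya}.v{vagga}"
    lines = []
    for offset, title in enumerate(titles):
        code = f"sut.{nikaya}.{first_sutta + offset:03d}"  # e.g. sut.mn.001
        lines.append(f"- {title} [[:{page}#{code}|{code}]]")
    return lines
```

For example, `vagga_index("mn", 1, ["Mūlapariyāyasuttaṃ", "Sabbāsavasuttaṃ"])` yields the two list lines shown above.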

Code (no longer quite up to date; see the ati.eu page http://accesstoinsight.eu/doku.php?id=de:implementation_cscd - Einarbeitungen des CS-Tipitaka ["Incorporating the CS Tipitaka"]):
<p rend="centre">Namo tassa bhagavato arahato sammāsambuddhassa</p> into

<div wrap_center>...</div>


<p rend="nikaya">... into

====== Name ======
<span wrap_suttaid #sut.mn>sut.mn</span>

<p rend="book">... into

====== Name ======
<span wrap_suttaid #sut.mn.p1>sut.mn.p1</span>

<p rend="chapter">... into

===== Name =====
<span wrap_suttaid #sut.mn.v1>sut.mn.v01</span>

<p rend="subhead">... into

==== Name ====
<span wrap_suttaid #sut.mn.001>sut.mn.001</span>

<p rend="bodytext" n="1"><hi rend="paranum">1</hi><hi rend="dot">.</hi>...</p> into:

<span wrap_bodytext #s001><span wrap_para>1.</span>...</span>

Anchors like <pb ed="M" n="1.0001" /> into:

<span wrap_parasub #M_1.0001></span>

<note>...</note> into ((...)) to get a footnote.

<p rend="gatha1">...</p> etc. into <span wrap_gatha1>...</span>

At the end, a data table:

Name: ...
File-name: ...
Url: ...
Source: www.tipitaka.org/romn/cscd/s0201m.mul0.xml
...Date, Owner, Dedication
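The purely mechanical rules of this mapping could be applied with regular expressions before the upload. A sketch covering only the centre, bodytext, page-break, note and gatha rules; headings and sutta IDs (`sut.mn.001` ...) need codes that are not contained in the XML and are left out. Two assumptions: the `#s001` anchor is derived from the paragraph number `n`, going by the single example above, and all `gatha*` variants share one pattern. Regex is workable for this flat markup; for nested or irregular markup an XML parser would be safer.

```python
import re

# (pattern, replacement) pairs, applied in order. Inline elements (<pb>, <note>)
# come first, so they are already converted inside the enclosing <p>.
RULES = [
    # page-break markers, e.g. <pb ed="M" n="1.0001" />  ->  empty anchor span
    (re.compile(r'<pb ed="(\w+)" n="([\d.]+)"\s*/>'), r"<span wrap_parasub #\1_\2></span>"),
    # notes -> DokuWiki footnotes ((...))
    (re.compile(r"<note>(.*?)</note>", re.S), r"((\1))"),
    # centred lines such as "Namo tassa ..."
    (re.compile(r'<p rend="centre">(.*?)</p>', re.S), r"<div wrap_center>\1</div>"),
    # verses: gatha1, gatha2, ... (assumed to follow the same pattern)
    (re.compile(r'<p rend="(gatha\w*)">(.*?)</p>', re.S), r"<span wrap_\1>\2</span>"),
]

# numbered body paragraphs (only plain integer paragraph numbers are handled)
BODYTEXT = re.compile(
    r'<p rend="bodytext" n="(\d+)">'
    r'<hi rend="paranum">\d+</hi><hi rend="dot">\.</hi>(.*?)</p>',
    re.S,
)


def _bodytext(match: re.Match) -> str:
    n = int(match.group(1))
    # assumption: anchor #sNNN is the zero-padded paragraph number (n="1" -> #s001)
    return f"<span wrap_bodytext #s{n:03d}><span wrap_para>{n}.</span>{match.group(2)}</span>"


def convert(xml_text: str) -> str:
    """Apply the simple rules, then rewrite numbered body paragraphs."""
    for pattern, replacement in RULES:
        xml_text = pattern.sub(replacement, xml_text)
    return BODYTEXT.sub(_bodytext, xml_text)
```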


By and large, one could follow the coding built up in the Khmer Tipitaka, correcting various small consistency errors (vaggas, subvaggas...), which is "finished" up to Dhp, and continue along the same lines.

The more cleverly one builds it, the easier it will be later with various plugins, with cross-references into all languages and editions, and with building indexes automatically...

If Nyom sees something regarding a structure that is easy to handle and would like to take care of it: the field is open.

What remains manual work, Atma thinks, are the anchors to the individual suttas and the renaming of the individual XML files (which Atma had actually already finished, but then discovered UTF-16...).

It is good if the online version counts as the original. Working directly with txt files and regex is then very convenient, and suitable for multiple users and multitasking.

Atma had renamed all files locally, like "sut.mn.v01.txt"... "sut.an.01.v01.txt" etc. Atthakatha, Anya and Tika were matched as well as possible where they fit, e.g. "sut.mn.v01_att.txt" (see the directory structure online).

Atma would concentrate on the tables already uploaded and continue there, such as dn, mn. But flexibly (there is plenty of work to choose from).

Atma has created a page, Einarbeitungen des CS-Tipitaka (http://accesstoinsight.eu/doku.php?id=de:implementation_cscd), and would like to list everything there in an orderly way and encourage contributions.

So much for now: roughly, and certainly not error-free, a basic "requirements specification".


Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on July 04, 2018, 06:35:18 PM
The directory (directories) Tipitaka (Mula) (http://accesstoinsight.eu/doku.php?id=cs-rm:tipitaka:index) is finished for now, and the guide, Einarbeitungen des CS-Tipitaka (http://accesstoinsight.eu/doku.php?id=de:implementation_cscd#struktur_ordner_und_file-namen), has been updated somewhat.

(In case someone would like to work on it. With an eye for improvement and some cleverness it can certainly be improved.)
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on July 07, 2018, 07:40:43 AM
Index Aṭṭhakathā (http://accesstoinsight.eu/doku.php?id=cs-rm:atthakatha:index) and the file naming should now be fine as well.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on July 08, 2018, 02:27:29 PM
Index Tikā (http://accesstoinsight.eu/doku.php?id=cs-rm:tika:index) as well. Anya might need another 3-4 hours.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on July 09, 2018, 07:44:38 AM
The tables are nearly finished.

Now I am just thinking about the possibilities, also about perhaps making the structure as flat as possible.

Question, Nyom Moritz

Is it possible to "automatically" rename the XML files according to a list Atma could provide: xxx.xml -> yy3.txt, xx2.xml -> yy4.txt...?

And another question: would it be possible to fetch the text under a certain tag, say "chapter", and rename files according to their item under this tag? (The problem is that the title used is sometimes "chapter", and if not, seemingly "book"; titles are also used multiple times, just think of Mahavagga and other aspects. But it would be perfect for wiki use and search.)

It's good to think the structure through well. It would be good to carry all information (name, book, file, URL, origin, credits ...) in the data table of each file.

What will maybe be available this evening is a full list: "code" - "cscd-title" - "file-name" - (path), to help handle the names and data of the files.

Direct search is usually either by common code name or by title. A combination, or something like an alias, would of course be nicer.

The whole thing also needs a lot of fine adjustment to find the best correspondences in attha, tika, anya..., which Atma tried to do with codes as far as possible.
I haven't found much consistency across the files yet, and even the naming seems to have changed as the work progressed; one finds commentary names under tipitaka and so on.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Moritz on July 09, 2018, 05:41:59 PM
Vandami, Bhante _/\_

Quote
Is it possible to "automatically" rename the XML files according to a list Atma could provide: xxx.xml -> yy3.txt, xx2.xml -> yy4.txt...?

Yes.
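A sketch of such a list-driven renaming in Python, assuming one `old -> new` pair per line in the format of the question (a stray `=` before the new name is tolerated). It refuses lists in which two files would get the same name, never overwrites an existing file, and by default only reports what it would do (`dry_run=True`). The function names are hypothetical.

```python
from collections import Counter
from pathlib import Path


def parse_mapping(text: str) -> dict[str, str]:
    """Parse lines like 'xxx.xml -> yy3.txt'; blank lines are skipped."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        old, new = line.split("->", 1)
        mapping[old.strip()] = new.strip().lstrip("=").strip()
    return mapping


def rename_all(folder: Path, mapping: dict[str, str], dry_run: bool = True) -> list[str]:
    """Rename files in `folder`; return the renames that were (or would be) done."""
    duplicates = sorted(name for name, n in Counter(mapping.values()).items() if n > 1)
    if duplicates:
        raise ValueError(f"several files would get the same name: {duplicates}")
    done = []
    for old, new in mapping.items():
        src, dst = folder / old, folder / new
        if not src.exists() or dst.exists():
            continue  # skip missing sources; never overwrite an existing file
        if not dry_run:
            src.rename(dst)
        done.append(f"{old} -> {new}")
    return done
```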

Quote
And another question: would it be possible to fetch the text under a certain tag, say "chapter", and rename files according to their item under this tag?

In principle, that would be possible, yes. It even seems possible to use the Khmer alphabet, or scripts other than Roman, in the name and URL: ២០._ផុស្សពុទ្ធវំសោ (http://www.accesstoinsight.eu/doku.php?id=playground:%E1%9F%A2%E1%9F%A0._%E1%9E%95%E1%9E%BB%E1%9E%9F%E1%9F%92%E1%9E%9F%E1%9E%96%E1%9E%BB%E1%9E%91%E1%9F%92%E1%9E%92%E1%9E%9C%E1%9F%86%E1%9E%9F%E1%9F%84).
(I hope there are no hidden "dangers" of certain errors occurring when using non-western UTF-8 alphabets somewhere inside the Wiki titles.)
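For example, the titles could be fetched with Python's `xml.etree.ElementTree` and turned into Latin-script pagename slugs by stripping diacritics. A sketch, assuming well-formed XML and that the wanted heading is any element carrying `rend="chapter"`; the slug rule is an assumption modeled on names like `11._samadhiniddeso`, not DokuWiki's own ID cleaning.

```python
import unicodedata
import xml.etree.ElementTree as ET


def chapter_titles(xml_bytes: bytes, rend: str = "chapter") -> list[str]:
    """Whitespace-normalized text of every element with rend="chapter" (or another value)."""
    root = ET.fromstring(xml_bytes)
    return [" ".join("".join(el.itertext()).split())
            for el in root.iter() if el.get("rend") == rend]


def page_slug(title: str) -> str:
    """Lower-case, drop diacritics, join words with '_'."""
    decomposed = unicodedata.normalize("NFKD", title.lower())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "_".join(plain.split())
```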

But maybe it would be easier to find the corresponding versions for each alphabet/script and switch between them when they are all named the same for each language?
As is possible at the moment with the language switch (screenshot: https://forum.sangham.net/index.php?action=dlattach;topic=8672.0;attach=5914).
I think that is only possible if the files are all named in the same way for each language/script.

Maybe it would be good to leave the base name of each file as it is, corresponding to some code like the XML files ("e1208n.nrf0.xml" etc.), but one could still add additional names corresponding to book or chapter via .htaccess rewrite rules (https://www.dokuwiki.org/rewrite).
Not sure if it is possible to set so many rules, and whether it could drastically reduce performance (https://stackoverflow.com/questions/681810/how-many-rewriterules-can-you-have-in-htaccess-without-trouble) with tens of thousands of rules. ^-^
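To make the idea concrete, alias rules could be generated from a name list. A sketch, assuming DokuWiki runs in the web root with rewriting done via .htaccess, and that these lines are placed before DokuWiki's own catch-all rule; the alias `samadhiniddeso` is hypothetical. For tens of thousands of entries, Apache's `RewriteMap` is designed for large lookups, but it has to be declared in the server configuration, not in .htaccess.

```python
import re


def rewrite_rules(aliases: dict[str, str]) -> str:
    """One RewriteRule line per alias, pointing to the DokuWiki page id."""
    lines = []
    for alias, page_id in sorted(aliases.items()):
        # re.escape makes characters such as '.' literal in the pattern
        lines.append(f"RewriteRule ^{re.escape(alias)}$ doku.php?id={page_id} [QSA,L]")
    return "\n".join(lines)
```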


Quote
(The problem is that the title used is sometimes "chapter", and if not, seemingly "book"; titles are also used multiple times, just think of Mahavagga and other aspects. But it would be perfect for wiki use and search.)

For each language, the '/cscd' directory contains:
- 2915 XML files, 217 of them TOC/index files (ending in .toc.xml)
- 2651 occurrences of "chapter"
- 218 occurrences of "book"

So probably for each TOC there is one book.
(There is another "tipitaka.toc.xml" in the directory above "/cscd", probably corresponding to another "book" tag for the whole Tipitaka.)
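Counts like these can be reproduced with a few lines of Python. A sketch that counts `rend="chapter"` and `rend="book"` attributes; the numbers above may have been obtained with a plain text search, so results can differ slightly. `summarize` and `load_dir` are hypothetical names.

```python
from collections import Counter
from pathlib import Path


def summarize(files: dict[str, str], words=("chapter", "book")) -> Counter:
    """Count XML files, TOC files and rend="<word>" attributes."""
    counts = Counter()
    for name, text in files.items():
        counts["xml files"] += 1
        if name.endswith(".toc.xml"):
            counts["toc files"] += 1
        for word in words:
            counts[word] += text.count(f'rend="{word}"')
    return counts


def load_dir(folder: Path) -> dict[str, str]:
    """Read all (already UTF-8) XML files of one folder."""
    return {p.name: p.read_text(encoding="utf-8") for p in folder.glob("*.xml")}
```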

Maybe this info is useful somehow. I could try to find out other numbers and structures later.

_/\_
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on July 09, 2018, 06:40:32 PM
Sadhu

My person thinks that a sutta/vagga code as pagename is fine. My person also raised a question regarding multilingual use here: [ATI.eu] Multilingual - one "id" (namespace), diff. Names/Title languages/script (http://forum.sangham.net/index.php/topic,8697.0.html). Naming the files differently in different languages does not make sense, since there are directories/root namespaces for each. The only

Chapter, book and title actually have no real consistency across all files. Certain sections may have it, but knowing the files a little, subchapter, chapter ... are not reliable overall. Not to speak of the "chaos" arising from many compilers and commentaries.

The TOC files just build the respective directory/folder tree, with its files at the last level.

At the moment the new names look like this, for example for a vagga:

mula: cs-rm:(path tipitaka:sut:an:)an01.v1{anchor #s001}
attha: cs-rm:(path tipitaka:sut:an:)an01.v1_att{anchor #s001}
tika: cs-rm:(path tipitaka:sut:an:)an01.v1_tik{anchor #s001}
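The naming pattern above can be expressed as one small function. A sketch with hypothetical names, assuming each layer only adds a suffix (`_att`, `_tik`, and `_any` for Anya) and keeps the same namespace path, as in the three example lines.

```python
LAYER_SUFFIX = {"mula": "", "attha": "_att", "tika": "_tik", "anya": "_any"}


def page_id(lang_ns: str, path: list[str], code: str,
            layer: str = "mula", anchor: str = "") -> str:
    """Full DokuWiki page id (plus optional #anchor) for one layer of a text."""
    pid = ":".join([lang_ns, *path, code + LAYER_SUFFIX[layer]])
    return f"{pid}#{anchor}" if anchor else pid
```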

Indexes have been modified: merged, such as Jataka 1, 2, 3, or reduced by one level, as for dn or mn.

If the title of the file (mostly the vagga) is now added, search is quick and easy. This holds as long as the pagenames are in English, since the search engine and other features primarily use pagenames.

My person came across an index plugin for the sidebar that uses the first headline (h1) of each file. If all features work in such or a similar way, acting on an alias from the file itself, everything is fine, though of course it means work.

With the pagenames and structure as well as the title/alias in good order, all categories/folders apart from the language level and the root levels (tipitaka, atthakatha, library, author, ptf...) can be dropped. (At present the structure also corresponds to how each language is built; however, cross-linking is hard, since it sometimes requires knowing a path of 3-4 levels.) A flat structure makes everything very easy.

Great to know that such fetching and renaming is possible. It might take a while longer until everything is firm. Atma might ask some questions while working out manageable, simple plans, and will then try to convey them well for possible large-scale actions.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on July 12, 2018, 12:10:17 AM
Anya (http://accesstoinsight.eu/doku.php?id=cs-rm:anya:index) has been finished as well now.

The four main indexes should now have all files matched, and each file has a pagename.

Since Tika and Anya are really not easy, there may be duplicate pagenames, so it may be worth checking for them before using the list for large-scale adjustments.
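A duplicate check on the index list could look like this. A sketch, assuming the list has been read into (pagename, path) pairs (the format of the attached file is not shown here) and comparing names case-insensitively, since DokuWiki page IDs are lower-case; `duplicate_pagenames` is a hypothetical helper.

```python
from collections import defaultdict


def duplicate_pagenames(rows: list[tuple[str, str]]) -> dict[str, list[str]]:
    """rows = (pagename, namespace path) pairs.

    Returns each pagename used more than once, with every path where it occurs;
    these would collide in a flat structure.
    """
    seen: dict[str, list[str]] = defaultdict(list)
    for pagename, path in rows:
        seen[pagename.lower()].append(path)
    return {name: paths for name, paths in seen.items() if len(paths) > 1}
```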

Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on July 12, 2018, 04:44:54 PM
Nyom Moritz,

Attached is a list of the whole cs-rm namespace index as my person has conceived it so far.

The tree structure is built the same way as in the other language namespaces.

It includes the names of the indexes, title, pagename, path and CSCD file name.

The plan so far was to rename the XML files to the listed pagenames, after first converting them according to implementation_cscd (http://accesstoinsight.eu/doku.php?id=de:implementation_cscd) with regard to wiki/HTML code (wrap) and anchors.

The h1 title should probably be the same as the title.

Then there were strong considerations to make the structure totally flat, which would require renaming the "index" files to proper code pagenames, and then of course the same for all other language namespaces; e.g. the ati "index files" would get the same codes with "_ati" attached, and single files likewise with "_{....}" (translator) attached.

No: on a dynamic site that does not make sense, and it would be a horror for maintaining indexes and name code systems. The ATI tree, modified as it is now, is fine. Maybe just keep the name "index" free for automatic folder indexes via a plugin. After all, the sitemap is wonderful for finding things quickly and teaches/trains sati, whereas a flat structure is only well accessible via the search engine.

So much for the thoughts and the state of progress. Atma intends to go on with restyling the tags and files from ATI.

If you think of different ways: it's just an idea of mine and there might be better ones, so don't feel limited by it.

(There might be some duplicate-name conflicts in the index list for a flat structure and renaming, and I have not checked whether all files are matched. One or two are not listed, as they contained only the name of a group.)
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Moritz on July 13, 2018, 01:19:36 AM
Vandami, Bhante _/\_

Sadhu! So just to make sure about plans to proceed:

As I understand, "readable" Latin script names/codes were used for the pages and namespaces, including also names from commentaries like: "cs-rm:anya:visuddhimagga:11._samadhiniddeso" whenever there are areas in the commentaries which do not correspond 1-to-1 to certain Tipitaka books/vaggas/pages.

If this structure is now clearly defined, it seems it would be good to use the same structure and pagenames/namespaces for all other scripts (Khmer, Thai, ...) as well.

So: Roman codes/pagenames for all scripts, in order to be able to use the language switch between different scripts.

I assume this is what Bhante had in mind as well?

It could take some time (a week or more) till I can get to it, but if the names and structure of tables of content are all clearly defined in this way, I think I could import the tables of content with the same structure for all remaining scripts without much trouble.

(And for later at some point maybe: as mentioned before (http://forum.sangham.net/index.php/topic,8672.msg15257.html#msg15257), it might be possible to convert namespaces to use other scripts with .htaccess rules or some other tricks. Maybe it is even possible to switch between very different-looking names with the language switch, with the help of some JavaScript.)

_/\_
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on July 13, 2018, 08:23:22 AM
Nyom Moritz

All namespaces in Latin script, yes; otherwise there would only be trouble, and the whole translation tools and language namespaces would be of no use at all.

The naming of folders and files may not be perfect yet (e.g. duplicate names), but such issues would become clear once renaming starts.

Other, more ideal renderings of individual names can be made later by hand, step by step, online.

To make it easier for you, it's probably good to put the files into the tree folders beforehand, since that needs to be done by hand, and if it is not done with the root files it might be more work; but it's probably the same for each language. (On this point: anya filenames have no _any at the end, and for tika and anya, files other than those with tipitaka codes do not include the path in their names and simply have new names. Maybe something that my person should change, since it is difficult to put them into the right folders without such a sort/search possibility in a file explorer.)
But if these things do not trouble you too much, let it be like that for now, as Moritz feels inspired to organize.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Moritz on July 13, 2018, 09:52:09 AM
Sadhu. Thanks for the hints and explanations.

Quote
(On this point: anya filenames have no _any at the end, and for tika and anya, files other than those with tipitaka codes do not include the path in their names and simply have new names. Maybe something that my person should change, since it is difficult to put them into the right folders without such a sort/search possibility in a file explorer.)

It seems some deeper nesting of indexes would be good. Some things are confusing.

For example, the index/TOC:
cs-rm:anya:niti-gantha-sangaho:index (http://www.accesstoinsight.eu/doku.php?id=cs-rm:anya:niti-gantha-sangaho:index)
contains this, which is also an index/TOC:
cs-rm:anya:niti-gantha-sangaho:caturarakkhadipani (http://www.accesstoinsight.eu/doku.php?id=cs-rm:anya:niti-gantha-sangaho:caturarakkhadipani)
within the same directory/namespace (cs-rm:anya:niti-gantha-sangaho (http://www.accesstoinsight.eu/doku.php?id=cs-rm:anya:niti-gantha-sangaho)).
And from there, there are links to actual texts, like cs-rm:anya:niti-gantha-sangaho:kayapaccavekkhana (http://www.accesstoinsight.eu/doku.php?id=cs-rm:anya:niti-gantha-sangaho:kayapaccavekkhana), also in the same directory.
I think it would be good if for each TOC there would be another level/directory.

If the path is included in the final page name as well, the final name could of course become very long with deep directories, like

cs-rm:anya:niti-gantha-sangaho:caturarakkhadipani:anya:niti-gantha-sangaho.caturarakkhadipani.kayapaccavekkhana_any (http://www.accesstoinsight.eu/doku.php?id=cs-rm:anya:niti-gantha-sangaho:caturarakkhadipani:anya:niti-gantha-sangaho.caturarakkhadipani.kayapaccavekkhana_any), or even only the filename without namespace niti-gantha-sangaho:caturarakkhadipani:anya:niti-gantha-sangaho.caturarakkhadipani.kayapaccavekkhana_any (http://www.accesstoinsight.eu/doku.php?id=cs-rm:anya:niti-gantha-sangaho:caturarakkhadipani:anya:niti-gantha-sangaho.caturarakkhadipani.kayapaccavekkhana_any) could look very strange on the sitemap as well.

Or maybe just leave out such intermediary TOCs/indexes like cs-rm:anya:niti-gantha-sangaho:caturarakkhadipani (http://www.accesstoinsight.eu/doku.php?id=cs-rm:anya:niti-gantha-sangaho:caturarakkhadipani), which is already completely included in another, bigger index file in the same directory.

So that there would be no "caturarakkhadipani" appearing in the final path, "caturarakkhadipani" being simply part of the one big "index".

Not sure if I understand this correctly:
Quote
Maybe something that my person should change, since it is difficult to put them into the right folders without such a sort/search possibility in a file explorer
Why is it helpful to have the complete pathname also in the filename (separated with '.')? From my perspective it just produces unnecessarily long filenames. Or is this a problem with the tablet's file explorer software? (I don't really know what Bhante is using now.)

I could rename all files to filenames that do not include the path and simply put them into their "right" deeper directories if this seems helpful (making a deep hierarchy everywhere, but with short filenames in the end), but I have no time before next week.
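Moving the path out of the filename into directories is mechanical once the number of leading path components is known. A sketch with a hypothetical helper; the depth is passed explicitly because names such as `11._samadhiniddeso` contain dots that are not path separators.

```python
from pathlib import PurePosixPath


def to_nested_path(filename: str, depth: int) -> PurePosixPath:
    """Turn the first `depth` dot-separated parts of a path-including filename
    into directories and keep the rest as the short file name."""
    parts = filename.split(".")
    if depth >= len(parts):
        raise ValueError(f"depth {depth} is too large for {filename!r}")
    return PurePosixPath(*parts[:depth], ".".join(parts[depth:]))
```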


(Not necessary to answer all in detail now. May Bhante find enough rest in between. I have no time to come back to this before next week.)

_/\_
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on July 13, 2018, 01:43:38 PM
Usually my person redoes things 2, 3, 4 times from the beginning: a vision in mind, then still lacking this or that, still not perfect.

The second question is partly answered by the first. And once you have 2100 files and want to put them in their places, Moritz would see why.

And that's also for the brain. Some might know that mn001 is in the Suttapitaka, in the first vagga. But with a file called bhu001 one would have problems.

Now one could give them only a real name, e.g. Karaṇīyametta Sutta. Knowing the name, would you know which pitaka, nikaya, vagga and subvagga it belongs to?

Therefore both systems are useful: the "modern" codes from ATI (the Western focus is on suttas and ends there) and the tree of names from the elders.

That is how the filename scheme {pitaka}.{nikaya}.{vagga}.({subvagga}).{sutta no.} came into being.

When searching for an01.001, the suffixes _{att/tik/any} make the matching counterparts show up well in the preview as one types the letters into the search box. On the other hand, searching via the sitemap works fine as well.
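The filename scheme can be written as a function, which also keeps the zero-padding consistent. A sketch with hypothetical names, assuming sutta numbers are padded to three digits as in `mn001` and `an01.001`.

```python
from typing import Optional

SUFFIXES = {"att", "tik", "any"}


def build_code(pitaka: str, nikaya: str, vagga: str, subvagga: Optional[str] = None,
               sutta: Optional[int] = None, layer: Optional[str] = None) -> str:
    """{pitaka}.{nikaya}.{vagga}.({subvagga}).{sutta no.} plus optional _att/_tik/_any."""
    parts = [pitaka, nikaya, vagga]
    if subvagga:
        parts.append(subvagga)
    if sutta is not None:
        parts.append(f"{sutta:03d}")
    code = ".".join(parts)
    if layer is not None:
        if layer not in SUFFIXES:
            raise ValueError(f"unknown layer {layer!r}")
        code += f"_{layer}"
    return code
```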

This all works fine up to the Atthakatha of the Abhidhamma and parts of the Tika. When it comes to Anya, it can no longer be carried out that clearly, and Anya itself already contains double and triple naming. One collection has its first book and its first chapter under the same name, containing material that is not clearly a counterpart of anything in the Tipitaka.

To date, having actually spent hundreds of hours trying to sort things properly, my person got as far as about the Jataka while being sure that the system would not turn ugly or be built on an unsuitable structure.

Since not even the Abhidhamma (horrible structure) has been sorted well in the West, my person guesses that this is the first time since tipitaka.org (which used a simple but not intuitive code and index system for stable, non-dynamic storage, in which it is hard to find anything unless one is a little familiar with it) that the whole heritage of the Saṅgāyana gets sorted.

It all looked simple to my person as well. Then, after you have developed a structure for the first and second levels, by the 10th file you meet a new vagga/subvagga structure... Since from the Jataka on there has long been little broad interest, Anya is like the bookshelf in a student's room, not like a chemist's register.

A practical example from Anya:

Caturārakkhadīpanī is a book in the collection Nīti-gantha-saṅgaho in the Anya commentaries and contains a series of texts. Within it is the chapter Kāyapaccavekkhaṇā, which is the actual file (pagename).

To get to it, one follows the indexes (the pages named "index" in the namespace tree) one after another, or goes more directly, since the first index already contains the whole structure. That's right, sub-indexes are not really necessary if the first already contains the whole.

So it has rather practical reasons. For example, think of an, mn, iti. Knowing the system of the Saṅgāyana, one knows that iti is a subvagga of kn. The same danger applies to mn: there is no mn123 in the Saṅgāyana edition. Such codes came from many people focusing on certain levels (a "common multiple", gemeinsames Vielfaches).

Now, for example, if one works out the first book of the Visuddhimagga, he might expand the chapters' index and, when finished, copy it into the index of the Visuddhimagga, even into the Anya index. Another might work from another level...

It is the result of practical work in the worst situation: knowing particular parts but not the whole. Since it will stay dynamic, the indexes of the further levels have not been deleted (as in CSCD) but serve two purposes: easy access in both directions, at whichever level one enters, and focus on a scale suited to one's concentration and memory, then putting it together upward and downward.

That's why this pitaka, nikaya, vagga system has been kept here as well, and the structure is flat by name (for all files except Anya) but also physically arranged in levels, not only presented as a digital tree like the XML in CSCD. Looking at the flat system of CSCD, one will find att file codes in the Tipitaka and so on, meaning that even this simple system turned out ugly once the details from the elders came to light.

Further, the middle-level indexes are meant to be enriched with the individual chapters within one file (anchors #v1...v5), and the deepest would later also contain the links to the individual suttas (anchors #s001...s057) of the files. This gives a zoom level by level; a single index would become too large.

But as said, it's open to others as well. Just know that it can cause "headaches" for weeks and months, and then the next day one finds out... "OK, again from the beginning". It's like training jhāna, mastering the worlds.  :) That is why doing = sacrifice has its benefit = having learned a skill.

So know that Nyom has to structure something that probably nobody knows as a whole in its various details and structure. It needs to be open to that extent and comprehensible ("nachvollziehbar") for others, as well as accessible to people coming from different learning systems. The West does not know the way of the elders, and the elders do not know Western code-thinking.

For example, look at where ATI ends and where SuttaCentral begins to go astray: it has trouble with Vinaya and Abhidhamma and possibly no logical way ever to add the commentaries, yet it has references to Brahmic texts from Nepal.

But as said, knowing that even some parts of the Suttapitaka in the Tipitaka have to be changed, it cannot be expected to be perfect, or to be made perfect, before putting it on the shelf.

If particular names have to be changed later on, this can be done step by step online, provided there is no double naming at the pagename level and the whole structure has a certain consistency.

Things still open at this point, if one wishes to do it on a large scale:

- proofread and, where necessary, correct the indexes and file names
- rename the files
- convert into the wiki/wrap standard
- implement anchors for the individual suttas and chapters
- include a data table in each file (title, URL, date, origin...)
- upload into the folders (or including the folders) in the individual language namespaces

So it's really open how one would like to do it, but it's not really a quick job to develop such a thing, at least within my person's limits.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on July 13, 2018, 09:28:32 PM
Aha... that solves a lot (once each page has its heading for friendly display): Configuration Setting: useheading (https://www.dokuwiki.org/config:useheading) and IndexMenu Plugin (https://www.dokuwiki.org/plugin:indexmenu), so that the indexes need not be edited (this works; it just would not provide the corresponding access links to the public tipitaka.org pages). Atma is installing it for testing, assuming it's welcome and given.

A sample of this index is now put on http://accesstoinsight.eu/doku.php?id=cs-rm:index#index

The useheading option now displays the title names nicely, while filenames (in code) are still matched just as precisely as when typed into the search box.

Still, displaying a combination of both would be nice.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on September 16, 2018, 10:10:02 PM
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on September 16, 2018, 10:58:21 PM
Configuration Setting: fnencode (https://www.dokuwiki.org/config:fnencode) + Configuration Setting: deaccent (https://www.dokuwiki.org/config:deaccent) + 5 Pali "attendances" (Aufwartungen) + old ATI + dictionary authors + 100 IT surprises + weather/body + 2 years old: battery and small tablet... + no education at all in languages, incl. IT + a giant "bird" (German slang for being nuts)... + :-\ = totally crazy
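The `deaccent` setting concerns how DokuWiki treats accented characters in page names; its exact behaviour depends on the chosen value and on DokuWiki's own conversion logic. As a rough, hypothetical illustration of what stripping accents does to Roman-script Pali names, here is a Python sketch (the function `deaccent` below is ours, not DokuWiki's code):

```python
import unicodedata


def deaccent(name: str) -> str:
    """Approximate accent stripping for Roman-script Pali page names.

    Decomposes each character (NFKD) and drops the combining marks, so
    'ṅ' becomes 'n' and 'ā' becomes 'a'. Only meaningful for Latin script:
    Khmer, Thai or Cyrillic names would be mangled by this.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(deaccent("Aṅguttaranikāyo"))  # Anguttaranikayo
print(deaccent("Rūpādivaggo"))      # Rupadivaggo
```

This is one reason the ASCII-only file names (sut.an01.v01 etc.) are convenient: they are unaffected by such settings.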

So, and now onward from where things stand; after redoing it 30 times it will fit, as long as, besides sañña, saṅkhāra is not uncertain as well, with the other aggregates also involved.

... or sleep on it for another night... and hope for helper machines and science (scientists). But some sleep is good. Just not too long (sati then decays completely, and one has forgotten everything and only wonders why)  :)

Theoretically, ati.eu will have about 500,000 to 1 million pages in the coming months... with spiders and search engines switched off, so that some forest and wilderness remains.



kamma-vipaka? That fits well as a follow-up:

"simply goosebumps"



kataññū + saṁvega + pāsāda = ver-rückt ("crazy", literally "displaced")
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 19, 2019, 06:31:42 PM
Now that Atma can use a laptop, he has found a way to rename the files: via Commander, with "ren {file} {file new} & ren ...
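The chained `ren` approach can also be scripted. A minimal Python sketch, assuming the renaming list is available as old-name/new-name pairs (the names `ren_command`, `rename_files` and the example file names are hypothetical); it either prints one chained `ren` line for the Windows command prompt or renames directly, never overwriting an existing file:

```python
from pathlib import Path


def ren_command(mapping: dict) -> str:
    """Build one chained Windows `ren` command line from old -> new names."""
    return " & ".join(f'ren "{old}" "{new}"' for old, new in mapping.items())


def rename_files(folder: Path, mapping: dict) -> list:
    """Rename old -> new inside `folder`; return the names that were skipped."""
    skipped = []
    for old, new in mapping.items():
        src, dst = folder / old, folder / new
        if not src.exists() or dst.exists():
            # Missing source, or target already present: never overwrite silently.
            skipped.append(old)
            continue
        src.rename(dst)
    return skipped


# Hypothetical names; the real pairs would come from the renaming list.
example = {"old_name.txt": "new_name.txt"}
print(ren_command(example))  # ren "old_name.txt" "new_name.txt"
```

The returned `skipped` list makes it easy to see which entries of the list did not match an existing file.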

Here is the renaming list: renaming_files.

Only the main directory index is up to date. The subdirectories have to be rebuilt.

Atma thought it would be good to make a flat structure for the source files, divided only into Mula, Atthak., Tika and Anya, and to build the other structure similar to what was started for the Khmer Tipitaka, with the "include" tags.

If a trick for bringing the file name into the text of each page is known, that would make the conversion to the ati standard easier and faster, with 2698 files per script (4 scripts at this time).
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Moritz on February 20, 2019, 03:25:08 PM
_/\_

Quote
Atma thought it would be good to make a flat structure for the source files, divided only into Mula, Atthak., Tika and Anya, and to build the other structure similar to what was started for the Khmer Tipitaka, with the "include" tags.

If a trick for bringing the file name into the text of each page is known, that would make the conversion to the ati standard easier and faster, with 2698 files per script (4 scripts at this time).

Not sure exactly what is required. Bring which file name into which text?

_/\_
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 20, 2019, 04:11:16 PM
The name of each file inside its own content, as part of the text, Nyom. File XY would get its filename as text content on the first line.

With Notepad++'s replacements, this can then be used to render certain links and anchors that do not yet exist in the files.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 20, 2019, 05:20:09 PM
Replacements done so far, step by step (one command takes about an hour); later they will go onto an ati page, for the following scripts and the same standard:

Code:
<p rend=[^\w]centre[^\w]>(.*?)<\/p>	<div centeralign>$1</div>

<p rend=[^\w]bodytext[^\w] n=[^\w]([^<>]*?)[^\w]><hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi><hi rend=[^\w]dot[^\w]>\.<\/hi>([^\n]*?)<\/p>[\s]* <span para #para_$1>[$2]</span>$3\n\n

<pb ed=[^\w]([^<>]*?)[^\w] n=[^\w]([^<>]*?)[^\w] \/> <span anchor #$1_$2></span>
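The same find/replace pairs can be run outside Notepad++. A minimal Python sketch, assuming ". matches newline" is off (Python's default too) and that the pairs are applied in the order listed; the only translation needed is the back-reference syntax, since Notepad++ writes `$1` where Python's `re.sub` expects `\g<1>` (the names `REPLACEMENTS`, `to_python_repl` and `convert` are ours):

```python
import re

# (find, replace) pairs copied from the Notepad++ list above.
REPLACEMENTS = [
    (r"<p rend=[^\w]centre[^\w]>(.*?)<\/p>", r"<div centeralign>$1</div>"),
    (r"<p rend=[^\w]bodytext[^\w] n=[^\w]([^<>]*?)[^\w]><hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi><hi rend=[^\w]dot[^\w]>\.<\/hi>([^\n]*?)<\/p>[\s]*",
     r"<span para #para_$1>[$2]</span>$3\n\n"),
    (r"<pb ed=[^\w]([^<>]*?)[^\w] n=[^\w]([^<>]*?)[^\w] \/>", r"<span anchor #$1_$2></span>"),
]


def to_python_repl(repl: str) -> str:
    """Turn Notepad++-style $1 back-references into Python's \\g<1> form."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", repl)


def convert(text: str) -> str:
    """Apply all pairs in order, like pressing Replace All for each in turn."""
    for find, repl in REPLACEMENTS:
        # re.sub also turns the \n in the replacement into a real newline.
        text = re.sub(find, to_python_repl(repl), text)
    return text


sample = ('<p rend="centre">Namo tassa</p>\n'
          '<p rend="bodytext" n="1"><hi rend="paranum">1</hi>'
          '<hi rend="dot">.</hi> Evaṃ <pb ed="M" n="1.0001" />me sutaṃ</p>\n')
print(convert(sample))
```

Run over a whole folder, this avoids the hour per command, and the same list can later be reused for further scripts.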
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Moritz on February 21, 2019, 03:55:02 PM
Quote
The name of each file inside its own content, as part of the text, Nyom. File XY would get its filename as text content on the first line.

With Notepad++'s replacements, this can then be used to render certain links and anchors that do not yet exist in the files.
I see. With a complete list of the files I could write a PHP script to have it done directly on the server.

So, as I understand, this (http://accesstoinsight.eu/cs-rm//renaming_files) is the complete list of files for which this should happen?

Or do some still need to be renamed? This could also be done with a script on the server. I could write it on the weekend.

_/\_
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 21, 2019, 06:50:44 PM
Sadhu!

Atma will finish the replacements as far as possible and then upload the files. The file list should be fine. The files are renamed, and the syntax replacements are 90% done, apart from the headers.

As for the Khmer Tipitaka, each header should get its anchor, a back-link and a forward-link (to the released file in the "public area", using the include plugin). It will not always match, but often it will, in this way:

File-name:{pitaka}.{nikaya}.{book}.({sometimes chapter, like sn,an}.){subhead serial no.}

anchor for sutta: sang_id #sut.kn.iti.001
back path to the source file: (cs-rm|cs-km|cs-th|cs-ru):tipitaka:{file name}#sut.kn.iti.001
path to the released single sutta/vagga: (cs-rm|cs-km|cs-th|cs-ru):tipitaka:sut:kn:iti:sut.kn.iti.001|sut.kn.iti.001

======= pitaka =======
<span sang_id #sut>[[km:tipitaka:sut:index|sut]] | [[km:tipitaka:book_053#sut|book_053]]</span>

======= nikaya =======
<span sang_id #sut.kn>[[km:tipitaka:sut:kn:index|sut.kn]] | [[km:tipitaka:book_053#sut.kn|book_053]]</span>

======= book =======
<span sang_id #sut.kn.iti>[[km:tipitaka:sut:kn:iti:sut.kn.iti|sut.kn.iti]] | [[km:tipitaka:book_053#sut.kn.iti|book_053]]</span>

====== chapter ======
<span sang_id #sut.kn.iti.v1>[[km:tipitaka:sut:kn:iti:sut.kn.iti.v1|sut.kn.iti.v1]] | [[km:tipitaka:book_053#sut.kn.iti.v1|book_053]]</span>

===== title =====
<span sang_id #sut.kn.iti.v1.1>[[km:tipitaka:sut:kn:iti:sut.kn.iti#sut.kn.iti.v1.1|sut.kn.iti.v1.1]] | [[km:tipitaka:book_053#sut.kn.iti.v1.1|book_053]]</span>

==== subhead ====
<span sang_id #sut.kn.iti.001>[[km:tipitaka:sut:kn:iti:sut.kn.iti.001|sut.kn.iti.001]] | [[km:tipitaka:book_053#sut.kn.iti.001|book_053]]</span>
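The link pattern in these examples can be generated from the id itself. A minimal Python sketch, assuming the rule read off the examples: ids with fewer than three parts link to an `index` page, deeper ids to a page inside the book namespace (the function names are hypothetical). The title level (`sut.kn.iti.v1.1`), which links to a section of the book page instead, is one of the exceptions and is not handled here:

```python
def release_link(lang: str, sid: str) -> str:
    """Target page for a sang_id, following the pattern of the examples above.

    'sut' -> km:tipitaka:sut:index, 'sut.kn' -> km:tipitaka:sut:kn:index,
    deeper ids live in the book namespace:
    'sut.kn.iti.001' -> km:tipitaka:sut:kn:iti:sut.kn.iti.001
    """
    parts = sid.split(".")
    if len(parts) < 3:
        return f"{lang}:tipitaka:" + ":".join(parts) + ":index"
    return f"{lang}:tipitaka:" + ":".join(parts[:3]) + ":" + sid


def sang_id_span(lang: str, sid: str, source_page: str) -> str:
    """Anchor + forward link (release) + back link (source file)."""
    release = release_link(lang, sid)
    back = f"{lang}:tipitaka:{source_page}#{sid}"
    return f"<span sang_id #{sid}>[[{release}|{sid}]] | [[{back}|{source_page}]]</span>"


print(sang_id_span("km", "sut.kn.iti", "book_053"))
```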

...possibly too complicated, with a lot of exceptions, since everything is structured differently.

It may be easier to put the file name under the header, with a serial number for headers of the same kind, preparing it like this:

====== chapter ======
<span sang_id #{file(-_.-_.)}>

===== title =====
<span sang_id #{file(-_.)}.v{no 1}>

==== subhead ====
<span sang_id #{file}.{no 1,2...}>


Once the last one is in place, it should be no problem to do the rest with normal regex.

Atma will see whether he can upload them today or tomorrow, depending on how long the battery lasts (and hopefully without having used up all the web space with other revisions by then), and will leave the replacement placeholders {file} and {no} in those files where the order is not clear.

May Nyom not invest too much time in overly complicated solutions; as said, there are too many inconsistencies to match everything without rendering it another time anyway.

For two things Atma lacks the tools (or skill): putting the file name into the files, and doing an incrementing (numbered) replacement.
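An incrementing replacement of `{no}` can be sketched in Python as follows, assuming one counter per page that starts at 1 (the name `number_placeholders` is hypothetical). Because one `sang_id` line repeats `{no}` several times (anchor, link target, label), the sketch numbers per line rather than per occurrence; `{no+}` is left untouched, since `{no}` is not a substring of it:

```python
import itertools


def number_placeholders(text: str, placeholder: str = "{no}", start: int = 1) -> str:
    """Replace {no} with 1, 2, 3... per line; call once per page to restart.

    Every occurrence on the same line gets the same number, so anchor,
    link target and label of one sang_id span stay consistent.
    """
    counter = itertools.count(start)
    lines = []
    for line in text.split("\n"):
        if placeholder in line:
            line = line.replace(placeholder, str(next(counter)))
        lines.append(line)
    return "\n".join(lines)
```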

(Atma is just trying to keep a "to-do guide", i.e. documentation, for additional languages/scripts like Burmese, Sri Lankan...)
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 21, 2019, 08:29:51 PM
These are the replacements done, including the placeholders for the two kinds of paths {path-source} and {path-release}, the filename part {file-}, the filename {file} and the number {no}.

"Chapter" should always correspond to the file name. (But there are only 10,500+ replacements for 2698 × 4 = 10,792 files, so roughly 300 missing?)

{path-source}: {lang}:{section}:{file}
{lang}: cs-rm or cs-km or cs-th or cs-ru
{section}: anya, atthakatha, tika or tipitaka, according to the file-name ending *_any, *_att, *_tik, or none (no "_" means "tipitaka")

{path-release}: {lang}:{section}:{pitaka}:{nikaya}:({sub-nikaya}:){chapter}:({title}:){no}:

{file}: {pitaka}.{nikaya}.({subnikaya}.){book}.({title}.){chapter}.{no}_{section}

{file-}: id/file-name-release reduced by one namespace

{file--}: id/file-name-release reduced by two namespaces

{file+}: id/file-name-release increased by one namespace

Note that "subhead" sometimes uses {file} (DN), {file-} (AN, SN, sometimes KN) or {file--} (MN). Att, Any and Tik may vary even more.
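The placeholder definitions above translate into a few small functions. A minimal Python sketch, assuming namespaces in the id are separated by dots and that `{path-source}` is `{lang}:{section}:{file}` as defined above (function names are hypothetical). Returning `None` when nothing is left makes the flat anya cases (e.g. `vismag.nid19_any`, where `{file--}` would be empty) visible instead of producing broken links:

```python
from typing import Optional

SECTION_BY_SUFFIX = {"_any": "anya", "_att": "atthakatha", "_tik": "tika"}


def section_of(file_id: str) -> str:
    """{section} from the file-name ending; no suffix means tipitaka."""
    for suffix, section in SECTION_BY_SUFFIX.items():
        if file_id.endswith(suffix):
            return section
    return "tipitaka"


def reduce_id(file_id: str, levels: int) -> Optional[str]:
    """{file-} is levels=1, {file--} is levels=2; None if nothing is left."""
    if levels < 1:
        return file_id
    parts = file_id.split(".")
    if levels >= len(parts):
        return None
    return ".".join(parts[:-levels])


def path_source(lang: str, file_id: str) -> str:
    """{path-source} = {lang}:{section}:{file}"""
    return f"{lang}:{section_of(file_id)}:{file_id}"
```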


Code:
<p rend=[^\w]centre[^\w]>(.*?)<\/p>	<div centeralign>$1</div>

<p rend=[^\w]bodytext[^\w] n=[^\w]([^<>]*?)[^\w]><hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi><hi rend=[^\w]dot[^\w]>\.<\/hi>([^\n]*?)<\/p>[\s]* <span para #para_$1>[$2]</span>$3\n\n

<p rend=[^\w]bodytext[^\w] n=[^\w]([^<>]*?)[^\w]><hi rend=[^\w]paranum[^\w]>([^<>]*?)[. ]*?<\/hi>[. ]*?([^\n]*?)<\/p>[\s]* <span para #para_$1>[$2]</span> $3\n\n

<pb ed=[^\w]([^<>]*?)[^\w] n=[^\w]([^<>]*?)[^\w][\s]*\/> <span anchor #$1_$2></span>

<p rend=[^\w]bodytext[^\w]>([^\n]+?)<\/p>[\s]* $1\n\n

<note>([^<>]+?)<\/note> <span note>$1</span>

<p rend=[^\w]gatha([^<>]*?)[^\w]>([^\n]+)<\/p>[\s]* <div gatha$1>$2</div>\n\n

<p rend=[^\w]hangnum[^\w] n=[^\w]([^<>]*?)[^\w]><hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi>[. ]*<hi rend=[^\w]dot[^\w]>[. ]*<\/hi>[. ]*([^\n]*)<\/p>[\s]* <div hangnum><span para #para_$1>[$2]</span></div> $3\n\n

<hi rend=[^\w]bold[^\w]>([^\n]+?)<\/hi> **$1**

<p rend=[^\w]nikaya[^\w]>([^<>]*?)<\/p> <div centeralign #nikaya>**$1**</div>\n<span sang_id #{file--}>[[{path-release}:{file--}|{file--}]] | [[{path-source}:{file}#{file--}|source]]</span>

<p rend=[^\w]book[^\w]>([^<>]*?)<\/p> ======== $1 ========\n<span sang_id #{file-}>[[{path-release}:{file-}|{file-}]] | [[{path-source}:{file}#{file-}|source]]</span>

<p rend=[^\w]chapter[^\w]>([^<>]*?)<\/p> ======= $1 =======\n<span sang_id #{file}>[[{path-release}:{file}|{file}]] | [[{path-source}:{file}#{file}|source]]</span>

<p rend=[^\w]title[^\w]>([^<>]*?)<\/p> ===== $1 =====\n<span sang_id #{file+}>[[{path-release}:{file+}|{file+}]] | [[{path-source}:{file}#{file+}|source]]</span>

<p rend=[^\w]subhead[^\w]>([^<>]*?)<\/p> ==== $1 ====\n<span sang_id #{file-}.{no}>[[{path-release}:{file-}.{no}|{file-}.{no}]] | [[{path-source}:{file}#{file-}.{no}|source]]</span>

Headers with note:

<p rend=[^\w]subhead[^\w]>([^<>]*?)<span note>([^<>]*?)<\/span>([^<>]*?)<\/p> ==== $1$3 ====\n<div centeralign>**$1<span note>$2</span>$3**</div>\n<span sang_id #{file-}.{no}>[[{path-release}:{file-}.{no}|{file-}.{no}]] | [[{path-source}:{file}#{file-}.{no}|source]]</span>

<p rend=[^\w]chapter[^\w]>([^<>]*?)<span note>([^<>]*?)<\/span>([^<>]*?)<\/p> ======= $1 =======\n<div centeralign>**$1<span note>$2</span>$3**</div>\n<span sang_id #{file}>[[{path-release}:{file}|{file}]] | [[{path-source}:{file}#{file}|source]]</span>

<p rend=[^\w]title[^\w]>([^<>]*?)<span note>([^<>]*?)<\/span>([^<>]*?)<\/p> ===== $1 =====\n<div centeralign>**$1<span note>$2</span>$3**</div>\n<span sang_id #{file+}>[[{path-release}:{file+}|{file+}]] | [[{path-source}:{file}#{file+}|source]]</span>

Headers with anchors:

<p rend=[^\w]subhead[^\w]>([^<>]*?)<span anchor #(.+)<\/span>([^<>]*?)<\/p> ==== $1$3 ====\n<span sang_id #{file-}.{no}>[[{path-release}:{file-}.{no}|{file-}.{no}]] | [[{path-source}:{file}#{file-}.{no}|source]]</span>\n<span anchor #$2</span>

<p rend=[^\w]chapter[^\w]>([^<>]*?)<span anchor #(.+)<\/span>([^<>]*?)<\/p> ======= $1$3 =======\n<span sang_id #{file}>[[{path-release}:{file}|{file}]] | [[{path-source}:{file}#{file}|source]]</span>\n<span anchor #$2</span>

<p rend=[^\w]title[^\w]>([^<>]*?)<span anchor #(.+)<\/span>([^<>]*?)<\/p> ===== $1$3 =====\n<span sang_id #{file+}>[[{path-release}:{file+}|{file+}]] | [[{path-source}:{file}#{file+}|source]]</span>\n<span anchor #$2</span>

<p rend=[^\w]book[^\w]>([^<>]*?)<span anchor #(.+)<\/span>([^<>]*?)<\/p> ======== $1$3 ========\n<span sang_id #{file-}>[[{path-release}:{file-}|{file-}]] | [[{path-source}:{file}#{file-}|source]]</span>\n<span anchor #$2</span>


<p rend=[^\w]subsubhead[^\w]>([^<>]*?)<\/p> === $1 ===\n<span sang_id #{file-}.{no+}>[[{path-release}:{file-}.{no+}|{file-}.{no+}]] | [[{path-source}:{file}#{file-}.{no+}|source]]</span>

<p rend=[^\w](indent|unindented)[^\w]>([^\n]+)<\/p>[\s]* <div $1>$2</div>\n\n

Replacements for each language/script (cs-rm, cs-km, cs-th, cs-ru): header and footer

Code:
^(.*?)<body>(.*?)<\/body>(.*)$	{{section>en:tech:template_includes#cs-rm_header&nouser&nodate&noheader&noeditbutton&firstsectiononly}}\n<span hide>{file}</span>$2{{section>en:tech:template_includes#cs-rm_footer&nouser&nodate&noheader&noeditbutton&firstsectiononly}}
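For this header/footer replacement to span a whole file, `.` has to match newlines (the ". matches newline" option in Notepad++; `re.DOTALL` in Python). A minimal Python sketch of the same replacement, with an added guard of our own: files that do not contain exactly one `<body>...</body>` pair are returned as `None` for manual review rather than rewritten, which could help catch the kind of `<body>` irregularities suspected later in this thread (the names `wrap_body`, `BODY` and `OPTIONS` are hypothetical):

```python
import re
from typing import Optional

BODY = re.compile(r"^(.*?)<body>(.*?)</body>(.*)$", re.DOTALL)
OPTIONS = "&nouser&nodate&noheader&noeditbutton&firstsectiononly"


def wrap_body(text: str, lang: str = "cs-rm") -> Optional[str]:
    """Keep only the <body> content and add the include header/footer.

    Returns None (file left for manual review) unless there is exactly one
    <body> ... </body> pair, instead of silently producing a broken page.
    """
    if text.count("<body>") != 1 or text.count("</body>") != 1:
        return None
    match = BODY.match(text)
    if match is None:  # e.g. </body> appears before <body>
        return None
    header = "{{section>en:tech:template_includes#" + lang + "_header" + OPTIONS + "}}"
    footer = "{{section>en:tech:template_includes#" + lang + "_footer" + OPTIONS + "}}"
    # {file} stays a literal placeholder, exactly as in the Notepad++ replacement.
    return header + "\n<span hide>{file}</span>" + match.group(2) + footer
```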

Script-specific replacements:

cs-km: replacement of ព្ព with ព្វ (according to Khmer spelling tradition in Pali)

(cs-rm: replacement of "m with dot below" (ṃ) with "m with dot above" (ṁ), not yet finally decided)
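Both script-specific replacements are plain character substitutions. A minimal Python sketch (function names are hypothetical); the Khmer sequences are written as code points to avoid confusion between look-alike characters: ព្ព is PO + COENG + PO (U+1796 U+17D2 U+1796), ព្វ is PO + COENG + VO (U+1796 U+17D2 U+179C). The niggahita change first normalizes to NFC so that an "m" followed by a separate combining dot below (U+0323) is caught as well:

```python
import unicodedata

KHMER_PO_COENG_PO = "\u1796\u17d2\u1796"  # ព្ព
KHMER_PO_COENG_VO = "\u1796\u17d2\u179c"  # ព្វ


def khmer_pali_spelling(text: str) -> str:
    """cs-km: write ព្វ instead of ព្ព."""
    return text.replace(KHMER_PO_COENG_PO, KHMER_PO_COENG_VO)


def niggahita_dot_above(text: str) -> str:
    """cs-rm option: ṃ/Ṃ (dot below) -> ṁ/Ṁ (dot above)."""
    text = unicodedata.normalize("NFC", text)  # m + U+0323 -> single U+1E43
    return text.replace("\u1e43", "\u1e41").replace("\u1e42", "\u1e40")
```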

See also ati-page edits.

If no additional kinds of headers or other tags needing replacement are found, the files will be uploaded, rendered with these replacements, starting tomorrow.

/me comment Friday: cloudy; the upload may not be possible until Saturday noon.

7.3.2019: Some renderings added
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 22, 2019, 08:41:46 PM
Having run out of solar power, Atma could not finish cleaning up some unmatched cases today, so uploading the files might be tomorrow afternoon. Just to keep you informed.

/me informative update: still some unmatched cases; cloudy, so possibly no upload today.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 23, 2019, 07:00:38 PM
Many dot issues, so there is still much work to do online afterwards. However, the headers should be largely fine. Atma has uploaded the Khmer files into cs-km:tipitaka. The tika, anya and atthakatha files have not been moved into their folders yet (Atma has started with the tika files, but is not sure it will be finished).

Roman has been started likewise. Moving the att, tik and any files has not been started today, but all files are uploaded into the tipitaka folder in cs-rm.

With some 15% battery left, it might get finished.


Rest would come tomorrow.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 24, 2019, 04:11:16 PM
Files are all uploaded (a certain release after 6 years of carrying it); moving the Thai _any, _tik and _att files into the folders anya, tika, ... is in progress (probably an hour or two; good battery left today).
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 24, 2019, 06:50:00 PM
Uploads of the source-files and putting into their categories are done. Anumodana!

There are still some XML tags left, seemingly most of them in the Cyrillic edition, caused by certain inconsistencies (probably introduced by somebody else for this font/script).
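Leftover tags can be located before correcting them with batch-edit. A minimal Python sketch, assuming that after conversion only wrap-plugin `<div ...>` and `<span ...>` tags should remain in a page, so any other tag name (`hi`, `pb`, `p`, `note`, ...) is reported per file (the names `leftover_tags` and `scan` are hypothetical):

```python
import re
from collections import Counter
from pathlib import Path

TAG = re.compile(r"</?([A-Za-z][\w:-]*)")
ALLOWED = {"div", "span"}  # assumption: only wrap-plugin tags should remain


def leftover_tags(text: str) -> Counter:
    """Count tag names in a page that are not wrap-plugin <div>/<span>."""
    return Counter(name for name in TAG.findall(text) if name.lower() not in ALLOWED)


def scan(folder: Path):
    """Yield (page, counts) for every .txt page that still has other tags."""
    for page in sorted(folder.rglob("*.txt")):
        found = leftover_tags(page.read_text(encoding="utf-8"))
        if found:
            yield page, found
```

The counts per tag name show which additional replacement patterns would be worth writing, and which cases are few enough to fix by hand.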

At this point: does Upasaka Dmytro (Admin) know anything about the situation of the Cyrillic edition? And has it been given to a Sangha that may exist using this script, aside from the surely huge number of not-so-proper accesses? If not, and if Nyom would like to assist and take part in the merits, he may invite them personally to make use of it and give them access and accounts. If he would like to sacrifice effort and knowledge to render the storage and the text properly, he is welcome to do so, for the use of the Sanghas and their devoted followers.

Atma would continue correcting certain syntax with batch-edit, if this does not disturb any effort Nyom Moritz might like to put in. Headers might be touched in some rare cases where the "note" span is included (i.e. they have no placeholders now).

Ohh... no search results => re-indexing!? That will take long. Any idea who should best try it?

Ohh... and the Thai files are Cyrillic... looks like a detailed review is required. Started the Thai upload anew.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Moritz on February 24, 2019, 07:20:07 PM
Vandami Bhante,

so, to check again whether I understand correctly: the filename just needs to be put at the beginning of the text,
so maybe just

Code:
{filename}

...everything else from the file

?

And filename should have the complete path, for example "cs-km/anya/vismag.nid.23_any.txt"?



Quote
Ohh... no search results => re-indexing!? That will be long. Any idea who best should try it?

I think it would be best if I do the re-indexing from here, having a stable internet connection.

_/\_
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 24, 2019, 07:32:40 PM
There are now placeholders, Nyom Moritz, like {file} {file-} {file--} {file+} on each page, not only in the header part (as specified below). If all {file} are simply replaced with the file name from the list, then it should be possible for Atma to render the others.

http://www.accesstoinsight.eu/cs-rm/tipitaka/sut.an01.v01 for example

Search for {file} and replace it with sut.an01.v01 (the file-renaming list is actually no longer needed)

Code:
<div #cs-rm>
{{section>en:tech:template_includes#cs-rm_header&nouser&nodate&noheader&noeditbutton&firstsectiononly}}
<span hide>{file}</span>

<div centeralign> Namo tassa bhagavato arahato sammāsambuddhassa</div>

<div centeralign #nikaya>**Aṅguttaranikāyo**</div>
<span sang_id #{file--}>[[{path-release}:{file--}|{file--}]] | [[{path-source}:{file}#{file--}|source]]</span>

======== Ekakanipātapāḷi ========
<span sang_id #{file-}>[[{path-release}:{file}:{file}|{file-}]] | [[{path-source}#{file-}|source]]</span>

======= 1. Rūpādivaggo =======
<span sang_id #{file}>[[{path-release}:{file}|{file}]] | [[{path-source}#{file}|source]]</span>

<span para #para_1>[1]</span> Evaṃ <span anchor #T_1.0001></span><span anchor #V_1.0001></span><span anchor #P_1.0001></span><span anchor #M_1.0001></span> me sutaṃ – ekaṃ samayaṃ bhagavā sāvatthiyaṃ viharati jetavane anāthapiṇḍikassa ārāme. Tatra kho bhagavā bhikkhū āmantesi – ‘‘bhikkhavo’’ti. ‘‘Bhadante’’ti te bhikkhū bhagavato paccassosuṃ. Bhagavā etadavoca –

‘‘Nāhaṃ, bhikkhave, aññaṃ ekarūpampi samanupassāmi yaṃ evaṃ purisassa cittaṃ pariyādāya tiṭṭhati yathayidaṃ, bhikkhave, itthirūpaṃ. Itthirūpaṃ, bhikkhave, purisassa cittaṃ pariyādāya tiṭṭhatī’’ti. Paṭhamaṃ.

<span para #para_2>[2]</span> ‘‘Nāhaṃ, bhikkhave, aññaṃ ekasaddampi samanupassāmi yaṃ evaṃ purisassa cittaṃ pariyādāya <span anchor #V_1.0002></span> tiṭṭhati yathayidaṃ, bhikkhave, itthisaddo. Itthisaddo, bhikkhave, purisassa cittaṃ pariyādāya tiṭṭhatī’’ti. Dutiyaṃ.

<span para #para_3>[3]</span> ‘‘Nāhaṃ, bhikkhave, aññaṃ ekagandhampi samanupassāmi yaṃ evaṃ purisassa cittaṃ pariyādāya tiṭṭhati yathayidaṃ, bhikkhave, itthigandho. Itthigandho, bhikkhave, purisassa cittaṃ pariyādāya tiṭṭhatī’’ti. Tatiyaṃ.

<span para #para_4>[4]</span> ‘‘Nāhaṃ <span anchor #T_1.0002></span><span anchor #P_1.0002></span>, bhikkhave, aññaṃ ekarasampi samanupassāmi yaṃ evaṃ purisassa cittaṃ pariyādāya tiṭṭhati yathayidaṃ, bhikkhave, itthiraso. Itthiraso, bhikkhave, purisassa cittaṃ pariyādāya tiṭṭhatī’’ti. Catutthaṃ.

<span para #para_5>[5]</span> ‘‘Nāhaṃ <span anchor #M_1.0002></span>, bhikkhave, aññaṃ ekaphoṭṭhabbampi samanupassāmi yaṃ evaṃ purisassa cittaṃ pariyādāya tiṭṭhati yathayidaṃ, bhikkhave, itthiphoṭṭhabbo. Itthiphoṭṭhabbo, bhikkhave, purisassa cittaṃ pariyādāya tiṭṭhatī’’ti. Pañcamaṃ.

<span para #para_6>[6]</span> ‘‘Nāhaṃ, bhikkhave, aññaṃ ekarūpampi samanupassāmi yaṃ evaṃ itthiyā cittaṃ pariyādāya tiṭṭhati yathayidaṃ, bhikkhave, purisarūpaṃ. Purisarūpaṃ, bhikkhave, itthiyā cittaṃ pariyādāya tiṭṭhatī’’ti. Chaṭṭhaṃ.

<span para #para_7>[7]</span> ‘‘Nāhaṃ, bhikkhave, aññaṃ ekasaddampi samanupassāmi yaṃ evaṃ itthiyā cittaṃ pariyādāya tiṭṭhati yathayidaṃ, bhikkhave, purisasaddo. Purisasaddo, bhikkhave, itthiyā cittaṃ pariyādāya tiṭṭhatī’’ti. Sattamaṃ.

<span para #para_8>[8]</span> ‘‘Nāhaṃ, bhikkhave, aññaṃ ekagandhampi samanupassāmi yaṃ evaṃ itthiyā cittaṃ pariyādāya tiṭṭhati yathayidaṃ, bhikkhave, purisagandho. Purisagandho, bhikkhave, itthiyā cittaṃ pariyādāya tiṭṭhatī’’ti. Aṭṭhamaṃ.

<span para #para_9>[9]</span> ‘‘Nāhaṃ, bhikkhave, aññaṃ ekarasampi samanupassāmi yaṃ evaṃ itthiyā cittaṃ pariyādāya <span anchor #V_1.0003></span> tiṭṭhati yathayidaṃ, bhikkhave, purisaraso. Purisaraso, bhikkhave, itthiyā cittaṃ pariyādāya tiṭṭhatī’’ti. Navamaṃ.

<span para #para_10>[10]</span> ‘‘Nāhaṃ, bhikkhave, aññaṃ ekaphoṭṭhabbampi samanupassāmi yaṃ evaṃ <span anchor #T_1.0003></span> itthiyā cittaṃ pariyādāya tiṭṭhati yathayidaṃ, bhikkhave, purisaphoṭṭhabbo. Purisaphoṭṭhabbo, bhikkhave, itthiyā cittaṃ pariyādāya tiṭṭhatī’’ti. Dasamaṃ.

<div centeralign>Rūpādivaggo paṭhamo.</div>
{{section>en:tech:template_includes#cs-rm_footer&nouser&nodate&noheader&noeditbutton&firstsectiononly}}
</div>

{file-} would be sut.an01

{file--} would be sut

{file+} would be sut.an01.v01.xx (not easy to automatically render)

{no} would be a serial 1, 2, 3... for each page

{path-source} would be the path of the current file, i.e. cs-rm:tipitaka

{path-release} would be cs-rm:tipitaka:sut:an (possibly not always correct)
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Moritz on February 24, 2019, 07:49:21 PM
Quote
search for {file} replace with sut.an01.v01 (file-renaming-list is actually no more needed)

Okay. So I just replace {file}, {file-} {file--} according to this pattern for now, for all files inside cs-km, cs-rm and cs-ru and then rebuild the index.

Thai then after re-upload when it is finished. Correct?

_/\_
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 24, 2019, 08:05:42 PM
Quote
search for {file} replace with sut.an01.v01 (file-renaming-list is actually no more needed)

Okay. So I just replace {file}, {file-} {file--} according to this pattern for now, for all files inside cs-km, cs-rm and cs-ru and then rebuild the index.

Thai then after re-upload when it is finished. Correct?

_/\_

Sadhu!

Upload nearly finished; moving the att, any and tik files from the tipitaka folder into their folders might need another 1-2 hours. Atma will then give a "ready".
Title: Status of Cyrillic Chatta Sanghayana heritage
Post by: Dhammañāṇa on February 24, 2019, 08:16:16 PM


One or more posts have been cut out of this topic here. A new topic, based on it, has been created as "Status of Cyrillic Chatta Sanghayana heritage (http://forum.sangham.net/index.php?topic=9147.0)" or attached there.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 24, 2019, 08:50:14 PM
Overwriting the Thai files in their folders seems to take more time, possibly 5-6 seconds per file. Tika is still in progress from before. If Nyom thinks it would be faster with his tools, Atma would stop and leave the move to him: overwriting the rest of the _tik files (if any) and the _any and _att files still found in the folder cs-th into the folders cs-th:anya:, cs-th:tika:, cs-th:atthakatha:. Just asking, so as not to take up possibly valuable time of Nyom.

On the other hand, since it is just inserting file names, it would not matter if the overwrite is done later, or? The Android ES Explorer just stopped responding.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Moritz on February 24, 2019, 09:11:52 PM
Quote
Overwriting the Thai files in their folders seems to take more time, possibly 5-6 seconds per file. Tika is still in progress from before. If Nyom thinks it would be faster with his tools, Atma would stop and leave the move to him: overwriting the rest of the _tik files (if any) and the _any and _att files still found in the folder cs-th into the folders cs-th:anya:, cs-th:tika:, cs-th:atthakatha:. Just asking, so as not to take up possibly valuable time of Nyom.

Quote
On the other hand, since it is just inserting file names, it would not matter if the overwrite is done later, or?
It would not matter. I could do it later, when everything is moved to the right place.


But it seems I still don't understand everything completely. So some more clarifications:

For each of the directories cs-rm, cs-km, ... etc. there are these subdirectories:
anya
atthakatha
tika
tipitaka

Some files from tipitaka are now being moved into anya, atthakatha, etc.
(Only remaining files should be the sut.[...] files?)

Maybe moving could go faster if I write a script for that, too. So, what exactly must be moved?

All files inside tipitaka ending with _any should be moved to anya, for example? And what else? If there is a simple pattern, I think moving everything with a PHP script would be easy and go much faster than via FTP.


And then... after that is finished, I should replace {file} with the filename (without .txt ending) and maybe some other replacements.

But the patterns described here (http://forum.sangham.net/index.php/topic,8672.msg17921.html#msg17921) do not fit all files. For example, some files like vismag.nid19_any:
{file-} would be vismag
{file--} would be empty?

Maybe best to clarify on the phone. I will try to call. _/\_
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 24, 2019, 09:33:51 PM
Quote
Some files from tipitaka are now being moved into anya, atthakatha, etc.

In cs-th, yes, overwriting the wrong Cyrillic files. cs-rm, cs-ru and cs-km should already be fine.

The upload tool on the laptop does not allow searching for _att, for example, so my person first uploaded everything to the tipitaka folder and then moved the _any, _tik and _att files to the right folders with an Android app that allows searching and handling the search results. Files that have no _any, _att or _tik at the end remain as tipitaka files. And yes, for each language there are these four folders.

Quote
For each of the directories cs-rm, cs-km, ... etc. there are these subdirectories:
anya
atthakatha
tika
tipitaka
Yes.

Quote
(Only remaining files should be the sut.[...] files?)
No. They are just one part of the files remaining in the tipitaka folder, besides vin and abh.

Quote
All files inside tipitaka ending with _any should be moved to anya, for example? And what else? If there is a simple pattern, I think moving everything with a PHP script would be easy.

All files in cs-th/tipitaka/ (only cs-th namespace!), containing _any at the end, to cs-th/anya/
All files in cs-th/tipitaka/ (only cs-th), containing _tik at the end, to cs-th/tika/
All files in cs-th/tipitaka/ (only cs-th), containing _att at the end, to cs-th/atthakatha/

(overwriting the existing files in those folders, which are wrongly Cyrillic)
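The three rules above follow one pattern, so the move can be scripted. A minimal Python sketch, assuming `lang_root` is the page folder of the cs-th namespace on the server (for example DokuWiki's `data/pages/cs-th`; the path and the function names are hypothetical). `Path.replace` overwrites an existing target, matching "overwriting the existing files". Files ending in `_atta` are deliberately not matched. Moving files on disk bypasses DokuWiki's own page handling, so the search index presumably needs rebuilding afterwards:

```python
from pathlib import Path
from typing import Optional

TARGETS = {"_any": "anya", "_tik": "tika", "_att": "atthakatha"}


def target_folder(stem: str) -> Optional[str]:
    """Folder for a file name without .txt; None means it stays in tipitaka."""
    for suffix, folder in TARGETS.items():
        if stem.endswith(suffix):
            return folder
    return None


def move_section_files(lang_root: Path) -> int:
    """Move <lang_root>/tipitaka/*_any|_tik|_att.txt into their folders."""
    moved = 0
    # sorted() lists all files first, so moving them does not disturb the loop.
    for page in sorted((lang_root / "tipitaka").glob("*.txt")):
        folder = target_folder(page.stem)
        if folder is None:
            continue
        dest = lang_root / folder / page.name
        dest.parent.mkdir(parents=True, exist_ok=True)
        page.replace(dest)  # overwrites the existing (wrongly Cyrillic) copy
        moved += 1
    return moved
```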

Quote
And then... after that is finished, I should replace {file} with the filename (without .txt ending) and maybe some other replacements.
Yes, but not "should"; "could", if Nyom wishes.

Quote
But the patterns described here do not fit for all files. For example, some files like vismag.nid19_any:
{file-} would be vismag
{file--} would be empty?

Yes, Atma is aware of it, especially in anya, where the structure is flatter. But handling the "lesser" inconsistencies manually is thought to be easier than thinking through all possibilities. Maybe a rule like "if the name is empty ("0"), use the (whole) filename" might avoid errors.

Quote
Maybe best to clarify on the phone. I will try to call.
However Nyom Moritz likes. Still some battery left (12% on the laptop, 40% on the tablet).

Mudita
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Moritz on February 24, 2019, 09:47:31 PM
Quote
Maybe best to clarify on the phone. I will try to call.
However Nyom Moritz likes. Still some battery left (12% on the laptop, 40% on the tablet).

I tried to call, but there was no answer. Then another call came in between. But now, I think, everything should already be clearer from the explanations.

Quote
All files inside tipitaka ending with _any should be moved to anya, for example? And what else? If there is a simple pattern, I think moving everything with a PHP script would be easy.

All files in cs-th/tipitaka/ (only cs-th namespace!), containing _any at the end, to cs-th/anya/
All files in cs-th/tipitaka/ (only cs-th), containing _tik at the end, to cs-th/tika/
All files in cs-th/tipitaka/ (only cs-th), containing _att at the end, to cs-th/atthakatha/

(overwriting the existing files in those folders, which are wrongly Cyrillic)

Okay. I can do the remaining with a php script very quickly.

Quote
But the patterns described here do not fit for all files. For example, some files like vismag.nid19_any:
{file-} would be vismag
{file--} would be empty?

Yes, Atma is aware of it, especially in anya, where the structure is flatter. But handling the "lesser" inconsistencies manually is thought to be easier than thinking through all possibilities. Maybe a rule like "if the name is empty ("0"), use the (whole) filename" might avoid errors.

Maybe just leave {file-} and {file--} as is (unreplaced) for now, in cases where it would be empty? Then one could still think about what to do with it.
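Filling the placeholders while leaving the unresolved ones in place can be done in a single regex pass. A minimal Python sketch, assuming the values for one page have already been worked out (the example values below are the ones given earlier for sut.an01.v01; `fill_placeholders` is a hypothetical name). A value that is missing, `None` or empty leaves the `{...}` text untouched, as suggested here; DokuWiki's `{{section>...}}` includes are not matched, because the pattern only accepts word characters, `+` and `-` between the braces:

```python
import re

PLACEHOLDER = re.compile(r"\{([\w+-]+)\}")


def fill_placeholders(text: str, values: dict) -> str:
    """Replace {name} with values[name]; unknown, None or empty stays as is."""
    def swap(match):
        value = values.get(match.group(1))
        return value if value else match.group(0)
    return PLACEHOLDER.sub(swap, text)


# Values for cs-rm/tipitaka/sut.an01.v01 as described earlier in the thread.
example_values = {
    "file": "sut.an01.v01",
    "file-": "sut.an01",
    "file--": "sut",
    "file+": None,  # not easy to render automatically: left unreplaced
    "path-source": "cs-rm:tipitaka",
    "path-release": "cs-rm:tipitaka:sut:an",
}
line = "<span sang_id #{file--}>[[{path-release}:{file--}|{file--}]] | [[{path-source}:{file}#{file--}|source]]</span>"
print(fill_placeholders(line, example_values))
```

Leftover placeholders can later be listed with the same `PLACEHOLDER` pattern and handled by hand.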

_/\_
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 24, 2019, 09:57:20 PM
As Nyom thinks it good: either way, whatever is done with joy and a good nimitta is fine. Sadhu.

(ES Explorer seems unable to handle the mass of overwrites anyway... it stopped again, and Atma will stop this attempt for now and leave it to Nyom, as he prefers. "Hopefully" not too many files were lost, but individual checks need to be made anyway after the initial implementation. ES sometimes deletes files if a transfer is interrupted.)
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Moritz on February 24, 2019, 10:18:59 PM
Quote
As Nyom thinks it good: either way, whatever is done with joy and a good nimitta is fine. Sadhu.

(ES Explorer seems unable to handle the mass of overwrites anyway... it stopped again, and Atma will stop this attempt for now and leave it to Nyom, as he prefers. "Hopefully" not too many files were lost, but individual checks need to be made anyway after the initial implementation. ES sometimes deletes files if a transfer is interrupted.)

_/\_

Okay. Rename is done. Now trying to make the script for replacing the {file} etc.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 24, 2019, 10:36:47 PM
Quote
Okay. Rename is done. Now trying to make the script for replacing the {file} etc.
Sadhu, although "hopefully" it overwrote by moving and not by renaming. Btw., there are 5 files with the ending _atta (not _att) left in cs-th/tipitaka/; possibly a previous fault of mine, which seems likely.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 24, 2019, 11:02:30 PM
Seems like the Vinaya is gone in Latin script... *ironic smile*, but that might have happened through the header and footer replacement; risky, of course, since it assumed that no file had <body> errors...
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Moritz on February 24, 2019, 11:15:35 PM
Quote
Sadhu, although "hopefully" it overwrote by moving and not by renaming.
Renaming should mean the same as moving.

Quote
Btw., there are 5 files with the ending _atta (not _att) left in cs-th/tipitaka/; possibly a previous fault of mine, which seems likely.
Okay, also moved "tipitaka/sut.kn.iti.3_atta.txt" to "atthakatha/sut.kn.iti.3_att.txt" etc.

Now testing the replace script locally before using it on the server.

_/\_
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Moritz on February 24, 2019, 11:19:42 PM
Quote
Seems like the Vinaya is gone in Latin script... *ironic smile*, but that might have happened during the replacement of footer and header, risky of course, assuming the files would not have <body> errors...

So this happened through Bhante's replacements?
Maybe one could restore something from a backup. At the moment, the backup still cannot be downloaded. A Greensta employee told me they see the problem and are working on a solution.

Ah... or of course the more recent versions should still be in the attic.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 24, 2019, 11:23:17 PM
Sadhu and "good luck", and may Nyom take his time, not following Atma in "pushing enter" so quickly. Rejoicing, having the Noble Sangha in mind.

Mudita

Quote
Seems like the Vinaya is gone in Latin script... *ironic smile*, but that might have happened during the replacement of footer and header, risky of course, assuming the files would not have <body> errors...
So this happened through Bhante's replacements?
Maybe one could restore something from a backup. At the moment, the backup still cannot be downloaded. A Greensta employee told me they see the problem and are working on a solution.

No need to restore. Atma can reproduce certain oops... and Nyom's replacement script can possibly be run again later to match the renewed files.

Great to hear that Greensta saw this after 4 years of being told often. Nyom's merits, their benefit here, blessed to be able to receive it.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 24, 2019, 11:33:53 PM
About 40 files, in various areas, in cs-rm. Replacing HTML and XML with regex is certainly risky, as professionals say. But no problem, no need to go deeper there, which would take a lot of Nyom's time. Atma is used to redoing things at least three times until they are fine, and then letting go.

Atma has learned to value Dokuwiki's strict, unforgiving syntax, although one might think it inflexible at first glance. Enough anussavari now; power/battery is off now.

Much mudita, Nyom.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Moritz on February 25, 2019, 01:26:15 AM
Okay, {file}, {file-} and {file--} should now be replaced (the latter two only if there is at least one or at least two dots in the name, respectively).

Also {path-source} and {no}.

For {no}, I incremented the number every time there is an occurrence that is not on the same line as the previous one.

So
Code: [Select]
{no} {no} {no}
{no}
{no} {no}
would become
Code: [Select]
1 1 1
2
3 3

I hope that occurrences meant to share the same {no} value always appear on the same line, so that the result is correct.
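A minimal Python sketch of the {no} rule described above (the actual script used on the server is not shown in the thread, and the function name number_placeholders is hypothetical): the counter goes up once for every line that contains at least one {no}, so all occurrences on the same line share one number.

```python
def number_placeholders(text: str, placeholder: str = "{no}") -> str:
    """Replace every `placeholder` with a running number.

    The number increases once per line that contains the placeholder,
    so all occurrences on one line get the same value.
    """
    counter = 0
    out_lines = []
    for line in text.split("\n"):
        if placeholder in line:
            counter += 1
            line = line.replace(placeholder, str(counter))
        out_lines.append(line)
    return "\n".join(out_lines)


print(number_placeholders("{no} {no} {no}\n{no}\n{no} {no}"))
```

Lines without a placeholder (blank lines, headers, plain text) leave the counter unchanged, which matches the example above.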

Other placeholders are not replaced yet. In case of any errors, I have a backup of all files from before the replacement.

_/\_


PS: The search index update is running but will probably take at least one full day.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 25, 2019, 04:24:00 AM
Sadhu, Sadhu
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 25, 2019, 12:10:55 PM
The list of broken files: Atma remembers forgetting to use a "lazy" search in the footer and header replacement; it was canceled but might have been executed.

However, here is the list of files (produced with the DOS command "dir /b > _list.txt") for the selected 295-320 byte files in cs-rm, which Atma will upload anew after preparing them:
Code: [Select]
02/25/2019  11:32 AM               295 abh.ds.01.txt
02/25/2019  11:44 AM               301 abh.ds.0_att.txt
02/25/2019  11:44 AM               305 abh.ds.1.3_att.txt
02/25/2019  11:43 AM               315 abh.ds.2_matika_tik.txt
02/25/2019  11:43 AM               305 abh.kv_anu_tik.txt
02/25/2019  11:43 AM               307 abh.kv_mula_tik.txt
02/25/2019  11:32 AM               301 abh.pa.01.01.txt
02/25/2019  11:32 AM               301 abh.pa.01.06.txt
02/25/2019  11:32 AM               301 abh.pa.02.01.txt
02/25/2019  11:32 AM               301 abh.pa.02.02.txt
02/25/2019  11:32 AM               301 abh.pa.02.10.txt
02/25/2019  11:32 AM               301 abh.pa.02.12.txt
02/25/2019  11:32 AM               303 abh.pa.03.001.txt
02/25/2019  11:32 AM               301 abh.pa.04.01.txt
02/25/2019  11:32 AM               295 abh.vb.17.txt
02/25/2019  11:43 AM               315 abh.vi.2_matika_tik.txt
02/25/2019  11:32 AM               295 abh.ya.02.txt
02/25/2019  11:32 AM               295 abh.ya.03.txt
02/25/2019  11:32 AM               295 abh.ya.05.txt
02/25/2019  11:32 AM               295 abh.ya.06.txt
02/25/2019  11:32 AM               295 abh.ya.07.txt
02/25/2019  11:32 AM               295 abh.ya.09.txt
02/25/2019  11:32 AM               295 abh.ya.10.txt
02/25/2019  11:46 AM               313 bud-vgs.buddhg_any.txt
02/25/2019  11:46 AM               307 bud-vgs.nk2_any.txt
02/25/2019  11:46 AM               307 bya-gs.abh3_any.txt
02/25/2019  11:46 AM               307 bya-gs.abh4_any.txt
02/25/2019  11:46 AM               311 bya-gs.abhti2_any.txt
02/25/2019  11:46 AM               311 bya-gs.abhti3_any.txt
02/25/2019  11:46 AM               307 bya-gs.mog0_any.txt
02/25/2019  11:46 AM               307 bya-gs.pay2_any.txt
02/25/2019  11:46 AM               309 bya-gs.sad15_any.txt
02/25/2019  11:46 AM               311 bya-gs.subti1_any.txt
02/25/2019  11:46 AM               311 bya-gs.subti4_any.txt
02/25/2019  11:46 AM               309 ledi-sgs.nd2_any.txt
02/25/2019  11:46 AM               309 ledi-sgs.nd6_any.txt
02/25/2019  11:46 AM               309 ledi-sgs.nd7_any.txt
02/25/2019  11:46 AM               307 niti-gs.dha_any.txt
02/25/2019  11:46 AM               309 paki-gs.vess_any.txt
02/25/2019  11:46 AM               311 sang-pu-vi.an_any.txt
02/25/2019  11:46 AM               313 siha-gs.dathvo_any.txt
02/25/2019  11:46 AM               309 siha-gs.jica_any.txt
02/25/2019  11:46 AM               309 siha-gs.jiva_any.txt
02/25/2019  11:46 AM               311 siha-gs.mili2_any.txt
02/25/2019  11:46 AM               309 siha-gs.mogg_any.txt
02/25/2019  11:46 AM               311 siha-gs.padam_any.txt
02/25/2019  11:46 AM               311 siha-gs.padas_any.txt
02/25/2019  11:46 AM               309 siha-gs.sama_any.txt
02/25/2019  11:44 AM               309 sut.an01.v14_att.txt
02/25/2019  11:43 AM               311 sut.dn.01_abh_tik.txt
02/25/2019  11:44 AM               303 sut.dn.01_att.txt
02/25/2019  11:43 AM               303 sut.dn.01_tik.txt
02/25/2019  11:43 AM               311 sut.dn.02_abh_tik.txt
02/25/2019  11:43 AM               309 sut.dn.0_abh_tik.txt
02/25/2019  11:44 AM               303 sut.dn.16_att.txt
02/25/2019  11:43 AM               303 sut.dn.33_tik.txt
02/25/2019  11:45 AM               311 sut.kn.apd.00_att.txt
02/25/2019  11:32 AM               303 sut.kn.apd.01.txt
02/25/2019  11:45 AM               311 sut.kn.apd.01_att.txt
02/25/2019  11:32 AM               303 sut.kn.apd.40.txt
02/25/2019  11:32 AM               303 sut.kn.apd.41.txt
02/25/2019  11:32 AM               303 sut.kn.apd.54.txt
02/25/2019  11:32 AM               303 sut.kn.apd.58.txt
02/25/2019  11:32 AM               303 sut.kn.apd.59.txt
02/25/2019  11:45 AM               311 sut.kn.buv.01_att.txt
02/25/2019  11:45 AM               311 sut.kn.buv.02_att.txt
02/25/2019  11:44 AM               309 sut.kn.cap.1_att.txt
02/25/2019  11:44 AM               309 sut.kn.cap.3_att.txt
02/25/2019  11:32 AM               301 sut.kn.cun.2.txt
02/25/2019  11:44 AM               309 sut.kn.cun.2_att.txt
02/25/2019  11:32 AM               301 sut.kn.cun.3.txt
02/25/2019  11:45 AM               311 sut.kn.dhp.01_att.txt
02/25/2019  11:45 AM               311 sut.kn.iti.3_atta.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v00_att.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v01_att.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v02_att.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v03_att.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v04_att.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v05_att.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v06_att.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v07_att.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v10_att.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v13_att.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v14_att.txt
02/25/2019  11:32 AM               305 sut.kn.jat.v15.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v15_att.txt
02/25/2019  11:32 AM               305 sut.kn.jat.v16.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v16_att.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v17_att.txt
02/25/2019  11:32 AM               305 sut.kn.jat.v21.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v21_att.txt
02/25/2019  11:32 AM               305 sut.kn.jat.v22.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v22_att.txt
02/25/2019  11:45 AM               313 sut.kn.jat.v23_att.txt
02/25/2019  11:32 AM               305 sut.kn.mil.2-3.txt
02/25/2019  11:32 AM               301 sut.kn.mil.4.txt
02/25/2019  11:32 AM               301 sut.kn.mil.5.txt
02/25/2019  11:32 AM               301 sut.kn.mil.6.txt
02/25/2019  11:32 AM               301 sut.kn.net.4.txt
02/25/2019  11:44 AM               309 sut.kn.net.4_att.txt
02/25/2019  11:32 AM               301 sut.kn.net.6.txt
02/25/2019  11:32 AM               305 sut.kn.pat.v01.txt
02/25/2019  11:32 AM               305 sut.kn.pat.v02.txt
02/25/2019  11:45 AM               317 sut.kn.pat.v1.01_att.txt
02/25/2019  11:44 AM               309 sut.kn.pev.2_att.txt
02/25/2019  11:44 AM               309 sut.kn.pev.4_att.txt
02/25/2019  11:45 AM               309 sut.kn.snp.1_att.txt
02/25/2019  11:44 AM               309 sut.kn.snp.2_att.txt
02/25/2019  11:32 AM               301 sut.kn.snp.3.txt
02/25/2019  11:45 AM               309 sut.kn.snp.3_att.txt
02/25/2019  11:45 AM               311 sut.kn.tha.01_att.txt
02/25/2019  11:45 AM               311 sut.kn.tha.02_att.txt
02/25/2019  11:45 AM               311 sut.kn.tha.16_att.txt
02/25/2019  11:45 AM               311 sut.kn.tha.17_att.txt
02/25/2019  11:43 AM               311 sut.kn.vibh04_tik.txt
02/25/2019  11:32 AM               303 sut.kn.viv.v1.txt
02/25/2019  11:45 AM               311 sut.kn.viv.v1_att.txt
02/25/2019  11:32 AM               303 sut.kn.viv.v2.txt
02/25/2019  11:45 AM               311 sut.kn.viv.v2_att.txt
02/25/2019  11:44 AM               305 sut.mn.v01_att.txt
02/25/2019  11:43 AM               305 sut.mn.v01_tik.txt
02/25/2019  11:44 AM               305 sut.mn.v03_att.txt
02/25/2019  11:32 AM               297 sut.mn.v10.txt
02/25/2019  11:44 AM               301 sut.sn01_att.txt
02/25/2019  11:43 AM               301 sut.sn01_tik.txt
02/25/2019  11:32 AM               293 sut.sn12.txt
02/25/2019  11:44 AM               301 sut.sn12_att.txt
02/25/2019  11:43 AM               301 sut.sn12_tik.txt
02/25/2019  11:32 AM               293 sut.sn22.txt
02/25/2019  11:32 AM               293 sut.sn35.txt
02/25/2019  11:44 AM               301 sut.sn35_att.txt
02/25/2019  11:43 AM               301 sut.sn35_tik.txt
02/25/2019  11:46 AM               311 vamsa-gs.sv06_any.txt
02/25/2019  11:43 AM               311 vin.cv.0_paci_tik.txt
02/25/2019  11:43 AM               317 vin.khud.01_khud_tik.txt
02/25/2019  11:43 AM               317 vin.khud.02_khud_tik.txt
02/25/2019  11:43 AM               317 vin.khud.03_khud_tik.txt
02/25/2019  11:32 AM               295 vin.mv.01.txt
02/25/2019  11:43 AM               313 vin.mv.01_sara_tik.txt
02/25/2019  11:43 AM               311 vin.mv.0_paci_tik.txt
02/25/2019  11:43 AM               307 vin.mv_vaji_tik.txt
02/25/2019  11:32 AM               297 vin.pac.pc.txt
02/25/2019  11:32 AM               299 vin.pac.pci.txt
02/25/2019  11:44 AM               305 vin.pac.pc_att.txt
02/25/2019  11:43 AM               315 vin.pac.pc_sara_tik.txt
02/25/2019  11:43 AM               309 vin.pac_vaji_tik.txt
02/25/2019  11:43 AM               313 vin.par.0_paci_tik.txt
02/25/2019  11:44 AM               305 vin.par.ga_att.txt
02/25/2019  11:32 AM               297 vin.par.ni.txt
02/25/2019  11:44 AM               305 vin.par.ni_att.txt
02/25/2019  11:32 AM               297 vin.par.pr.txt
02/25/2019  11:44 AM               305 vin.par.pr_att.txt
02/25/2019  11:43 AM               315 vin.par.pr_sara_tik.txt
02/25/2019  11:43 AM               315 vin.par.pr_vima_tik.txt
02/25/2019  11:32 AM               297 vin.par.sg.txt
02/25/2019  11:44 AM               305 vin.par.sg_att.txt
02/25/2019  11:43 AM               315 vin.par.ve_sara_tik.txt
02/25/2019  11:43 AM               309 vin.par_vaji_tik.txt
02/25/2019  11:32 AM               295 vin.pv.01.txt
02/25/2019  11:43 AM               311 vin.pv.0_paci_tik.txt
02/25/2019  11:43 AM               311 vin.sara_sara_tik.txt
02/25/2019  11:43 AM               317 vin.vila.34_vila_tik.txt
02/25/2019  11:43 AM               317 vin.vivi.03_vivi_tik.txt
02/25/2019  11:43 AM               317 vin.vivi.07_vivi_tik.txt
02/25/2019  11:43 AM               317 vin.vviu.02_vviu_tik.txt
02/25/2019  11:43 AM               317 vin.vviu.03_vviu_tik.txt
02/25/2019  11:43 AM               317 vin.vviu.05_vviu_tik.txt
02/25/2019  11:43 AM               317 vin.vviu.08_vviu_tik.txt
02/25/2019  11:46 AM               317 vismag.ma-tik.17_any.txt
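For redoing such a selection outside the Windows command prompt, a short Python sketch could collect all files whose size falls within a byte range. The helper name files_in_size_range and the folder layout are assumptions for illustration. Note that the listing above also contains 293-byte files, so a range slightly wider than 295-320 may be needed to catch all of them.

```python
from pathlib import Path


def files_in_size_range(folder: str, min_bytes: int, max_bytes: int) -> list:
    """Return the paths (relative to `folder`, with forward slashes) of all
    regular files below `folder` whose size in bytes lies in
    [min_bytes, max_bytes], sorted alphabetically."""
    root = Path(folder)
    hits = []
    for entry in root.rglob("*"):
        if entry.is_file() and min_bytes <= entry.stat().st_size <= max_bytes:
            hits.append(entry.relative_to(root).as_posix())
    return sorted(hits)


# Hypothetical usage on a local copy of the cs-rm pages:
#   broken = files_in_size_range("cs-rm", 290, 320)
#   Path("_list.txt").write_text("\n".join(broken) + "\n", encoding="utf-8")
```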

Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 25, 2019, 04:17:33 PM
The 169 other Sublime files have been renewed and uploaded (no placeholder replacements yet).
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 25, 2019, 06:58:24 PM
Additional or corrective replacements to be made:

Thai seems to be the only version without real tag and other rendering issues, i.e. it was/is maintained and looked after. Other errors occur likewise in all three other languages.
It possibly makes sense to look at the Thai version where it is unclear which changes have been made from the original.

<p rend hangnum

Quote
<p rend="hangnum" n="312-323"><hi rend="paranum">312-323</hi></p>

<p rend="hangnum">ក<hi rend="dot">.</hi></p>

<p rend="hangnum" n="561"><hi rend="paranum">៥៦១</hi><hi rend="bold"></hi>.</p>
Empty hi "bold" elements: generally delete these first.

<p rend="hangnum"><hi rend="dot">.</hi>៥</p>
Dot before the number.

<p rend="hangnum" n="198"><hi rend="paranum">១៩៨</hi></p>
hangnum or paranum?

Special cases, not to be matched because the tag is possibly wrong:

<p rend="hangnum">វន្ទេ មុនិមន្តិមជាតិយុត្តំ។</p>
should be a gatha.

<p rend="hangnum">  </p>
possibly a "lost" number.

<p rend="hangnum" n="9"><hi rend="paranum">៩</hi> ៥៤<hi ...

<p rend="hangnum">  </p>

 [៣៧៦] ១. អវារិយជាតកវណ្ណនា
Atthakatha Khmer jat: headers generally missing, and paragraphs lost or missing.

different styling in Khmer tika
<p rend="hangnum">(ក)</p>

lost from the text
<p rend="hangnum">  </p>

 ១៥២<hi rend="dot">.</hi>  អស
Jataka sut. Khmer

<div gatha3>ឯកោ អរញ្ញេ វិហរំ បមត្តោ,</div>

<p rend="hangnum">ន មច្ចុធេយ្យស្ស តរេយ្យ បារ’’ន្តិ។</p>
confused with gathas; manual cases.


Code: regex draft [Select]


<p rend gathalast

Quote
<p rend="gathalast"></p>

headers

Quote
Dots are sometimes used in headers!

<p rend="chapter">២<hi rend="dot">.</hi> ភូកណ្ឌ</p>

<p rend="title">១<hi rend="dot">.</hi> ភូមិវគ្គវណ្ណនា</p>

<p rend="title">1<hi rend="dot">.</hi> Bhūmivaggavaṇṇanā<

span anchors in headers
<p rend="chapter">ទាឋាវំសោ <span anchor #V_0.0001></span>...

note in headers
<p rend="chapter">៥. សោណវគ្គោ <span note>មហាវគ្គ (អ...


Most of the header and para/hangnum faults are in cs-km, cs-rm, cs-ru (jat, atthakatha + tipitaka).

A lot of "specials", and since they are language-specific, it is possibly not worthwhile (or not possible) to develop a pattern usable for all future languages.

Paras and hangnums are often missing an id; each language needs its own way of getting the numbers into Latin digits then.
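One script-independent way to turn Khmer, Thai or other native digits into Latin numbers (an assumption, not the method used in the thread) is Python's unicodedata module, which knows the decimal value of every Unicode digit. The function name to_latin_digits is hypothetical.

```python
import unicodedata


def to_latin_digits(text: str) -> str:
    """Replace every Unicode decimal digit (Khmer, Thai, ASCII, ...) with
    the corresponding ASCII digit; all other characters stay unchanged."""
    result = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None for non-digits
        result.append(str(value) if value is not None else ch)
    return "".join(result)


print(to_latin_digits("១៩៨"))   # Khmer paranum as in <hi rend="paranum">
print(to_latin_digits("๑๒-๓"))  # Thai digits with a range dash
```

The same function could then fill a para id such as para_198 from the displayed number when the n attribute is missing.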

Research to be continued... best after indexing.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Moritz on February 25, 2019, 11:54:55 PM
Indexing is finished now. _/\_
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 26, 2019, 06:21:25 AM
Sadhu
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 26, 2019, 05:09:02 PM
Replacement for the header-hangnum correction in jat, atthakatha and tipitaka

Name of replacement: hangum into header (jat)

search:

/===== ([^<>]*?) =====\n<span sang_id #([^\n]*?)>\[\[\{path-release\}:([^\n]*?)\|([^\n]*?)\]\] \| \[\[([^\n]*?):([^\n]*?):([^\n]*?)#([^\n]*?)\|source\]\]<\/span>(.*?)<p rend=[^\w]hangnum[^\w]>[\s]*<\/p>[\n]+ ([១២៣៤៥៦៧៨៩០1234567890๑๒๓๔๕๖๗๘๙๐\-]+)\. ([^\n]*?)\n/s

replace
===== $1 =====\n<span sang_id #$2>[[{path-release}:$3|$4]] | [[$5:$6:$7#$8|source]]</span>$9==== $10. $11 ====\n<span sang_id #$2.{no}>[[{path-release}:$2.{no}|$2.{no}]] | [[$5:$6:$7#$2.{no}|source]]</span>\n

first turn: "86 matches on 23 pages", then 10+ more times
It has to be repeated until no new match can be found. If a new script uses special characters for numbers, they have to be added to the search.
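The repetition is probably needed because each match spans from a ===== header to the first empty hangnum below it, so one pass converts at most one hangnum per header section (an inference from the pattern, not confirmed in the thread). Outside of batchedit, the same "repeat until nothing matches" loop could be sketched in Python like this; re.S plays the role of the /s modifier in the patterns above, and max_rounds is a hypothetical safety limit against a replacement that keeps matching itself.

```python
import re


def replace_until_stable(pattern: str, repl: str, text: str,
                         flags: int = 0, max_rounds: int = 50):
    """Run re.subn repeatedly until a round makes no substitution
    (or max_rounds is reached). Returns (final_text, rounds_with_changes)."""
    regex = re.compile(pattern, flags)
    rounds = 0
    while rounds < max_rounds:
        text, count = regex.subn(repl, text)
        if count == 0:
            break
        rounds += 1
    return text, rounds


# Toy example: one pass turns "aaaaaaaa" into "aaaa", so three passes are needed.
print(replace_until_stable("aa", "a", "aaaaaaaa"))
```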

For higher headers, only individual pages then, and other specials... let's see further.

Ohh... btw. did Nyom Moritz (and that is just a question, not a demand at all) run the replacement again for the 169 previously "broken" pages that were newly uploaded? Just thought some might have been missed by this match.

(btw., the code display seems to be buggy here, perhaps because the mod has not been updated for the PHP version, therefore posted as quote text)
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 26, 2019, 07:25:57 PM
Replacement for the higher header-hangnum correction in jat, atthakatha and tipitaka

Name of replacement: hangum into header (jat) 2

search:

/======= ([^<>]*?) =======\n<span sang_id #([^\n]+?)\.v(..)>\[\[\{path-release\}:([^\n]+?)\.v(..)\|([^\n]+?)\.v(..)\]\] \| \[\[([^\n]*?):([^\n]*?):([^\n]+?)\.v(..)#([^\n]+?)\.v(..)\|source\]\]<\/span>(.*?)<p rend=[^\w]hangnum[^\w]>[\s]*<\/p>[\n]+ ([១២៣៤៥៦៧៨៩០1234567890๑๒๓๔๕๖๗๘๙๐\-]+)\. ([^\n]*?)\n/s

replace:

======= $1 =======\n<span sang_id #$2.v$3>[[{path-release}:$4.v$5|$6.v$7]] | [[$8:$9:$10.v$11#$12.v$13|source]]</span>$14==== $15. $16 ====\n<span sang_id #$2.{no}>[[{path-release}:$2.{no}|$2.{no}]] | [[$8:$9:$10.v$11#$2.{no}|source]]</span>\n

first turn: "38 matches on 38 pages", then 10+ more times
It has to be repeated until no new match can be found. If a new script uses special characters for numbers, they have to be added to the search.

Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 27, 2019, 10:17:13 AM
with [..] likewise

hangum into header (jat) 3

search:

/===== ([^<>]*?) =====\n<span sang_id #([^\n]*?)>\[\[\{path-release\}:([^\n]*?)\|([^\n]*?)\]\] \| \[\[([^\n]*?):([^\n]*?):([^\n]*?)#([^\n]*?)\|source\]\]<\/span>(.*?)<p rend=[^\w]hangnum[^\w]>[\s]*<\/p>[\n]+ \[([១២៣៤៥៦៧៨៩០1234567890๑๒๓๔๕๖๗๘๙๐\-]+)\] ([^\n]*?)\n/s

replace:
 
===== $1 =====\n<span sang_id #$2>[[{path-release}:$3|$4]] | [[$5:$6:$7#$8|source]]</span>$9==== [$10] $11 ====\n<span sang_id #$2.{no}>[[{path-release}:$2.{no}|$2.{no}]] | [[$5:$6:$7#$2.{no}|source]]</span>\n

hangum into header (jat) 4

search:

/======= ([^<>]*?) =======\n<span sang_id #([^\n]+?)\.v(..)>\[\[\{path-release\}:([^\n]+?)\.v(..)\|([^\n]+?)\.v(..)\]\] \| \[\[([^\n]*?):([^\n]*?):([^\n]+?)\.v(..)#([^\n]+?)\.v(..)\|source\]\]<\/span>(.*?)<p rend=[^\w]hangnum[^\w]>[\s]*<\/p>[\n]+ \[([១២៣៤៥៦៧៨៩០1234567890๑๒๓๔๕๖๗๘๙๐\-]+)\] ([^\n]*?)\n/s

replace:

======= $1 =======\n<span sang_id #$2.v$3>[[{path-release}:$4.v$5|$6.v$7]] | [[$8:$9:$10.v$11#$12.v$13|source]]</span>$14==== [$15] $16 ====\n<span sang_id #$2.{no}>[[{path-release}:$2.{no}|$2.{no}]] | [[$8:$9:$10.v$11#$2.{no}|source]]</span>\n

and since, once defined, the exceptions increase...

hangum into header (jat) 5

search:

/======= ([^<>]*?) =======\n<span sang_id #([^\n]+?)\.v(..)_([^\n]+?)>\[\[\{path-release\}:([^\n]+?)\.v(..)_([^\n]+?)\|([^\n]+?)\.v(..)_([^\n]+?)\]\] \| \[\[([^\n]*?):([^\n]*?):([^\n]+?)\.v(..)_([^\n]+?)#([^\n]+?)\.v(..)_([^\n]+?)\|source\]\]<\/span>(.*?)<p rend=[^\w]hangnum[^\w]>[\s]*<\/p>[\n]+ \[([១២៣៤៥៦៧៨៩០1234567890๑๒๓๔๕๖๗๘๙๐\-]+)\] ([^\n]*?)\n/s

replace:

======= $1 =======\n<span sang_id #$2.v$3_$4>[[{path-release}:$5.v$6_$7|$8.v$9_$10]] | [[$11:$12:$13.v$14_$15#$16.v$17_$18|source]]</span>$19==== [$20] $21 ====\n<span sang_id #$2.{no}_$4>[[{path-release}:$2.{no}_$4|$2.{no}_$4]] | [[$11:$12:$13.v$14_$15#$2.{no}_$4|source]]</span>\n


placeholder with higher header

hangum into header (jat) 6

search:

/======= ([^<>]*?) =======\n<span sang_id #([^\n]*?)>\[\[\{path-release\}:([^\n]*?)\|([^\n]*?)\]\] \| \[\[([^\n]*?):([^\n]*?)#([^\n]*?)\|source\]\]<\/span>(.*?)<p rend=[^\w]hangnum[^\w]>[\s]*<\/p>[\n]+ ([១២៣៤៥៦៧៨៩០1234567890๑๒๓๔๕๖๗๘๙๐\-]+)\. ([^\n]*?)\n/s

replace:

======= $1 =======\n<span sang_id #$2>[[{path-release}:$3|$4]] | [[$5:$6#$7|source]]</span>$8==== $9. $10 ====\n<span sang_id #{file-}.{no}>[[{path-release}:{file-}.{no}|{file-}.{no}]] | [[{path-source}:{file}#{file-}.{no}|source]]</span>\n
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Moritz on February 27, 2019, 06:20:53 PM
Vandami Bhante _/\_

Quote
Ohh... btw. did Nyom Moritz (and that is just a question, not a demand at all) run the replacement again for the 169 previously "broken" pages that were newly uploaded? Just thought some might have been missed by this match.

I just added the list of files (http://forum.sangham.net/index.php/topic,8672.msg17960.html#msg17960) manually to the index now.

Quote
(btw., the code display seems to be buggy here, perhaps because the mod has not been updated for the PHP version, therefore posted as quote text)

I can't recognize anything looking wrong on the picture.
What is it that seems buggy there?
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on February 27, 2019, 06:43:54 PM
Sadhu

Quote
Vandami Bhante _/\_

Ohh... btw. did Nyom Moritz (and that is just a question, not a demand at all) run the replacement again for the 169 previously "broken" pages that were newly uploaded? Just thought some might have been missed by this match.

I just added the list of files (http://forum.sangham.net/index.php/topic,8672.msg17960.html#msg17960) manually to the index now.

Oh, it was more about the placeholder replacement that Nyom did with a script. Indexing seems fine.


Quote
(btw., the code display seems to be buggy here, perhaps because the mod has not been updated for the PHP version, therefore posted as quote text)
I can't recognize anything looking wrong on the picture.
What is it that seems buggy there?

When I put this replacement code in a "code" block, it appeared different (repeated), as can be seen in the screenshot.

Quote from:
===== $1 =====\n<span sang_id #$2>[[{path-release}:$3|$4]] | [[$5:$6:$7#$8|source]]</span>$9==== $10. $11 ====\n<span sang_id #$2.{no}>[[{path-release}:$2.{no}|$2.{no}]] | [[$5:$6:$7#$2.{no}|source]]</span>\n
Title: [ATI.eu] Indexing and search engine issues
Post by: Dhammañāṇa on March 06, 2019, 05:30:51 PM

One or more posts have been split off from this topic. A new topic based on them has been created as "[ATI.eu] Indexing and search engine issues (http://forum.sangham.net/index.php/topic,9172.msg17988.html#msg17988)", or they have been attached there.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on March 06, 2019, 07:18:52 PM
hangnum correction
hangnum and dot combinations

search:

/<p rend=[^\w]hangnum[^\w]>([^<>]+?)<hi rend=[^\w]dot[^\w]>.<\/hi><\/p>/

replace:

<div hangnum><span para #$1>[$1]</span></div>


Only in one file across the four sections:
search:

/<p rend=[^\w]hangnum[^\w]><hi rend=[^\w]dot[^\w]>.<\/hi>([^<>]+?)<\/p>/

replace:

<div hangnum><span para #5>[$1]</span></div>

"<p rend="hangnum">(ка)</p>" cases

search:

/<p rend=[^\w]hangnum[^\w]>\(([^><]+?)\)<\/p>/

replace:

<div hangnum>($1)</div>

hangnum+paranum cases

search:

<p rend=[^\w]hangnum[^\w] n=[^\w]([^<>]*?)[^\w]><hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi>[\. ,]*([^\n]*)<\/p>[\s]*

replace:

<div hangnum><span para #para_$1>[$2]</span></div> $3\n\n

(unwanted side effects here include: ...<hi rend="bold"></hi>.</p> -> ...<hi rend="bold"></hi>.)
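The replacement strings in this thread refer to captured groups as $1, $2, ... (PHP and Notepad++ style), while Python's re.sub expects \1 or \g<1>. A minimal sketch for testing a rule locally in Python, using the hangnum+paranum rule above; the helper names are hypothetical. The sample at the end reproduces the unwanted side effect just noted: the empty bold element and the dot end up after the paragraph number.

```python
import re


def php_repl_to_python(repl: str) -> str:
    """Rewrite $1, $2, ... group references (PHP / Notepad++ style)
    as \\g<1>, \\g<2>, ... for Python's re.sub."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", repl)


# hangnum+paranum rule from the post above, copied unchanged
HANGNUM_PARANUM = (r'<p rend=[^\w]hangnum[^\w] n=[^\w]([^<>]*?)[^\w]>'
                   r'<hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi>[\. ,]*([^\n]*)<\/p>[\s]*')
HANGNUM_PARANUM_REPL = r'<div hangnum><span para #para_$1>[$2]</span></div> $3\n\n'


def convert_hangnum_paranum(text: str) -> str:
    return re.sub(HANGNUM_PARANUM, php_repl_to_python(HANGNUM_PARANUM_REPL), text)


sample = '<p rend="hangnum" n="561"><hi rend="paranum">៥៦១</hi><hi rend="bold"></hi>.</p>'
print(repr(convert_hangnum_paranum(sample)))
# '<div hangnum><span para #para_561>[៥៦១]</span></div> <hi rend="bold"></hi>.\n\n'
```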

dot bold correction (exception!! not in :tipitaka:sut.kn.thi.05, where it is a dot at the end of a sentence)

search:

<hi rend=[^\w]bold[^\w]><\/hi>\.|<hi rend=[^\w]bold[^\w]>\.<\/hi>

replace with nothing, and for :tipitaka:sut.kn.thi.05 with .

Special cases of "lost" hangnum

Quote from: http://accesstoinsight.eu/cs-km/anya/bud-vgs.nk2_any
<div gatha3>បញ្ញាធិតិសីលគុណោឃវិន្ទំ,</div>

<p rend="hangnum">វន្ទេ មុនិមន្តិមជាតិយុត្តំ។</p>

<p rend="gathalast"></p>

search:

<div gatha([0-9]+)>(.+?)<\/div>\n\n<p rend=[^\w]hangnum[^\w]>(.+?)<\/p>\n\n<p rend=[^\w]gathalast[^\w]>[\s]*<\/p>

replace:

<div gatha$1>$2</div>\n\n<div gathalast>$3</div>
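The gatha/hangnum/gathalast rule can be tried out the same way in Python. Compiled without re.S (the counterpart of /s), each .+? stays inside a single line, so the rule cannot reach across several verses. A sketch with the Khmer sample quoted above (the function name merge_lost_gathalast is hypothetical):

```python
import re

# gatha + hangnum + empty gathalast rule from the post above, in Python syntax
GATHA_HANGNUM = re.compile(
    r'<div gatha([0-9]+)>(.+?)<\/div>\n\n'
    r'<p rend=[^\w]hangnum[^\w]>(.+?)<\/p>\n\n'
    r'<p rend=[^\w]gathalast[^\w]>[\s]*<\/p>')  # no re.S: '.' never crosses a line break
GATHA_HANGNUM_REPL = r'<div gatha\1>\2</div>\n\n<div gathalast>\3</div>'


def merge_lost_gathalast(text: str) -> str:
    """Turn a verse line wrongly tagged as hangnum (followed by an empty
    gathalast paragraph) into the gathalast line."""
    return GATHA_HANGNUM.sub(GATHA_HANGNUM_REPL, text)


sample = ('<div gatha3>បញ្ញាធិតិសីលគុណោឃវិន្ទំ,</div>\n\n'
          '<p rend="hangnum">វន្ទេ មុនិមន្តិមជាតិយុត្តំ។</p>\n\n'
          '<p rend="gathalast"></p>')
print(merge_lost_gathalast(sample))
```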

strange + in tika:abh.pa.31_tik#10299 and :tipitaka:sut.dn.20#26891

Quote from: tika:abh.pa.31_tik#10299
<p rend="hangnum">д̇ам̣ гаммапассад̣̇ваарзхи, д̣̇увид̇хам̣ самбавад̇д̇ад̇и.+</p>

search:

\+<\/div>

replace:

</div>

strange () and lost hangnum in gatha

Quote from: http://accesstoinsight.eu/cs-ru/anya/siha-gs.jiva_any and other cases
<div gatha3>г̇анд̇аб̣б̣ам̣ во суг̇ахид̇амид̣̇ам̣ бхаасид̇ам̣ бхигкунод̇и</div>

<p rend="hangnum"> </p>

<div gathalast> чад̣д̣зд̇аб̣б̣ам̣ гаважанамид̇арам̣ д̣̇уг̇г̇ахийд̇анд̇и но жз;()</div>

search:

<div gatha([0-9]+)>(.+?)<\/div>\n\n<p rend=[^\w]hangnum[^\w]>[\s]+?<\/p>\n\n<div gathalast>(.*?)<\/div>

replace:

<div gatha$1>$2</div>\n\n<div gathalast>$3</div>


strange () in :anya:siha-gs.jiva_any: 1798 matches x 4! (good to check whether the source had been touched)
search:

\(\)<\/div>

replace:

</div>

The last 19 "hangnum" matches (on 12 pages) were corrected manually, each matched to a similar single case, with the exceptions below. Hopefully done well; these should be listed for further language implementations.

Exceptions (requiring further work later):

.:anya:niti-gs.kav05_any missing paragraph after 225

.:tika:vin.vviu.05_vviu_tik is missing hangnums for paragraphs

.:tipitaka:sut.kn.jat.v22 some hangnums are headers

Pages which have lost content ("strange ()" issue):
siha-gs.jiva_any (e1208n.nrf.xml)

Suspected files (also indexing errors):
abh.pa.70_tik ?
cs-th:atthakatha:sut.kn.pat.v1.01_att
cs-th:tika:sut.mn.v01_tik
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on March 06, 2019, 07:58:44 PM
repeating previous batch edits:

- hangum into header (jat) 3
...to be continued
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on March 07, 2019, 08:29:04 PM
<hi rend="dot">.</hi>

search:

<hi rend=[^\w]dot[^\w]>\.<\/hi>

replace:

.

unwanted side effects:

Quote
<span para #para_1>[១]</span>

. ‘‘រាជកុ

<span para #para_2>[២]</span> .. វីថិមុត្តានំ បន កម្មកម្មនិ

<div gatha1" n="1><hi rend="paranum">១</hi>. រម្មេ

<span para #para_199>[១៩៩]</span> . ឥធ

dot errors

search (in the namespaces cs-rm, -th, -km, -ru only!):

\]<\/span>([\n\r ]*)\.[\s]+

replace:

]</span>$1
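Restricting a rule to the four language namespaces can be sketched in Python as well, assuming a local copy of DokuWiki's data/pages directory, where each namespace is a folder and each page a .txt file. The names fix_dots and fix_dots_in_namespaces are hypothetical. Note that the ".." case from the side-effect list ([២]</span> .. វីថិ...) is not matched by this pattern, because after the first dot it expects whitespace.

```python
import re
from pathlib import Path

# Search/replace from the post above, in Python syntax (\1 instead of $1).
DOT_AFTER_PARA = re.compile(r'\]<\/span>([\n\r ]*)\.[\s]+')
LANG_NAMESPACES = ("cs-rm", "cs-th", "cs-km", "cs-ru")


def fix_dots(text: str) -> str:
    """Drop a stray dot (and the whitespace after it) following a [..]</span> anchor."""
    return DOT_AFTER_PARA.sub(r']</span>\1', text)


def fix_dots_in_namespaces(pages_dir: str) -> int:
    """Apply fix_dots to every .txt page below the four language namespaces only.
    Returns the number of changed files. Bytes are read and written directly,
    so the existing line endings stay as they are."""
    changed = 0
    for ns in LANG_NAMESPACES:
        ns_dir = Path(pages_dir, ns)
        if not ns_dir.is_dir():
            continue
        for page in ns_dir.rglob("*.txt"):
            old = page.read_bytes().decode("utf-8")
            new = fix_dots(old)
            if new != old:
                page.write_bytes(new.encode("utf-8"))
                changed += 1
    return changed
```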

<div gatha1" n="7><hi rend="paranum">៧</hi>.

search:

<div gatha([0-9]+)["] n=["]([0-9]+)><hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi>\.

replace:

<div hangnum><span para #para_$2>[$3]</span></div>\n\n<div gatha$1>

indent + hangnum combinations

search:

<p rend=["]indent["] n=["]([0-9]+)["]><hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi>\.(.*?)<\/p>

replace:

<div hangnum><span para #para_$1>[$2]</span></div><div indent>$3</div>

special case in .:anya:siha-gs.jica_any handled manually.

bold jumping out

search:

<hi rend=["]bold["]>([^\n<>]+?)\*\*

replace:

**$1**

search:

\]<\/span> \.<hi rend="bold"><\/hi>

replace:

]</span>

search:

<hi rend=["]bold["]>[\s]*<\/hi>

replace: with nothing

search:

\*\*<\/hi>

replace:  with nothing

**$1**

empty gathas

search:

<p rend=["]gatha[^"]+["]><\/p>\n\n

replace: with nothing
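If the empty-gatha search is typed with <|/p> instead of <\/p>, the | is read as regex alternation, and the rule then also deletes every /p> followed by two line breaks, not just empty gatha paragraphs. A small Python comparison on a made-up sample illustrates the difference; the pattern names are hypothetical.

```python
import re

WITH_PIPE = r'<p rend=["]gatha[^"]+["]><|/p>\n\n'    # '|' splits the pattern into two alternatives
WITH_ESCAPE = r'<p rend=["]gatha[^"]+["]><\/p>\n\n'  # matches only whole empty gatha paragraphs

sample = '<p rend="gatha1"></p>\n\n<p rend="x">a</p>\n\nb'
print(repr(re.sub(WITH_PIPE, "", sample)))    # '<p rend="x">a<b'
print(repr(re.sub(WITH_ESCAPE, "", sample)))  # '<p rend="x">a</p>\n\nb'
```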

And "finally" a lost <note> was noticed... with an anchor inside.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on March 07, 2019, 10:38:03 PM
Looks like all XML tag replacements are now done.

Further to-dos:

Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on March 08, 2019, 04:43:38 PM
"Beautifications" such as no more than 2 line breaks, no double white spaces... and other things that possibly break a standard.

more than two line breaks: done (for each cs-.. language namespace)

search:

[\n][\n]+

replace:

\n\n

white spaces: done (for each cs-.. language namespace):

search:

[ ][ ]+(?!\*)

replace: (one white space)



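One possible side effect of [ ][ ]+(?!\*) worth checking, assuming some pages contain nested DokuWiki lists (two spaces of indentation per level before *): when a run of four spaces is followed by *, the engine backtracks to three spaces, which are not followed by *, and collapses them, so a second-level item becomes a first-level one. A hypothetical variant that only matches whole runs not followed by a space or * avoids this, as this Python comparison shows. Ordered list items (-) are protected by neither pattern.

```python
import re

AS_USED = re.compile(r'[ ][ ]+(?!\*)')      # pattern from the post
WHOLE_RUN = re.compile(r'[ ]{2,}(?![ *])')  # hypothetical variant: never split a run before '*'


def collapse(pattern, text):
    """Replace each match of `pattern` with a single space."""
    return pattern.sub(" ", text)


nested = "    * nested item"   # DokuWiki list item at the second level
print(repr(collapse(AS_USED, nested)))    # '  * nested item'
print(repr(collapse(WHOLE_RUN, nested)))  # '    * nested item'
```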
Changing the id to a class for the overall language div, since the id breaks section editing.

search:

<div #(cs-km|cs-rm|cs-th|cs-ru)>

replace:

<div $1>

Just seeing that many files in Anya have lost content; it's probably good to check that first now...
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on March 08, 2019, 06:15:32 PM
Possibly good to redo all replacements locally and then upload them again, overwriting the files on the server. Too many different standards were used during processing, and even if just a few files are rendered wrongly, it is difficult to find them all without overlooking one. At least it saves server resources.

Regex in Notepad++ (local) and in batchedit (on the server) also have slightly different syntax, which may also have had its (even huge) impact when forgetting to consider it here and there.

"Knowing", possibly remembering some of the many exceptions now, it is possible to continue without a long break...

Very commonly, traditional monks encourage their disciples to learn to recite the texts in this way: "You say you have a book, hmm? When you have the texts in the books, then they are still just in the books..."  :)

Yet one should not forget that memory is not certain and is gone with the breaking up of one's body, or even before through sickness or accident. So what else but practice, seeing that reality of anicca clearly, could help at least?



or, more fittingly in the Buddha's words, holding on to a good nimitta (an object of "hobby"), not taking as object something meant for trade and gain, as taught by "a teacher":

Namo tassa bhagavato arahato sammā-sambuddhassa

The non-doing of any evil,
the performance of what's skillful,
the cleansing of one's own mind:
    this is the teaching
    of the Awakened.
Quote from: http://zugangzureinsicht.org/html/tipitaka/kn/dhp/dhp.14.than_en.html#dhp-183

...and having just learned about PowerShell commands: such simple ways as creating a list in Excel and then command strings like ((Get-Content filename.txt -Raw) -replace '{file}','filename.txt') | Set-Content filename.txt might make the file, path and other {...} replacements possible for Atma, without needing hours for one replacement in 12000 files with Notepad. Oh wait! Creating, or giving possibilities and hints to make merits? What does one like? Rebirth as a "compassionate intelligent wiki bot"? Late already.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Moritz on March 09, 2019, 01:47:15 PM
Vandami Bhante,

Quote
Possibly good to redo all replacements locally and then upload them again, overwriting the files on the server. Too many different standards were used during processing, and even if just a few files are rendered wrongly, it is difficult to find them all without overlooking one. At least it saves server resources.

If I knew from which point to start and edit again, I could also do the same replacements and upload from here, maybe with a better internet connection.

But in this last week before the journey, I am really quite busy.

I still have a backup on my computer of the files in the cs-rm, cs-km, cs-th and cs-ru directories, from just before the {file} etc. replacements.

Downloading backups from Greensta should also work now. I think daily automatic backups are deleted after 4-5 days, so one should always have one for each of the last three or four days. But looking just now: the daily accesstoinsight.eu backups have the "name" "Error: accesstoinsight.eu". Not sure whether they are usable.

The last manual backup of accesstoinsight.eu that seems to be okay and can still be downloaded (containing all of the huge old attic archive) is from the 17th of February.

I could bring it and what I have on my computer on a USB flash drive on the journey. I could also bring a laptop with me if useful.

_/\_
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on March 09, 2019, 02:50:23 PM
Sadhu and

Noo...  :) Nyom should do what he needs to do and bring no burden of duties with him, just a released kusala mind; no IT needed.

The original files are here on the Sangha laptop, as well as in Nyom's uploads here on Sangham.net.

Having processed again late at night, Atma came to see that Notepad seems to work differently with regard to regex: it has eaten away content again, and it is also very intransparent with multi-file replacements, and slow.

So, thinking that the server and the tools there actually work better, Atma intends to upload all the originals, correctly and "finally" renamed, and to do the work online again. It is actually not that much, and possible in one day if done smartly.

The command line, PowerShell and other tools might be great, but, sustaining on alms, Atma is not willing to learn more IT stuff than required, just using what is left and known from the past.

For a good and final replacement of placeholders like {file}..., Atma has started to make a list that provides the replacements for each file. That is something Atma is not able to do online.

So Atma tries to prepare here offline and then to inform about the actions, so that others could always help here and there if they rejoice in such.

Quote from: Upasaka Moritz
backups

...oh yes. To rely only on good actions might be risky when maintaining a "monastery", of course. Whatever does not become a burden and hurtful for oneself or others.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on March 11, 2019, 01:23:36 PM
Already found one potential content eater: having forgotten to escape the dot.

<p rend=[^\w]hangnum[^\w] n=[^\w]([^<>]*?)[^\w]><hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi>[. ]*<hi rend=[^\w]dot[^\w]>[. ]*<\/hi>[. ]*([^\n]*)<\/p>[\s]*

<div hangnum><span para #para_$1>[$2]</span></div> $3\n\n
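For comparison, this is how Python's re module treats the dot; presumably Notepad++ (Boost regex, Perl syntax) behaves the same, but that is an assumption worth checking. Inside square brackets the dot is already literal, so [. ] and [\. ] are equivalent there; the dangerous case is an unescaped dot outside brackets. The helper name matches is hypothetical.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if the whole text matches the pattern (Python's re module)."""
    return re.fullmatch(pattern, text) is not None


# Outside a character class an unescaped dot matches ANY character
# (except a newline unless re.DOTALL is set), so "1." also accepts "1x".
print(matches(r"1\.", "1x"), matches(r"1.", "1x"))  # prints: False True

# Inside a character class Python treats the dot literally,
# so [. ] and [\. ] accept exactly the same characters.
for ch in [".", " ", "x"]:
    print(repr(ch), matches(r"[. ]", ch), matches(r"[\. ]", ch))
```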
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on March 12, 2019, 07:23:18 PM
Status (local)

Some files have been re-renamed. Current list: Renaming of source files (http://accesstoinsight.eu/cs-rm/renaming_files), renaming files.

Regex list for converting the XML to the ati standard, as done for "cs-rm", "cs-km", "cs-th", "cs-ru" at once.

Note that {...} strings will be replaced in a later, selective session. Replacements are done "single-line" unless otherwise mentioned.

##Starting with the header and footer, which replace "content".

###HEADER multiline (10792 replacements)

Code: [Select]
[\s]*<\?xml(.+?)<body>[\s]*

<span hide>sources: cs-file name {cs file} path ati{lang}:{ns-section}:{file}</span>\n{{section>en:tech:template_includes#{lang}_header&nouser&nodate&noheader&noeditbutton&firstsectiononly}}\n<div {lang}>\n\n
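For anyone replaying these rules outside Notepad++/batchedit, here is a minimal Python sketch of the ###HEADER rule, under two assumptions: that "multiline" means the dot must also match line breaks (Python's re.DOTALL), and that $1-style back-references have to be rewritten into Python's \g<1> form. The names to_python_repl and replace_header are hypothetical, not part of any tool used here.

```python
import re

# The ###HEADER rule from the list above, unchanged apart from Python string syntax.
HEADER_PATTERN = r"[\s]*<\?xml(.+?)<body>[\s]*"
HEADER_REPLACEMENT = (
    r"<span hide>sources: cs-file name {cs file} path ati{lang}:{ns-section}:{file}</span>\n"
    r"{{section>en:tech:template_includes#{lang}_header&nouser&nodate&noheader&noeditbutton&firstsectiononly}}\n"
    r"<div {lang}>\n\n"
)


def to_python_repl(repl: str) -> str:
    """Rewrite Notepad++-style $1, $2 ... back-references as Python's \\g<1>, \\g<2> ..."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", repl)


def replace_header(text: str) -> str:
    """Apply the header rule once; re.DOTALL lets (.+?) run across line breaks."""
    return re.sub(HEADER_PATTERN, to_python_repl(HEADER_REPLACEMENT), text,
                  count=1, flags=re.DOTALL)
```

Python's replacement syntax turns the \n in the template into a real newline, just as the list expects.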

###FOOTER multiline

Code: [Select]
[\s]*<\/body>(.*)<\/([^p]*?)>[\s]*

\n\n</div>\n{{section>en:tech:template_includes#{lang}_footer&nouser&nodate&noheader&noeditbutton&firstsectiononly}}

###CS-CD ANCHORS

Code: [Select]
<pb ed=[^\w]([^<>]*?)[^\w] n=[^\w]([^<>]*?)[^\w][\s]*\/>

<span anchor #$1_$2></span>
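The ###CS-CD ANCHORS rule in Python, as a short sketch: the pattern is unchanged, only $1/$2 become \1/\2. The function name is hypothetical and the "V" and "0.0001" values in the example are only illustrative.

```python
import re

# The ###CS-CD ANCHORS rule; $1/$2 become \1/\2 in Python's replacement syntax.
PB_PATTERN = r"<pb ed=[^\w]([^<>]*?)[^\w] n=[^\w]([^<>]*?)[^\w][\s]*\/>"
PB_REPLACEMENT = r"<span anchor #\1_\2></span>"


def convert_page_breaks(text: str) -> str:
    """Turn every <pb ed="..." n="..."/> page marker into an anchor span."""
    return re.sub(PB_PATTERN, PB_REPLACEMENT, text)


print(convert_page_breaks('Tena <pb ed="V" n="0.0001" /> samayena'))
# prints: Tena <span anchor #V_0.0001></span> samayena
```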

###BOLD

Code: [Select]
<hi rend=[^\w]bold[^\w]>([^\n]+?)<\/hi>

**$1**

###P CENTRE

Code: [Select]
<p rend=[^\w]centre[^\w]>(.*?)<\/p>

<div centeralign>$1</div>

###NOTE

Code: [Select]
<note>([^\n]+?)<\/note>

<span note>$1</span>
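Because each rule works on the output of the previous one, the rules can be kept as an ordered table and applied in one pass. A Python sketch with the ###BOLD, ###P CENTRE and ###NOTE rules (RULES and apply_rules are hypothetical names); re.subn also returns how many replacements each rule made.

```python
import re
from typing import List, Tuple

# (pattern, replacement) pairs from the ###BOLD, ###P CENTRE and ###NOTE rules above,
# with $1 written as \1 for Python. The order matters: each rule sees the output
# of the previous ones, just as when the rules are run one after another by hand.
RULES: List[Tuple[str, str]] = [
    (r"<hi rend=[^\w]bold[^\w]>([^\n]+?)<\/hi>", r"**\1**"),
    (r"<p rend=[^\w]centre[^\w]>(.*?)<\/p>", r"<div centeralign>\1</div>"),
    (r"<note>([^\n]+?)<\/note>", r"<span note>\1</span>"),
]


def apply_rules(text: str, rules: List[Tuple[str, str]] = RULES) -> Tuple[str, List[int]]:
    """Apply every rule in order and report how many replacements each one made."""
    counts = []
    for pattern, repl in rules:
        text, n = re.subn(pattern, repl, text)
        counts.append(n)
    return text, counts
```

The per-rule counts are the same kind of figure as the "(10792 replacements)" noted for the header; a rule that suddenly reports far more hits than expected is worth inspecting before saving.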

###P HI PARANUM DOT

Code: [Select]
<p rend=[^\w]bodytext[^\w] n=[^\w]([^<>]*?)[^\w]>[\s]*<hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi>[\s]*<hi rend=[^\w]dot[^\w]>\.<\/hi>([^\n]*?)<\/p>[\s]*

<span para #para_$1>[$2]</span>$3\n\n
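Since several rules have eaten content, it can help to look at what a rule would do before running it over all files. A Python sketch of such a dry run for the ###P HI PARANUM DOT rule; preview is a hypothetical helper, and Match.expand fills in the replacement template the same way re.sub would.

```python
import re
from typing import Iterator, Tuple

# The ###P HI PARANUM DOT rule, with $1..$3 written as \1..\3.
PARANUM_DOT = (
    r"<p rend=[^\w]bodytext[^\w] n=[^\w]([^<>]*?)[^\w]>[\s]*"
    r"<hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi>[\s]*"
    r"<hi rend=[^\w]dot[^\w]>\.<\/hi>([^\n]*?)<\/p>[\s]*"
)
PARANUM_DOT_REPL = r"<span para #para_\1>[\2]</span>\3\n\n"


def preview(pattern: str, repl: str, text: str, limit: int = 5) -> Iterator[Tuple[str, str]]:
    """Yield (matched text, replacement) for the first matches; nothing is changed."""
    for i, m in enumerate(re.finditer(pattern, text)):
        if i >= limit:
            break
        yield m.group(0), m.expand(repl)
```

Printing the pairs for a few files shows at once whether a match swallows more than the intended paragraph.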

###P HI PARANUM DOT []

Code: [Select]
<p rend=[^\w]bodytext[^\w] n=[^\w]([^<>]*?)[^\w]><hi rend=[^\w]paranum[^\w]>([^<>]*?)[\. ]*?<\/hi>[\. ]*?([^\n]*?)<\/p>[\s]*

<span para #para_$1>[$2]</span> $3\n\n

###P

Code: [Select]
<p rend=[^\w]bodytext[^\w]>([^\n]+?)<\/p>[\s]*

$1\n\n

###P HANGNUM HI PARANUM DOT

Code: [Select]
<p rend=[^\w]hangnum[^\w] n=[^\w]([^<>]*?)[^\w]><hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi>[\. ]*<hi rend=[^\w]dot[^\w]>[\. ]*<\/hi>[\. ]*([^\n]*)<\/p>[\s]*

<div hangnum><span para #para_$1>[$2]</span></div> $3\n\n

###GATHA

Code: [Select]
<p rend=[^\w]gatha([^<>]*?)[^\w]>([^\n]+)<\/p>[\s]*

<div gatha$1>$2</div>\n\n

###INDENT|UNINDENTED

Code: [Select]
<p rend=[^\w](indent|unindented)[^\w]>([^\n]+)<\/p>[\s]*

<div $1>$2</div>\n\n

###NIKAYA

Code: [Select]
<p rend=[^\w]nikaya[^\w]>([^<>]*?)<\/p>

<div centeralign #nikaya>**$1**</div>\n<span sang_id #{file--}>[[{path-release}:{file--}|{file--}]] | [[{path-source}:{file}#{file--}|source]]</span>

###BOOK 868

Code: [Select]
<p rend=[^\w]book[^\w]>([^<>]*?)<\/p>

======== $1 ========\n<span sang_id #{file-}>[[{path-release}:{file-}|{file-}]] | [[{path-source}:{file}#{file-}|source]]</span>

###CHAPTER

Code: [Select]
<p rend=[^\w]chapter[^\w]>([^<>]*?)<\/p>

======= $1 =======\n<span sang_id #{file}>[[{path-release}:{file}|{file}]] | [[{path-source}:{file}#{file}|source]]</span>

###TITLE

Code: [Select]
<p rend=[^\w]title[^\w]>([^<>]*?)<\/p>

===== $1 =====\n<span sang_id #{file+}>[[{path-release}:{file+}|{file+}]] | [[{path-source}:{file}#{file+}|source]]</span>

###SUBHEAD

Code: [Select]
<p rend=[^\w]subhead[^\w]>([^<>]*?)<\/p>

==== $1 ====\n<span sang_id #{file-}.{no}>[[{path-release}:{file-}.{no}|{file-}.{no}]] | [[{path-source}:{file}#{file-}.{no}|source]]</span>

###SUBSUBHEAD

Code: [Select]
<p rend=[^\w]subsubhead[^\w]>([^<>]*?)<\/p>

=== $1 ===\n<span sang_id #{file-}.{no+}>[[{path-release}:{file-}.{no+}|{file-}.{no+}]] | [[{path-source}:{file}#{file-}.{no+}|source]]</span>
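The five heading rules (BOOK, CHAPTER, TITLE, SUBHEAD, SUBSUBHEAD) differ only in the number of "=" and in the sang_id placeholder, so they could also be generated from one table. A Python sketch of that idea, reproducing the replacement strings listed above; HEADING_RULES, heading_line and convert_headings are hypothetical names, and the NOTE and ANCHOR variants are left out (their titles contain tags, which [^<>]*? does not match, exactly as in the original rules).

```python
import re

# rend value -> (number of "=" on each side, sang_id placeholder), as in the list above
HEADING_RULES = {
    "book": (8, "{file-}"),
    "chapter": (7, "{file}"),
    "title": (5, "{file+}"),
    "subhead": (4, "{file-}.{no}"),
    "subsubhead": (3, "{file-}.{no+}"),
}

HEADING_PATTERN = re.compile(
    r"<p rend=[^\w](book|chapter|title|subhead|subsubhead)[^\w]>([^<>]*?)<\/p>"
)


def heading_line(rend: str, title: str) -> str:
    """Build the heading plus sang_id line exactly as the single rules do."""
    eq, sid = HEADING_RULES[rend]
    marks = "=" * eq
    link = "[[{path-release}:" + sid + "|" + sid + "]]"
    source = "[[{path-source}:{file}#" + sid + "|source]]"
    return marks + " " + title + " " + marks + "\n<span sang_id #" + sid + ">" + link + " | " + source + "</span>"


def convert_headings(text: str) -> str:
    # A function as replacement: its result is inserted as-is, no $/\ handling.
    return HEADING_PATTERN.sub(lambda m: heading_line(m.group(1), m.group(2)), text)
```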

###SUBHEAD NOTE

Code: [Select]
<p rend=[^\w]subhead[^\w]>([^<>]*?)<span note>([^<>]*?)<\/span>([^<>]*?)<\/p>

==== $1$3 ====\n<div centeralign>**$1<span note>$2</span>$3**</div>\n<span sang_id #{file-}.{no}>[[{path-release}:{file-}.{no}|{file-}.{no}]] | [[{path-source}:{file}#{file-}.{no}|source]]</span>

###CHAPTER NOTE

Code: [Select]
<p rend=[^\w]chapter[^\w]>([^<>]*?)<span note>([^<>]*?)<\/span>([^<>]*?)<\/p>

======= $1$3 =======\n<div centeralign>**$1<span note>$2</span>$3**</div>\n<span sang_id #{file}>[[{path-release}:{file}|{file}]] | [[{path-source}:{file}#{file}|source]]</span>

###TITLE NOTE

Code: [Select]
<p rend=[^\w]title[^\w]>([^<>]*?)<span note>([^<>]*?)<\/span>([^<>]*?)<\/p>

===== $1$3 =====\n<div centeralign>**$1<span note>$2</span>$3**</div>\n<span sang_id #{file+}>[[{path-release}:{file+}|{file+}]] | [[{path-source}:{file}#{file+}|source]]</span>

###SUBHEAD ANCHOR

Code: [Select]
<p rend=[^\w]subhead[^\w]>([^<>]*?)<span anchor #([^\n]*?)<\/span>([^<>]*?)<\/p>

==== $1$3 ====\n<span sang_id #{file-}.{no}>[[{path-release}:{file-}.{no}|{file-}.{no}]] | [[{path-source}:{file}#{file-}.{no}|source]]</span>\n<span anchor #$2</span>

###CHAPTER ANCHOR

Code: [Select]
<p rend=[^\w]chapter[^\w]>([^<>]*?)<span anchor #([^\n]*?)<\/span>([^<>]*?)<\/p>

======= $1$3 =======\n<span sang_id #{file}>[[{path-release}:{file}|{file}]] | [[{path-source}:{file}#{file}|source]]</span>\n<span anchor #$2</span>

###TITLE ANCHOR

Code: [Select]
<p rend=[^\w]title[^\w]>([^<>]*?)<span anchor #([^\n]*?)<\/span>([^<>]*?)<\/p>

===== $1$3 =====\n<span sang_id #{file+}>[[{path-release}:{file+}|{file+}]] | [[{path-source}:{file}#{file+}|source]]</span>\n<span anchor #$2</span>

###BOOK ANCHOR

Code: [Select]
<p rend=[^\w]book[^\w]>([^<>]*?)<span anchor #([^\n]*?)<\/span>([^<>]*?)<\/p>

======== $1$3 ========\n<span sang_id #{file-}>[[{path-release}:{file-}|{file-}]] | [[{path-source}:{file}#{file-}|source]]</span>\n<span anchor #$2</span>

Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on March 15, 2019, 04:39:40 PM
Further edits:

###DOT

Code: [Select]
<hi rend="dot">\.</hi>

.

###HANGNUM INTO HEADER (JAT only) multiline

Code: [Select]
===== ([^<>]*?) =====\n<span sang_id #([^\n]*?)</span>(.*?)<p rend=[^\w]hangnum[^\w]>[\s]*<\/p>[\r\n]+ ([១២៣៤៥៦៧៨៩០1234567890๑๒๓๔๕๖๗๘๙๐\-]+)\. ([^\n]*?)\n

===== $1 =====\n<span sang_id #{file+}>[[{path-release}:{file+}|{file+}]] | [[{path-source}:{file}#{file+}|source]]</span>$3==== $4. $5 ====\n<span sang_id #{file-}.{no}>[[{path-release}:{file-}.{no}|{file-}.{no}]] | [[{path-source}:{file}#{file-}.{no}|source]]</span>\n
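The explicit digit class in these JAT rules has to list every script's digits. In Python str patterns, \d already matches any Unicode decimal digit, Khmer and Thai included (and also others, such as Myanmar), so a sketch could use either form; whether Notepad++ or batchedit treat \d the same way is not assumed here. hangnum_number is a hypothetical helper name.

```python
import re

# The digit class from the JAT rules above: Khmer, Latin and Thai digits plus "-".
EXPLICIT_DIGITS = r"[១២៣៤៥៦៧៨៩០1234567890๑๒๓๔๕๖๗๘๙๐\-]+"
# In Python 3 str patterns, \d matches any Unicode decimal digit.
UNICODE_DIGITS = r"[\d\-]+"


def hangnum_number(line: str, digits: str = EXPLICIT_DIGITS):
    """Return the number at the start of a line like " 12. Title", or None.

    The leading space mirrors the rules above, where a space precedes the number.
    """
    m = re.match(" (" + digits + r")\. ", line)
    return m.group(1) if m else None
```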

###HANGNUM INTO HEADER (JAT only) multiline [..] X.

Code: [Select]
===== ([^<>]*?) =====\n<span sang_id #([^\n]*?)</span>(.*?)<p rend=[^\w]hangnum[^\w]>[\s]*<\/p>[\r\n]+ \[([១២៣៤៥៦៧៨៩០1234567890๑๒๓๔๕๖๗๘๙๐\-]+)\] ([១២៣៤៥៦៧៨៩០1234567890๑๒๓๔๕๖๗๘๙๐\-]+)\. ([^\n]*?)\n

===== $1 =====\n<span sang_id #{file+}>[[{path-release}:{file+}|{file+}]] | [[{path-source}:{file}#{file+}|source]]</span>$3==== [$4] $5. $6 ====\n<span sang_id #{file-}.{no}>[[{path-release}:{file-}.{no}|{file-}.{no}]] | [[{path-source}:{file}#{file-}.{no}|source]]</span>\n

###HANGNUM INTO HEADER HH (JAT only) multiline

Code: [Select]
======= ([^<>]*?) =======[\r\n]+<span sang_id #\{file\}>\[\[\{path-release\}:\{file\}\|\{file\}\]\] \| \[\[\{path-source\}:\{file\}#\{file\}\|source\]\]<\/span>(.*?)<p rend=[^\w]hangnum[^\w]>[\s]*<\/p>[\r\n]+ ([១២៣៤៥៦៧៨៩០1234567890๑๒๓๔๕๖๗๘๙๐\-]+)\. ([^\n]*?)[\r\n]+

======= $1 =======\n<span sang_id #{file}>[[{path-release}:{file}|{file}]] | [[{path-source}:{file}#{file}|source]]</span>$2==== $3. $4 ====\n<span sang_id #{file-}.{no}>[[{path-release}:{file-}.{no}|{file-}.{no}]] | [[{path-source}:{file}#{file-}.{no}|source]]</span>\n\n

###HANGNUM INTO HEADER HH (JAT only) no NO.

Code: [Select]
<p rend=[^\w]hangnum[^\w]>[\s]*<\/p>[\r\n]+ ([^\n]*?)[\r\n]+

==== $1 ====\n<span sang_id #{file-}.{no}>[[{path-release}:{file-}.{no}|{file-}.{no}]] | [[{path-source}:{file}#{file-}.{no}|source]]</span>\n\n

###HANGNUM CORR (exception in bud-vgs.nk.2_any.txt and sut.sn.01.txt!!)

Code: [Select]
<p rend=[^\w]hangnum[^\w]>([^<>]+?)\.<\/p>

<div hangnum>$1.</div>

###Search "<p rend=[^\w]hangnum[^\w]>": further 47 hits in 39 files; best done one by one since there are many exceptions.
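Counting such hits per file, before deciding between another regex rule and manual edits, can be scripted. A Python sketch; count_hits is a hypothetical helper, and all files are assumed to be UTF-8.

```python
import re
from pathlib import Path
from typing import Dict


def count_hits(directory: Path, pattern: str, glob: str = "*.txt") -> Dict[str, int]:
    """Return {relative file name: number of matches} for files with at least one hit."""
    rx = re.compile(pattern)
    hits: Dict[str, int] = {}
    for path in sorted(directory.rglob(glob)):
        n = len(rx.findall(path.read_text(encoding="utf-8")))
        if n:
            hits[path.relative_to(directory).as_posix()] = n
    return hits
```

For example, count_hits(Path("cs-rm"), r"<p rend=[^\w]hangnum[^\w]>") gives one entry per affected file; the sum of its values and its length correspond to figures like "47 hits in 39 files".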

###BOLD corrections

without regex:

Code: [Select]
]</span> .

]</span>
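The "without regex" edit above is a plain string replacement. If a regex tool is used for it instead, the search text needs escaping: unescaped, the final "." would match any character and eat the first letter of the following word. A small Python sketch (fix_plain and fix_with_regex are hypothetical names):

```python
import re

OLD = "]</span> ."
NEW = "]</span>"


def fix_plain(text: str) -> str:
    """The edit as a plain string replacement: no character is special here."""
    return text.replace(OLD, NEW)


def fix_with_regex(text: str) -> str:
    """The same edit through a regex engine: re.escape makes "]" and "." literal.

    Unescaped, the final "." would match any character, so "]</span> Tena"
    would lose its "T".
    """
    return re.sub(re.escape(OLD), NEW, text)
```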

correction before, ###HANGNUM, again

###P HANGNUM HI PARANUM BOLD

Code: [Select]
<p rend=[^\w]hangnum[^\w] n=[^\w]([^<>]*?)[^\w]>[\s]*<hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi>[\s]*<hi rend=[^\w]bold[^\w]>\.<\/hi>([^\n]*?)<\/p>[\s]*

<span para #para_$1>[$2]</span>$3\n\n

###Further <hi rend="bold"> corrections are made on the single pages

###P HANGNUM HI PARANUM

Code: [Select]
<p rend=[^\w]hangnum[^\w] n=[^\w]([^<>]*?)[^\w]>[\s]*<hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi>[\s]*<\/p>[\s]*

<span para #para_$1>[$2]</span>\n\n

###P INDENT PARANUM

Code: [Select]
<p rend=[^\w]indent[^\w] n=[^\w]([^<>]*?)[^\w]>[\s]*<hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi>\. ([^\n]*?)<\/p>[\s]*

<span para #para_$1>[$2]</span> $3\n\n

###P HANGNUM HI PARANUM content

Code: [Select]
<p rend=[^\w]hangnum[^\w] n=[^\w]([^<>]*?)[^\w]>[\s]*<hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi>[\. ]([^\n]*?)<\/p>[\s]*

<span para #para_$1>[$2]</span>$3\n\n

###GATHA PARANUM

Code: [Select]
<div gatha1[^\w] n=[^\w]([0-9]*?)><hi rend=[^\w]paranum[^\w]>([^<>]*?)<\/hi>[\. ]*([^\n]*?)</div>

<span para #para_$1>[$2]</span>\n\n<div gatha1>$3</div>

###Correction

Code: [Select]
<div gatha2" n="-><hi rend="paranum">-</hi>

<div gatha2>

###Manual corrections for all matches of "<p rend"

###Cleanings

Code: [Select]
\r\n

\n

There might be further xml-tags left and small edits needed, but those can be made online.
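The ###Cleanings rule above (\r\n to \n) as a small Python sketch. It works on raw bytes so that Python's own newline translation can neither hide nor re-add the \r; normalize_newlines is a hypothetical name. This is safe for UTF-8, where the CR and LF bytes never occur inside a multi-byte character, but UTF-16 files (like the original CSCD ones) would have to be decoded first.

```python
from pathlib import Path


def normalize_newlines(path: Path) -> bool:
    """Rewrite a UTF-8 file with LF line endings only; return True if it changed."""
    data = path.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        path.write_bytes(fixed)
    return fixed != data
```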

Atma will now replace the placeholders (except {no} and {no+}, for which he has no idea yet how to process them correctly and effectively) and then upload all files anew.

(Note: processing replacements with batchedit online is much faster than with Notepad++ locally (about 3-4 days). Of course, clearing the cache and deleting the history online also takes a good while.)
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on March 15, 2019, 04:58:27 PM
And using PowerShell, such as ((Get-Content vin.par.ve.txt -Raw) -replace '{lang}','cs-km') | Set-Content vin.par.ve.txt, destroyed the files, possibly a UTF-8 issue... (and no backup had been made...)

all once again  ^-^ :)
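If this was Windows PowerShell 5.1, a likely cause is that Get-Content reads BOM-less files, and Set-Content writes, in the system's ANSI code page unless -Encoding is given (PowerShell 7 defaults to UTF-8 instead). A Python sketch of the same replacement that states the encoding explicitly and keeps a backup first; safe_replace and the .bak suffix are hypothetical choices.

```python
import shutil
from pathlib import Path


def safe_replace(path: Path, old: str, new: str, backup_suffix: str = ".bak") -> int:
    """Replace a literal string in a UTF-8 file after copying it to a backup.

    Decoding strictly as UTF-8 makes a wrong encoding fail loudly
    (UnicodeDecodeError) instead of silently garbling Khmer or Thai text.
    Returns the number of occurrences replaced; files without a match are
    neither backed up nor rewritten. An existing backup is overwritten.
    """
    text = path.read_bytes().decode("utf-8")
    count = text.count(old)
    if count:
        shutil.copy2(path, path.with_name(path.name + backup_suffix))
        path.write_bytes(text.replace(old, new).encode("utf-8"))
    return count
```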
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on March 16, 2019, 07:02:54 AM
Atma will upload the renamed files with their original content and try again to make the replacements online with batchedit, having come across Notepad sometimes losing found matches and returning nothing when replacing.
In this way, at least, the originals would be stored on ati as well. Let's see whether web space and sun allow it in the next days.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on March 16, 2019, 10:25:30 PM
All files have been uploaded anew. The Khmer files still need some remaining replacements of XML codes. The renamed files have been deleted.

Once the index is rebuilt, the last replacements can be made.

As for replacing the placeholders {file}, {ns-section}..., it may be good to run similar scripts on the server.

Regarding {no}: no overall idea for now, so perhaps best done as before.
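Purely as an illustration of one possible scheme, not a decision made in this thread: if {no} were simply a running count of the subheads within each file, it could be filled line by line as in this Python sketch (number_placeholders is a hypothetical name). Every {no} on one heading line (sang_id, link target, link text, source anchor) has to get the same number. The real values may well have to come from the single-sutta release files instead.

```python
def number_placeholders(text: str, placeholder: str = "{no}", start: int = 1) -> str:
    """Give every line that contains the placeholder the next running number.

    Hypothetical numbering scheme: {no} counts the headings within one file.
    All occurrences on the same line get the same number. "{no+}" is a
    different string and is left untouched.
    """
    number = start
    out = []
    for line in text.split("\n"):
        if placeholder in line:
            line = line.replace(placeholder, str(number))
            number += 1
        out.append(line)
    return "\n".join(out)
```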

Attached is an Excel list containing all particular replacements for each single file.
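A script using such a list could look like the Python sketch below. It assumes, hypothetically, that the Excel list has been saved as a UTF-8 CSV with the columns file, placeholder and value; the real list's layout may differ. The replacements for one file are applied in list order, and files named in the list but not found are reported instead of being skipped silently.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple

Table = Dict[str, List[Tuple[str, str]]]


def load_replacements(csv_path: Path) -> Table:
    """Read rows of (file, placeholder, value) into a per-file list.

    The column names are hypothetical; "utf-8-sig" also accepts the BOM
    that Excel writes when saving as "CSV UTF-8".
    """
    table: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    with csv_path.open(encoding="utf-8-sig", newline="") as f:
        for row in csv.DictReader(f):
            table[row["file"]].append((row["placeholder"], row["value"]))
    return dict(table)


def apply_replacements(directory: Path, table: Table) -> List[str]:
    """Apply each file's own replacements; return the names of missing files."""
    missing = []
    for name, pairs in table.items():
        path = directory / name
        if not path.exists():
            missing.append(name)
            continue
        text = path.read_bytes().decode("utf-8")
        for placeholder, value in pairs:
            text = text.replace(placeholder, value)
        path.write_bytes(text.encode("utf-8"))
    return missing
```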
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on March 17, 2019, 05:41:59 PM
List of renaming of the index files (toc.xml): renaming_files#index-files_toc
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on March 23, 2019, 05:45:38 PM
Main indexes in the four scripts should be fine and complete now:

Tipiṭaka (Roman)
តិបិដក (បាឡិ​ខ្មែរ)
ติปิฎก (Thai)
д̇ибидага (кириллица)
My person is currently trying to rebuild the index with the update option, which actually seems to be twice as slow as building it anew, but possibly would not end up with no index at all when stopped in between (about 3000 of 20500 pages indexed since this morning).
Title: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on April 01, 2019, 05:23:19 PM


One or more posts have been cut out of this topic. A new topic based on them has been opened as "[ati.eu] Indexing, search engine (http://forum.sangham.net/index.php/topic,9172.msg18377.html#msg18377)", or they have been attached there.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on April 02, 2019, 01:18:28 PM
{lang} and {ns-section} have now been replaced on all pages except the 416 pages in cs-th (Thai: 268 pages in Atthakatha and 148 pages in Tika).

The further replacements ({file}, {path-source}...) could be made according to the list above (http://forum.sangham.net/index.php/topic,8672.msg18280.html#msg18280), either page by page or with a script using the list. {file+}, {file-} etc., however, may need further processing later; the same goes for {no}...

Sadhu for the great work and assistance of many in bringing the first four languages here and making them available for the Sangha and those with Nissaya.

Atma will take care of the last XML conversions into ati syntax in the Khmer pages and then look after the CSS for "good" layouts.

Quote from: http://accesstoinsight.eu/cs-rm/renaming_files
An Excel file which is of help for creating the release files, also in languages to come, can be used: renaming_list.xlsx. To extract them into directories and files for an upload, the "Converting lists into txt-files - Tools for Ati.eu" can be used.
Title: Re: [ATI.eu] CSCD xml to ati.eu format: converting, editing
Post by: Dhammañāṇa on April 11, 2019, 05:38:23 PM
Currently working on the "single-sutta release" files, which may take some time given about 40,000 headers, but would then also finally give values for the {no..} replacements (for links to them) in the source files.

Making single files for Atthakatha and Tika would create a huge number of files (unless they are skipped), so Atma thought of including the related commentaries directly in the Sutta (Mula) files.