Module Developers/Module Development - theWord Modules


Module Developers/Module Development


9 replies to this topic

#1 HowdeeDoodee

    New to the website

  • Members
  • 1 posts

Posted 05 April 2013 - 04:54 AM

Is there a piece or pieces of software for module development, like (I may be dreaming on this one) an editor like FrontPage where if you are brave you can develop the module by working in the code?

Does anyone know what programming language is used to develop the modules?

I have worked in JavaScript, VB, Basic, a lot of xl, HTML, and other computer languages. Are any of these languages related to the modules you develop?

Do you have a developer forum?

#2 Josh Bond

    Administrator

  • Administrators
  • 320 posts
  • Location: Gallatin, TN

Posted 05 April 2013 - 08:45 AM

The majority of people who make theWord modules copy and paste content into theWord. For a 15,000 comment commentary, that's a lot of copying and pasting! It's not practical and on a large module, it's error prone. I do not make modules this way. No one makes large modules this way—that's why few make large modules. Without programming, it becomes insane...

Costas, the developer of theWord, uses Delphi to make modules. He edits and makes systematic changes to the RTF as an object model in Delphi. He then uses Delphi code (or some other script) to slice an RTF document into sqlite database format (the format used by theWord).

I make large modules by starting with a large RTF document (or a number of RTF documents). I use Microsoft Word's VBA environment to divide a commentary or dictionary with a system of ÷ signs. I use Visual Basic code and regular expressions in MS-Word's VBA environment. If the source document is in HTML format, I divide and tag the document with Python. For example, I divided (tagged) the entire Utley NT commentary, verse by verse/passage by passage, in about 20 seconds of actual run time (and maybe 15 minutes of script prep), using Python. (That would have taken weeks or months with copying and pasting.)

I then use a tool for e-Sword called Tooltip NT, which processes the RTF document tagged above. It physically slices the RTF document into a sqlite database based on the position of the ÷ markers placed by VBA or Python. Then I convert the database from e-Sword format to theWord format, which is mostly a schema change.
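
To illustrate the idea of slicing on ÷ markers, here is a minimal Python sketch. It is not Tooltip NT's code, and the table layout (`entries` with `heading` and `body` columns) is a made-up placeholder, not the real e-Sword or theWord schema. It also assumes plain text in which each section starts with a line beginning with ÷, rather than real RTF.

```python
import sqlite3

MARKER = "\u00f7"  # the ÷ sign used as a section divider


def split_on_markers(text):
    """Return (heading, body) pairs; a section starts at each line beginning with ÷."""
    entries = []
    heading, body = None, []
    for line in text.splitlines():
        if line.startswith(MARKER):
            # Close the previous section before starting a new one.
            if heading is not None:
                entries.append((heading, "\n".join(body).strip()))
            heading, body = line[len(MARKER):].strip(), []
        elif heading is not None:
            body.append(line)
    if heading is not None:
        entries.append((heading, "\n".join(body).strip()))
    return entries


def store_entries(entries, db_path=":memory:"):
    """Write the sections to a hypothetical 'entries' table (not theWord's schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries "
        "(id INTEGER PRIMARY KEY, heading TEXT, body TEXT)"
    )
    conn.executemany("INSERT INTO entries (heading, body) VALUES (?, ?)", entries)
    conn.commit()
    return conn


sample = "\u00f7John 1:1\nIn the beginning...\n\u00f7John 1:2\nThe same was..."
conn = store_entries(split_on_markers(sample))
```

Anything before the first ÷ line is ignored in this sketch; a real tool would have to decide what to do with front matter.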

I use an e-Sword tool for theWord module creation because theWord has no such tool. theWord's import utility lets you create a directory structure on your hard drive, which it can import content from. That's basically the equivalent of TooltipNT's ÷ marker system but it's just too hard for me to work with.

#3 Josh Bond

    Administrator

  • Administrators
  • 320 posts
  • Location: Gallatin, TN

Posted 05 April 2013 - 08:54 AM

I'll share HOW I do it as well. Some text cannot be automatically sliced into a commentary, dictionary, devotion, book, etc. But many times it can be. I look for patterns either in the text itself or the underlying code (rtf or html). I'm looking for something that consistently signals a verse or passage change (if it's a commentary) or a devotion change (if it's a daily devotion).

For example, maybe the author of a commentary begins each line with a Verse ## in bold. That's good enough for a regular expression search/replace to insert a ÷ marker tag. Utley drew a large black box around a passage of quoted scripture. That consistently indicated he was changing passage discussions. That became the basis for dividing the commentary. Regular expression searches are the backbone for me.
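
As a rough illustration of that kind of search/replace, here is a small Python sketch. It assumes the text has already been reduced to plain text and that every new section starts with a line like "Verse 12"; the pattern and the function name `tag_verses` are only examples, not the actual scripts used for any module.

```python
import re

# A line that starts with "Verse" followed by a number signals a new section.
VERSE_LINE = re.compile(r"^(Verse\s+\d+)", flags=re.MULTILINE)


def tag_verses(text):
    """Insert a ÷ marker in front of every line that begins with 'Verse <number>'."""
    return VERSE_LINE.sub("\u00f7" + r"\1", text)


sample = "Intro text\nVerse 1 In the beginning\nmore\nVerse 2 The same"
tagged = tag_verses(sample)
```

If the signal is in the markup instead (bold codes in RTF, a tag in HTML), the same approach applies, but the pattern has to match that markup rather than the visible words.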

#4 ErikJon

    theWord Supporter

  • Members
  • 30 posts

Posted 28 April 2013 - 04:10 PM

Josh,

On that note, you have done some wonderful work with excellent formatting, all highly underrated.

At the moment I am trying to convert a series of 66 commentary documents in PDF format, to RTF format, using Calibre (I got the idea from one of your comments on the forum). After that, I just copy and paste each one into the appropriate spot on one of TheWord's commentary templates. If I can just get it all in at chapter level, I will be happy for now, as it still beats opening up the PDFs every time.

Anyway, I got Calibre to convert each one into a nice RTF document, which I then open with a little word processor. At that point I notice that Calibre throws a large SPACE+TAB indentation to the first line of each paragraph, which I remove simply with a find-and-replace feature available in the word processor (but I wish I knew how to do it in Calibre).

After that, one thing I cannot seem to figure out is how to remove all the "soft returns" in my converted RTF, without removing the separation between paragraphs.

In other words, the finished text is not "wrapped" to the window, but seems to have soft returns at the end of each line, and then an empty paragraph below each paragraph of text. I would like to retain the space between paragraphs while removing these soft returns. If I do a find-and-replace in the word processor, to remove returns, I lose the spacing between paragraphs as well.

Conclusion: can you tell me how to get Calibre to do two things: (1) remove the initial SPACE+TAB in the first line of each paragraph, and (2) remove all the soft returns without removing the spacing between paragraphs?
I'm an Independent Fundamental Baptist running TheWord portable v 5.001465 from a 32GB flash drive, with 1,000+ modules installed. I'm using 32-bit Vista Ultimate SP1 with a 2.7gHz processor and 4GB RAM. I also use the latest version of E-Sword only to access material that is either not yet available for TheWord, or else that is much less expensive in E-Sword format.

#5 darrel_jw

    Liking theWord

  • Members
  • 27 posts

Posted 29 April 2013 - 12:53 AM

Help! So how do you get Calibre to convert a PDF to RTF format? I am trying to find a way to convert heavy Hebrew/Greek text from PDF to something that maintains the Greek and/or Hebrew characters (i.e., A Short Grammar of the Greek New Testament). Thanks for any help.

Darrel

#6 Josh Bond

    Administrator

  • Administrators
  • 320 posts
  • Location: Gallatin, TN

Posted 29 April 2013 - 02:58 PM

There are two types of PDFs: PDFs with scanned images, one for each page (like you see on Google Books), and PDFs with digitized text.

For digitized text, I recommend Adobe Acrobat, the full version, to extract the text from a PDF.

You can copy and paste text from a PDF, or use Calibre, but both of these methods treat each line as a separate line with a return at the end. Adobe Acrobat (not the reader, but the full version) does an excellent job of reassembling paragraphs and lines, and distinguishing between a line that should wrap and a carriage return. That's why I recommend Adobe Acrobat.

Unless Calibre somehow distinguishes a line that should wrap from a line that should have a return, there's no way to tell good returns from bad returns. I have seen text before that had soft returns at the end of each line but a real carriage return for the end of a paragraph. Then it was a matter of removing the soft returns and leaving the carriage returns. I've also dealt with text that has two returns for a real return and just one return for a line that should wrap. There, I deleted all instances of one return but not two. Unless there's a pattern, you can't automate it.
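
For the second case (two returns for a real paragraph break, one return for a line that should wrap), the cleanup can be sketched in Python like this. It assumes plain text where a return is a newline character, and the function name `unwrap_single_returns` is just for illustration.

```python
import re

# A newline that is neither preceded nor followed by another newline.
SINGLE_RETURN = re.compile(r"(?<!\n)\n(?!\n)")


def unwrap_single_returns(text):
    """Replace lone returns with a space; leave double returns (paragraph breaks) alone."""
    text = text.replace("\r\n", "\n")  # normalize Windows line endings first
    return SINGLE_RETURN.sub(" ", text)


sample = "line one\nline two\n\nnext para\ncontinues"
cleaned = unwrap_single_returns(sample)
```

The lookbehind and lookahead do the work here: a return next to another return is treated as part of a paragraph break and is left untouched.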

Dealing with Greek and Hebrew is a different situation. If the Greek or Hebrew is digitized with a legacy font, instead of Unicode, then that complicates how you restore the Greek or Hebrew font. It's not a point and click, simple solution. You will need an automated way (like a MS Word macro) to character match between the old font and Unicode. I've done this when the legacy font was one of several old "standard" fonts used. When it wasn't, I had no way of dealing with it.
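
The character matching itself is just a lookup table. Here is a minimal Python sketch of the idea. The three-entry table is invented for illustration and is not the mapping of any real legacy font; a real conversion needs the full table for the specific font, and it should only be applied to the runs of text that were set in that font.

```python
# Hypothetical mapping for illustration only: the legacy font is assumed to
# display Latin "a", "b", "g" as Greek alpha, beta, gamma.
LEGACY_TO_UNICODE = str.maketrans({
    "a": "\u03b1",  # alpha
    "b": "\u03b2",  # beta
    "g": "\u03b3",  # gamma
})


def legacy_to_unicode(run_text):
    """Convert one run of legacy-font text to Unicode; unmapped characters pass through."""
    return run_text.translate(LEGACY_TO_UNICODE)


converted = legacy_to_unicode("bag")
```

Accents, breathing marks, and right-to-left Hebrew make real tables much larger than this, which is why an unknown legacy font is so hard to deal with.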

Josh

#7 ErikJon

    theWord Supporter

  • Members
  • 30 posts

Posted 29 April 2013 - 09:38 PM

Thanks for the advice. I have Acrobat on a Macintosh computer, but I thought that Calibre had a trick for solving that problem on my PC.

Incidentally, have you got any tips for the scenario you mentioned of "removing two returns but not only one"? Maybe find and replace every one return with two, and then find-and-replace every three with none?

#8 Josh Bond

    Administrator

  • Administrators
  • 320 posts
  • Location: Gallatin, TN

Posted 30 April 2013 - 07:12 AM

I usually use MS-Word for that. Try it and see. Without seeing a sample of your text in a file, it's hard to say.

#9 Josh Bond

    Administrator

  • Administrators
  • 320 posts
  • Location: Gallatin, TN

Posted 30 April 2013 - 07:37 AM

In MS-Word, here are a few regular expressions that might clean up the text. You will have to adjust for any spaces:

Finds a line ending with a lowercase letter and the next line beginning with a lowercase letter, with no intervening punctuation
search: ([a-z])^13([a-z])
replace: \1 \2

Same as above, except it's more aggressive: it also matches when the next line begins with an uppercase letter, to account for Jesus or God or Rome.
search: ([a-z])^13([A-Za-z])
replace: \1 \2

You can do the same to account for semicolons and commas. Poetry might conflict with trying this on lines ending in a comma or semicolon.
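
For anyone working outside MS-Word, roughly the same two searches can be sketched in Python. This assumes plain text where each return is a newline, and the function name `join_wrapped_lines` and its `aggressive` flag are just for illustration. It uses lookarounds instead of capture groups so that very short lines are still joined.

```python
import re

# Return between a lowercase letter and a lowercase letter.
STRICT = re.compile(r"(?<=[a-z])\n(?=[a-z])")
# Return between a lowercase letter and any letter (catches Jesus, God, Rome).
AGGRESSIVE = re.compile(r"(?<=[a-z])\n(?=[A-Za-z])")


def join_wrapped_lines(text, aggressive=False):
    """Replace returns that look like mid-sentence line wraps with a space."""
    pattern = AGGRESSIVE if aggressive else STRICT
    return pattern.sub(" ", text)


sample = "the word\nof the\nLord. Next"
strict_result = join_wrapped_lines(sample)
aggressive_result = join_wrapped_lines(sample, aggressive=True)
```

As with the Word version, the aggressive pattern will also join a heading or a line of poetry that happens to follow a line ending in a lowercase letter, so check the result.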

#10 Bannytyncity

    New to the website

  • Members
  • 2 posts

Posted 18 September 2013 - 12:13 AM

I then use a tool for e-Sword called Tooltip NT.

Edited by Bannytyncity, 18 September 2013 - 12:13 AM.





