Heyo,
I have been doing a project for my college, which involved a question paper generation module. I initially relied on LaTeX for the job, compiling the question paper to PDF from it. But the staff may sometimes need to edit the questions on the fly before generating a PDF, which meant I had to give them an editable format. Most of the staff in my college are not well versed in LaTeX, especially the ones from non-engineering departments. So I had to go with a text document format like docx or odt. The project was written in Django, so I searched for docx wrappers in Python (as my college was still on Windows). I eventually found python-docx, but it lacked many features that I wanted, like tab stops to separate a question from its marks, or to write two words at the two ends of a single line. Yes, there were issues filed on these, but nothing seemed to be happening on that front. So I decided to drop the idea of using docx and move to odt, since newer versions of MS Word support it without much hassle. That's when I ended up with odfpy.
Like with any other library, I wanted to read the documentation to understand how the basic stuff works (Yeah, the RTFM part of me!!). Boy, was I disappointed. The documentation of odfpy is crap, to put it bluntly. Yes, there exists an API documentation, but it is too technical and doesn't convey the concepts well. So it was pretty difficult to get what I wanted from it. Thanks to the hacker mindset, I was ready to read the code and understand it. And the upstream had added several tests covering most of the features, which I could use as reference. I was able to get what I wanted, and I decided to blog about it so that it'll be a help to others and a note to myself for future reference. So here goes.
To install odfpy on Debian systems, one may use the command
sudo apt-get install python-odf
You can also use pip for the job:
sudo pip install odfpy
Let the hacking begin
- Create a python file named odftest.py with the following contents
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from odf.opendocument import OpenDocumentText
from odf.style import (Style, TextProperties, ParagraphProperties, ListLevelProperties, TabStop, TabStops)
from odf.text import (H, P, List, ListItem, ListStyle, ListLevelStyleNumber, ListLevelStyleBullet)
from odf import teletype
These are the common modules we need to import for a document. The first one, OpenDocumentText, is the master class that represents the document itself. odf.style has the stuff needed to stylize the document, like properties for paragraphs, text, lists etc. odf.text handles the text objects like headings, paragraphs, lists etc. The teletype module deals with retrieval and insertion of text to elements with proper handling of line breaks and whitespace.
-
First we need to create an OpenDocumentText object, which will act as the base object that creates the document.
textdoc = OpenDocumentText()
-
Next we need to create the styles to be used in the document: headings, bold text, bullets, numbering etc. The following code is written for readability; it can be shortened by skipping the variable assignments and passing the objects directly.
# For Level-1 Headings that are centered
h1style = Style(name="CenterHeading 1", family="paragraph")
h1style.addElement(ParagraphProperties(attributes={"textalign": "center"}))
h1style.addElement(TextProperties(
attributes={"fontsize": "18pt", "fontweight": "bold"}))
Here, the first line of code initializes a Style object, which has a specific name and belongs to a specific family. Alignment of the text is a property of a paragraph, so it is given as an argument to ParagraphProperties in the second line. However, fontsize and fontweight apply to specific text, not a paragraph. Hence, they are given as arguments to a TextProperties object.
# For Level-2 Headings that are centered
h2style = Style(name="CenterHeading 2", family="paragraph")
h2style.addElement(ParagraphProperties(attributes={"textalign": "center"}))
h2style.addElement(TextProperties(
attributes={"fontsize": "15pt", "fontweight": "bold"}))
This block of code defines a smaller heading (note the change in the fontsize attribute).
# For bold text
boldstyle = Style(name="Bold", family="text")
boldstyle.addElement(TextProperties(attributes={"fontweight": "bold"}))
Here we are defining a style to make text bold. Since it doesn't have anything to do with paragraphs, it only has a TextProperties object.
# Justified style
justifystyle = Style(name="justified", family="paragraph")
justifystyle.addElement(ParagraphProperties(
attributes={"textalign": "justify"}))
Here, we specify the justified style, that is applicable to paragraphs. Hence it uses a ParagraphProperties element.
# For numbered list
numberedliststyle = ListStyle(name="NumberedList")
level = 1
numberedlistproperty = ListLevelStyleNumber(level=str(level), numsuffix=".", startvalue=1)
numberedlistproperty.addElement(ListLevelProperties(minlabelwidth="%fcm" % (level)))
numberedliststyle.addElement(numberedlistproperty)
Here we define the style to be used for Level 1 numbering, which means no nesting of numbering. The first line, as before, defines a style object, this time a ListStyle, which will represent the style. ListLevelStyleNumber() is used to specify that the list is a numbered list and to define the level of the numbering, which in our case is 1. The attribute numsuffix defines the character that should be inserted after a numeral in the numbering scheme, which in our case is a period (.), and startvalue defines the starting number of the list.
Also, we add ListLevelProperties to impart some features to the items of the list. Here, it is minlabelwidth, which specifies how much width should be given to the numbering portion of the text, i.e. how much space should be left after the number before the content begins.
# For Bulleted list
bulletedliststyle = ListStyle(name="BulletList")
level = 1
bulletlistproperty = ListLevelStyleBullet(level=str(level), bulletchar=u"•")
bulletlistproperty.addElement(ListLevelProperties(
minlabelwidth="%fcm" % level))
bulletedliststyle.addElement(bulletlistproperty)
A bulleted list is pretty similar to a numbered list; the difference is that it uses ListLevelStyleBullet, which takes bulletchar as an argument. The rest is pretty much the same.
# Creating a tabstop at 10cm
tabstops_list = TabStops()
tabstop = TabStop(position="10cm")
tabstops_list.addElement(tabstop)
tabstoppar = ParagraphProperties()
tabstoppar.addElement(tabstops_list)
tabstyle = Style(name="Question", family="paragraph")
tabstyle.addElement(tabstoppar)
# tabstyle is added to the document's styles later, along with the other styles
In this block, we specify the tab stops we may encounter in the document. The TabStops element is a collection of all the tab stops we may define. The TabStop element is the one describing each tab stop and, understandably, it takes position as an argument. Since tab stops apply to a paragraph, we create a ParagraphProperties object to which we add the list of tab stops we have, tabstops_list. For it to be applied to text, as seen below, we have to make it a style. So we create tabstyle for that purpose and add the ParagraphProperties element to it.
-
So, now we have created all our necessary styles. But how do we associate them with the OpenDocumentText object we created? How do we specify that these are the styles that may be used in the document? For that, we have to add each of the styles we created to the document's style list. It is done as follows
s = textdoc.styles
s.addElement(h1style)
s.addElement(h2style)
s.addElement(boldstyle)
s.addElement(numberedliststyle)
s.addElement(bulletedliststyle)
s.addElement(justifystyle)
s.addElement(tabstyle)
- So, all our styles have been created and added to our Document. Now it is time to actually insert some text and apply these styles to it. Let's first add our main heading
mymainheading_element = H(outlinelevel=1, stylename=h1style)
mymainheading_text = "This is my main heading"
teletype.addTextToElement(mymainheading_element, mymainheading_text)
textdoc.text.addElement(mymainheading_element)
odfpy has two main classes for text, H and P. H is for headings and P is for paragraphs. In this block, since we are adding a heading, we create an H object and specify the outlinelevel (which is normally 1) and the stylename that we created earlier, h1style. We then add some text to the object using the teletype.addTextToElement method. Like I said before, we use teletype to properly handle whitespace like tabs or newlines. Alternatively, we can give the text directly as the text argument when initializing the H (or P) object. But this will simply skip tabs or newlines instead of handling them, so I prefer using teletype. Finally, we add the created H object to our document using the textdoc.text.addElement method.
-
Similarly, create a subheading using h2style as the stylename.
-
Adding a paragraph is also similar
paragraph_element = P(stylename=justifystyle)
paragraph_text = """
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem
Ipsum has been the industry's standard dummy text ever since the 1500s, when an
unknown printer took a galley of type and scrambled it to make a type specimen
book. It has survived not only five centuries, but also the leap into electronic
typesetting, remaining essentially unchanged. It was popularised in the 1960s
with the release of Letraset sheets containing Lorem Ipsum passages, and more
recently with desktop publishing software like Aldus PageMaker including
versions of Lorem Ipsum.
"""
teletype.addTextToElement(paragraph_element, paragraph_text)
textdoc.text.addElement(paragraph_element)
- Let's see how to add a bulleted list with two items
bulletlist = List(stylename=bulletedliststyle)
listitemelement1 = ListItem()
listitemelement1_paragraph = P()
listitemelement1_content = "My first item"
teletype.addTextToElement(listitemelement1_paragraph, listitemelement1_content)
listitemelement1.addElement(listitemelement1_paragraph)
bulletlist.addElement(listitemelement1)
listitemelement2 = ListItem()
listitemelement2_paragraph = P()
listitemelement2_content = "My second item"
teletype.addTextToElement(listitemelement2_paragraph, listitemelement2_content)
listitemelement2.addElement(listitemelement2_paragraph)
bulletlist.addElement(listitemelement2)
textdoc.text.addElement(bulletlist)
As we saw earlier, we create a List object to represent the complete list, passing the bulletedliststyle that we defined earlier as the stylename argument. For each individual item of the list, we need a ListItem object which contains a P object that holds the text. We add text to the paragraph using teletype, then add the P object to the ListItem object and the ListItem object to the List object. This is repeated for another ListItem. This List object is finally added to the document.
- Adding a numbered list is similar to a bulleted list. We just specify numberedliststyle as the stylename.
-
To use the tabstop style we defined, create a paragraph with that style
newtext = "Testing\tTabstops"
tabp = P(stylename=tabstyle)
teletype.addTextToElement(tabp, newtext)
textdoc.text.addElement(tabp)
- So, we have added all the necessary contents to our document. Now we should save it to an odt file. For that, we use the save method.
textdoc.save(u"myfirstdocument.odt")
If we open the document 'myfirstdocument.odt' in LibreOffice, we can see something like the following.
The complete code is as follows