Table of Contents

...

In a world where paper documents are rapidly being supplanted by electronic records, long-term access to data becomes critical. This is especially true for legal contracts and government documents that remain valid and relevant for decades. GIZ is also facing this challenge. Just as pens and pencils are available from many manufacturers and vendors, document file formats and the applications that use them need to be supported by and available from multiple vendors. This helps ensure long-term access to data, even if individual vendors disappear, change strategy, or dramatically change their prices. The digital format in which information is stored can be either “open” or “closed”. An open format is one that is available for everyone to use, free of charge, and that can be built upon without limitations, for example in new software products. Developers can use open formats to build multiple software packages, services, and products.

In short: a file’s format (the way it is saved and encoded) determines what data it can store, what you can do with it, and which programs can open it.

A “closed” file format is one that is proprietary. Usually this means that the technical details of the format, including its specification, are secret and known only to its original creators. It may also mean that the format is protected by copyright, trademarks, or patents, and is therefore only usable by those who have obtained the necessary rights, even if the specification has been made public. Using proprietary file formats can create dependence on third-party software or on the owners of the file format. Closed formats can make it significantly harder to reuse the information stored in them: additional software or licensing may be required tomorrow in order to continue to access data that you create today.

Info

These are some of the most popular file formats for graphics and content:

  • JPEG (.jpg or .jpeg) stands for “Joint Photographic Experts Group”. It is a standard image format for lossy-compressed image data.

  • PNG (.png) stands for “Portable Network Graphics”. It is an image format that uses lossless compression and is generally considered the successor to the GIF image format.

  • SVG (.svg) stands for “Scalable Vector Graphics”. Files in this format use an XML-based text format to describe how the image should appear. Designers typically create their projects as vectors and export them to .png or .jpeg as output.

  • An AI (.ai) file is a drawing created with Adobe Illustrator, a vector graphics editing program. It is composed of paths connected by points rather than bitmap image data. AI files are commonly used for logos and print media.

  • Markdown (.md) is a lightweight markup language with plain-text formatting syntax. It is used to format text files for content documents and messages in various media, including online discussion forums, GitHub and technical documentation.

  • CSV (.csv) stands for “Comma-Separated Values”. CSV files are plain text files that contain only text and numbers, without any formatting. The data is structured in tabular form: each line is a row, and commas separate the fields. CSV files are generally used to exchange data between different applications, often when there is a large amount of it. Database programs, analytical software, and other applications that store large amounts of information (like contacts and customer data) usually support the CSV format.
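To illustrate the tabular structure of CSV, here is a minimal sketch using a hypothetical contacts.csv file: each line is a row, commas separate the fields, and a standard command line tool such as awk can read the data without any proprietary software. The file name and its contents are made up for illustration, and the simple comma split assumes that no field contains a quoted comma.

```shell
#!/bin/sh
# Create a small, hypothetical CSV file: one header row plus two data rows.
cat > contacts.csv <<'EOF'
name,country,role
Amina,Tanzania,Designer
Jonas,Germany,Developer
EOF

# Print the second column (country) of every data row.
# -F, splits fields on commas; NR > 1 skips the header row.
# Note: this naive split breaks on quoted fields that contain commas.
awk -F, 'NR > 1 { print $2 }' contacts.csv
```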

The openIMIS project's documentation will primarily focus on images and text. This includes branding assets, marketing materials, and communication kits. Vector files should always be stored as .svg instead of the proprietary .ai format (limited to Adobe Illustrator). Here, .svg is the open format that can easily be read and modified without vendor lock-in or the risk of becoming inaccessible when a license lapses. Because SVG is, at its core, an application of XML (another open format), it can be modified in a plain text editor, without even using a graphical user interface application such as Inkscape.
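Because an SVG file is plain XML text, it can be created and edited with ordinary text tools. The sketch below writes a minimal, hypothetical logo.svg and then changes its fill color with sed, with no graphics application involved. The file name and color values are illustrative only.

```shell
#!/bin/sh
# Write a minimal SVG: a 100x100 canvas containing one filled circle.
cat > logo.svg <<'EOF'
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100" viewBox="0 0 100 100">
  <circle cx="50" cy="50" r="40" fill="#1f77b4"/>
</svg>
EOF

# Recolor the circle by editing the text directly.
# Writing to a temporary file and moving it back works with both GNU and BSD sed.
sed 's/fill="#1f77b4"/fill="#ff7f0e"/' logo.svg > logo.tmp && mv logo.tmp logo.svg

# Show the changed line.
grep 'fill=' logo.svg
```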

Converting proprietary formats to open formats

Often, the application used to create a file supports multiple formats and lets you save or export it in a different one. If, for example, you are viewing a document online in Google Docs, you can click File and Download to bring up a list of formats to which you can convert the document. These include Microsoft Word, PDF, and plain text.

Info

When selecting file formats for archiving, the format should ideally be:

  • Open (non-proprietary)

  • Unencrypted

  • Uncompressed

  • In common use by the research community

  • Adherent to an open, documented standard as described by the State of California (see AB 1668, 2007), i.e. one that is:

    • Interoperable among diverse platforms and applications

    • Fully published and available royalty-free

    • Fully and independently implementable by multiple software providers on multiple platforms without any intellectual property restrictions for necessary technology

    • Developed and maintained by an open standards organization with a well-defined inclusive process for evolution of the standard.

There is no standardized process for converting proprietary formats to open formats. It always depends on the complexity of the format and the files, and varies from format to format. For example, converting a file from .ai to .svg in Adobe Illustrator may be as easy as exporting it as .svg, but sometimes something goes wrong in the process and goes undetected until the results are closely inspected. These edge cases are the hardest to work around: an error in an .ai to .svg conversion may require recreating the assets and then exporting them as .svg again to ensure maximum compatibility.
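Because conversion errors can go unnoticed, a quick automated check of each exported file may catch some of them before a closer visual inspection. The following is a minimal sketch, not a full validator: the check_svg function is hypothetical, and it only verifies that a file has an opening svg element and a closing svg tag, and warns when it finds image elements, which often mean that raster data was embedded instead of being exported as vector paths.

```shell
#!/bin/sh
# Hypothetical sanity check for an exported SVG file (not a full XML validator).
# Prints OK, WARN or FAIL and returns non-zero only on FAIL.
check_svg() {
  file=$1
  # The file must contain an opening <svg element and a closing </svg> tag.
  if ! grep -q '<svg' "$file" || ! grep -q '</svg>' "$file"; then
    echo "FAIL: $file does not look like a complete SVG document"
    return 1
  fi
  # <image> elements often mean raster data was embedded instead of vector paths.
  if grep -q '<image' "$file"; then
    echo "WARN: $file contains embedded <image> elements"
    return 0
  fi
  echo "OK: $file"
}
```

For example, calling check_svg on each file in a loop over exports/*.svg checks a whole folder of exported files; the folder name is illustrative. A WARN or FAIL result is a signal to open the file and inspect it, not proof that the conversion is broken.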

File Conversion Tools

There are many downloadable tools and online platforms that can help you convert files to open formats. In this section, we present ways to convert files to open formats in desktop software, on online platforms and, finally, in a command line terminal.

LibreOffice

Software code: Open source

Platform: Linux, Windows, macOS

...

Code Block
soffice --headless --convert-to pdf mySlides.odp

Replace pdf with the file extension of the format you want to convert your file to, if something other than PDF is desired. The --headless option means LibreOffice runs without opening a window on your desktop and exits after completing the requested task.

Turning to the command line is a great way to convert several files at once. If, for example, you want to convert all of the Microsoft Word documents in a folder to the OpenDocument Text format (used by LibreOffice Writer and many other applications), you would type:

Code Block
soffice --headless --convert-to odt *.docx

The conversion takes far less time than opening all files in LibreOffice Writer and doing the file format conversion manually.
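To keep the converted files separate from the originals, soffice also accepts an --outdir option. The sketch below wraps this in a loop so that each file is converted, and reported on, individually. The convert_all function name and the DRY_RUN variable are hypothetical; with DRY_RUN=1 the script only prints the commands it would run, which is a way to check the file list before converting anything.

```shell
#!/bin/sh
# Hypothetical helper: convert every .docx file in SRC_DIR to .odt in OUT_DIR.
# With DRY_RUN=1 the soffice commands are printed instead of executed.
convert_all() {
  src_dir=$1
  out_dir=$2
  # Only create the output folder when we are really converting.
  [ "${DRY_RUN:-0}" = 1 ] || mkdir -p "$out_dir"
  for f in "$src_dir"/*.docx; do
    [ -e "$f" ] || continue  # the glob matched no files
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo soffice --headless --convert-to odt --outdir "$out_dir" "$f"
    else
      # Report files that fail, but keep going with the rest.
      soffice --headless --convert-to odt --outdir "$out_dir" "$f" \
        || echo "conversion failed: $f" >&2
    fi
  done
}
```

For example, DRY_RUN=1 convert_all ./documents ./converted lists the planned conversions, and running it again without DRY_RUN=1 performs them. Calling soffice once per file is likely slower than passing all files at once, as in the command above, but it makes it easier to see which file failed.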

Info

This section covers the conversion feature of the LibreOffice suite in general. In the following chapters, LibreOffice Draw, Writer and Impress are discussed specifically for the document file formats they are commonly used for.

Pandoc

Software code: Open source

Platform: Linux, Windows, macOS

...

There are many tools available to create visual assets (graphics) for any need. While the current industry standard is Adobe Creative Cloud, we try to keep proprietary software to a minimum. Where avoiding it was impossible or impractical, we at least aim for open formats. In the following, we go through the recommended tools, the file formats they support, and the limitations they pose in different conversion scenarios.

...