Wiki Information Migration Workflow

From OSF Wiki

This document presents the basic workflows, processes and methods for creating and migrating content for this wiki. It describes how a knowledge base such as this one can be created and maintained.

Basic Workflow

The basic idea of the wiki content workflow is to allow notes and initial authoring to take place in one location, then to move the content to another location as it gets fleshed out and becomes more substantive. This next location then becomes the focal point for collaboration and group editing. Ultimately, these collection points allow the creation of domain-specific knowledge bases.

The basic process of this workflow is shown in the accompanying diagram.

[Diagram: Wiki content flow.png]

The left-hand side of the diagram indicates that initial notes and so forth may occur in a limited, controlled environment, perhaps even one that is proprietary. Wikis work great for note and idea collection. However, most information gathered in this manner never warrants public use or refinement. Perhaps the topic is related to tracking an idea later abandoned; perhaps the information is by nature confidential or proprietary.

However, a portion of this information is worth more formal expansion and treatment. This is the information inherently worthwhile for moving to a shared knowledge base.

In this instance, there are two kinds of such information:

  1. technical and background information
  2. process or methodology or task-oriented information.

The diagram reflects this fact.

Either form of information can be migrated to its respective collaboration platform. There, depending on access rights, external contributors may follow a similar process and add their own complementary content. This same basic process can be followed for any variety of instances.

The external contributors may need to be screened or given preferred content rights, since quality of content and adherence to organizational and structural consistency are important. Remember, the purpose of this workflow is to create re-usable knowledge bases, and for that reason some access and other controls are warranted in order to provide completeness and consistency.

Of course, once a knowledge base is distributed, its new owners may choose to go in any direction with content that is controlled or not. Such is the beauty of the wiki technology!

The next sections break down this workflow into steps and describe how each is conducted.

Mediawiki Export and Import

Mediawiki uses an abstract XML-based format for content dumps. This is the format that Special:Export generates, and it is also used for XML dumps of Wikipedia and other Wikimedia sites. Such a dump can be imported into another Mediawiki wiki via the Special:Import page or by using MWDumper or xml2sql, among others.

These export and import procedures have been vetted for these standard Mediawiki content types:

  • Standard wiki pages
  • Wiki categories
  • Media, such as files or images
  • Templates.

The Template category must be specifically requested during the export process (see below). All other content types should be labeled with the category tags used for export purposes (see next).

This tutorial focuses on Mediawiki's own Special:Export and Special:Import utilities. Links to the other options are provided at the conclusion.

Import/Export Categories

The export function (see next section) works most easily with content of all types in discrete categories. Since migration is a specific purpose of this wiki, special categories have been created to support it.

As the diagram above shows, our potential targets for export are the zWiki. Sometimes, content needs to be earmarked for both.

The purpose and role of these export targets is as follows:

  • zWiki is the wiki version for all technical aspects related to this knowledge base. Technical aspects include -- but are not limited to -- specifications; how-to information; software and systems backgrounds; architectural info; installation guides; technical definitions; concepts and glossaries; technically related external information and links; flowcharts; schematics; workflows; best practices; etc.
  • zWiki is the comprehensive knowledge base for all relevant technical and process and methodology information related to the domain instance at hand. As for technical aspects, the zWiki contains a complete version of its supporting zWiki. From a process and methodology perspective, the zWiki adds to this information relating the current domain to information development in general, plus phases, activities, tasks and roles related to the general MIKE2.0 methodology. Because of this complete coverage, the zWiki is considerably larger in size than its complementary zWiki, which is fully contained within it.

The zWiki category is a special internal one used to "tag" pages and other content deserving to be exported to, or imported from, a zWiki, respectively. They have no substantive meaning other than for import/export purposes. Also, the 'z' prefix has no meaning; it is used only to support its placement at the bottom of various alphabetical listings of categories.

Preparing and Exporting Content

The Export help document explains the basic Mediawiki export API. Note that the export procedure is only available to privileged users of the wiki.

Preparing Content

For efficient export, you should "tag" all of your desired export content with specific categories. Though it is possible to identify and export individual pages and files, using categories enables you to export in bulk.

When tagging export files, consider using the special categories noted above. If you do choose to employ your own categories, do keep in mind that you may need to tag in multiple layers or dimensions (depending on the use cases and frequencies of your export). Also keep in mind that all documents "tagged" in a given category are flagged for export; use care not to tag too widely.

With these practices in mind, then, proceed to tag all relevant export content in your system. It is possible to tag in multiple categories, so splits with overlaps are an acceptable tagging strategy depending on your requirements.

Of course, misplaced tags on content can be easily removed by removing its category reference.
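
In wikitext, a page is tagged by adding a category link such as [[Category:zWiki]] to its source, and untagged by deleting that link. The following is a minimal Python sketch of both operations on a page's raw text. The helper names add_category and remove_category are hypothetical, and the pattern assumes the plain link form (optionally with a sort key); it does not detect categories that arrive through templates.

```python
import re


def _category_pattern(category: str) -> re.Pattern:
    # Matches [[Category:Name]] and [[Category:Name|sort key]],
    # plus one trailing newline so removal leaves no blank line behind.
    return re.compile(
        r"\[\[\s*Category\s*:\s*" + re.escape(category) + r"\s*(\|[^\]]*)?\]\]\n?"
    )


def add_category(wikitext: str, category: str) -> str:
    """Append a category tag unless the page already carries it."""
    if _category_pattern(category).search(wikitext):
        return wikitext
    return wikitext.rstrip("\n") + f"\n[[Category:{category}]]\n"


def remove_category(wikitext: str, category: str) -> str:
    """Strip every tag for the given category, leaving other text alone."""
    return _category_pattern(category).sub("", wikitext)


# Illustrative page text; the "Guides" category is made up for the example.
page = "Some notes about installation.\n[[Category:Guides]]\n"
tagged = add_category(page, "zWiki")
print(tagged)
print(remove_category(tagged, "zWiki") == page)
```

Because add_category checks for an existing tag first, running it twice on the same page does not produce duplicate category links.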

Exporting Content

To export content, follow these steps:

  1. Navigate to the Special:Export page
  2. Either enter wiki page names, one per line, in the text box or, preferably, enter a given category name into the 'Add pages from category:' text box
    • When doing so, make sure to check the appropriate checkboxes at the bottom. For standard zWiki exports, select all three checkboxes:
      • Include only the current revision, not the full history
      • Include templates
      • Save as file
  3. Repeat the above with additional categories and/or page names
  4. When all candidates are loaded, inspect the text box and determine if there are any individual documents that you do NOT want to export; delete these from the listing
  5. Select the 'Export' button
  6. At the dialog, pick a local file directory in which you want to save your exports (which are in XML form).

The export file is now saved on your local system (or wherever your save target was).
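
The checkbox selections in the steps above correspond to form parameters of Special:Export, which is useful if the export is ever scripted. The Python sketch below only builds the URL-encoded form body and sends nothing over the network. The parameter names (pages, curonly, templates, wpDownload) are taken from the Mediawiki manual's description of Special:Export and should be verified against your Mediawiki version; the function name and page titles are hypothetical.

```python
from urllib.parse import urlencode


def build_export_form(page_titles, current_only=True, include_templates=True):
    """Return the URL-encoded form body for a Special:Export request."""
    fields = {
        "pages": "\n".join(page_titles),  # one title per line, as in the text box
        "wpDownload": "1",                # "Save as file"
    }
    if current_only:
        fields["curonly"] = "1"           # current revision only, no history
    if include_templates:
        fields["templates"] = "1"         # include templates used by the pages
    return urlencode(fields)


# Placeholder titles; a real run would POST this body to the
# Special:Export page of the source wiki as a privileged user.
body = build_export_form(["Main Page", "Category:zWiki"])
print(body)
```

Keeping the request construction separate from the network call makes it easy to inspect exactly which pages and options will be submitted before anything is exported.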

Editing the Export File Prior to Import

The file created from export is a simple, readable file in XML format. As such, it can easily be edited between exporting and importing. This should be done with caution and integrity: one can make antedated edits and use false user names and, in combination with deletion, one can "change history". Applications of this editing include:

  • Adding a note to the edit summary about the importing
  • Changing user names and/or page names to avoid name conflicts (just between the title tags and between the username tags or also in links and signatures)
  • Changing namespace names into the generic or the applicable ones (ditto)
  • Doing global search and replaces such as changing 'zWiki' to some more relevant local name.
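
As a hedged illustration of the narrower kind of edit (changing names only between the title tags and between the username tags, while leaving links and signatures in the page text alone), here is a small Python sketch. The function name rename_in_tag and the sample XML fragment are hypothetical, and the replacement value is inserted as-is, so it must not contain characters that need XML escaping.

```python
import re


def rename_in_tag(xml_text: str, tag: str, old: str, new: str) -> str:
    """Replace `old` with `new` only where it is the full content of <tag>."""
    pattern = re.compile(
        r"(<{0}>){1}(</{0}>)".format(re.escape(tag), re.escape(old))
    )
    # A function replacement avoids any backslash interpretation in `new`.
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), xml_text)


# Simplified stand-in for part of an export file.
sample = (
    "<page><title>Setup Guide</title>"
    "<revision><contributor><username>Alice</username></contributor>"
    "<text>See [[Setup Guide]] by Alice.</text></revision></page>"
)
edited = rename_in_tag(sample, "username", "Alice", "WikiSysop")
edited = rename_in_tag(edited, "title", "Setup Guide", "Imported Setup Guide")
print(edited)
```

A plain global search and replace would also rewrite the link and the signature inside the text element; restricting the match to whole tag contents keeps that choice explicit.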

Note that if two versions of the page have the same timestamp (because one was uploaded with the same timestamp as a pre-existing version), the later (imported) version will show up in the edit history but not in the article itself.

If you are doing bulk changes, it may also be necessary to update timestamps across your import file. To do so:

  • Do a regex search on the exported XML file for the timestamp elements (the export format stores each revision's time in a timestamp tag, in UTC)
  • Replace each match with a new timestamp very close to your current time, in the same format
  • Make your other changes
  • Import the new file.
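
The timestamp replacement can be sketched in Python as follows. This assumes the export stores times in timestamp elements of the form 2009-05-28T07:47:01Z, which is how Mediawiki XML dumps are normally written; the function name and sample fragment are hypothetical. Note that it sets every revision to the same time, so it is best suited to exports made with only the current revision of each page.

```python
import re
from datetime import datetime, timezone

# Matches a whole <timestamp> element regardless of its current value.
TIMESTAMP_RE = re.compile(r"<timestamp>[^<]*</timestamp>")


def reset_timestamps(xml_text: str, when: datetime) -> str:
    """Set every <timestamp> element to `when`, formatted as UTC ISO 8601."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return TIMESTAMP_RE.sub(f"<timestamp>{stamp}</timestamp>", xml_text)


# Simplified stand-in for part of an export file.
sample = "<revision><timestamp>2009-05-28T07:47:01Z</timestamp></revision>"
print(reset_timestamps(sample, datetime.now(timezone.utc)))
```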

Importing Content

The Import help document explains the basic Mediawiki import API. Note that the import procedure is only available to privileged users of the wiki.

To import content, follow these steps:

  1. Navigate to the Special:Import page
  2. Using the 'Browse' button, navigate to the local directory where you earlier saved the export file (see above), and select it
  3. Select the 'Import' button
  4. The system will begin processing, and if the import is successful, you will be notified on screen. At the same time, you will see a listing of all documents and files that were uploaded, with live links.

Updates and Repeat Imports

You may repeat this process as many times as you like. For example, you might find some content was missed in the original export. In such cases, you can rectify the oversight, export again, import again, and get a clean start.

Note that if repeat content is re-imported, only updated info (as measured by the document's timestamp) is actually added to the import database. If there is unchanged content already uploaded, you will receive the message: "All revisions were previously imported."

Besides the earlier help file, there is another Import XML Dumps help file that you might find of interest.

Other Import Options

Image Files Transfer

  # Go to a temporary folder
  cd /tmp/

  # Create an archive of all images of the zWiki
  # (tar drops the leading "/", so files are stored as usr/share/...)
  tar -cvzf zWiki.tar.gz /usr/share/websites/mediawiki/zWiki/images/

  # Unpack all the images in the temp folder
  tar -xvzf zWiki.tar.gz

  # Move all images, of all folders, into a single one
  mkdir -p /tmp/images
  find /tmp/usr/share/websites/mediawiki/zWiki/images -type f -exec mv {} /tmp/images/ \;

  # Remove the unpacked folder structure that has been created
  rm -rf /tmp/usr/

  # Import all images in zWiki
  php /usr/share/websites/mediawiki/zwiki/maintenance/importImages.php --overwrite --user=WikiSysop /tmp/images/

  # Remove all the images, and clean the temp folder
  rm -rf images
  rm zWiki.tar.gz

Clean-up Steps

You may find gaps and other discontinuities from your original source site. Here are some steps to follow:

  • Go to the Special:Categories page for your new target site (where you did the import); it may display one or both of these problems:
    1. Perhaps unwanted categories are shown. If so, click on them to see their members (pages), navigate to those pages, and remove the category assignment for those pages, or
    2. There may be categories without a matching category page, in which case you should tag those categories with your export category assignment, re-export and then re-import
  • Using multi-category search, inspect the source site's page content to see if any desired export pages have not yet been tagged as such. Tag them, and then repeat export and import
  • Make sure all desired templates and files (images) have been tagged
  • Generally navigate through the new site looking for broken and missing links. If any are found, repeat the general approaches in the bullets above. Images are often overlooked, for example.

Tips and Techniques

  • Exercise care in the selection of media and categories, such that they easily support the category-based export procedure
  • Use care with templates in this system. Categories placed on a template tend to be inherited by the pages that use the template. If placing a category on a template causes some pages to be misassigned for export, consider creating two identical versions of the template with different names: one for the export pages and one for pages that should not be exported
  • The zWiki export can be a combination of a number of category tags. This approach helps limit the number of individual pages that need to be tagged. These categories are:
  • Pages in the Image: namespace can be imported, but the images attached to them can't; see Image transfer above
  • Pages are automatically attributed to users with the same username
  • If you import to a page name that already exists, the most recent revision of the now merged history will be the one displayed, so check the affected pages to make sure a wanted revision hasn't been replaced
  • When importing from an import source the action is logged under Special:Log/import (where the XML came from, how many revisions were imported, and comments if provided) and the action shows up on Special:RecentChanges. XML imports are also logged
  • Sometimes the Special:Log/import logging will fail to register an imported page
  • When using Special:Export, always check the very bottom of the XML. If the last line isn't </mediawiki>, don't use it
  • If you import XML of a page that has already been imported there will be duplicate revisions in the history
  • By default, the XML importing version of the interface limits file sizes to around 1.4 to 2 megabytes. This limit can be changed by the server admin (or by you, if you have access to php.ini) via the upload_max_filesize directive. See also this guidance from the Mediawiki manual.
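
The tip about checking the end of the export file can be automated. The Python sketch below tests that the last non-empty line is the closing mediawiki tag and, as an additional check not mentioned above, that the data parses as well-formed XML. The function name and sample data are hypothetical, and the whole file is parsed in memory, which is only reasonable for exports of modest size.

```python
import xml.etree.ElementTree as ET


def export_looks_complete(data: bytes) -> bool:
    """True if the dump ends with </mediawiki> and parses as well-formed XML."""
    lines = data.rstrip().splitlines()
    if not lines or lines[-1].strip() != b"</mediawiki>":
        return False
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


# Simplified stand-ins for a complete and a truncated export file;
# a real check would read the bytes of the saved export instead.
good = b"<mediawiki>\n  <page><title>Example</title></page>\n</mediawiki>\n"
truncated = b"<mediawiki>\n  <page><title>Exam"
print(export_looks_complete(good), export_looks_complete(truncated))
```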

Related Links and Other Options