Export OSX Wiki Server to CSV

May 12, 2012

Recently our organization has grown considerably, and we are starting to outgrow the 10.6 Wiki Server that we use primarily for our intranet. I have looked at the 10.7 Wiki Server, but it is not much better. Our intranet has been plagued with bouts of corruption and plist issues that have caused slow load times and serious data loss. It's pretty clear that we need to move to a more stable platform for storing this information. We have looked at WordPress and Drupal for this, but the biggest issue is getting the data out of the Wiki Server and into one of these installations. Both Drupal and WordPress have many plugins or modules that can import content from CSV; however, getting a Wiki Server content set into CSV is not as easy as it sounds.

I found this script, which works great at extracting the information stored in the plist file in each of the page folders in the wiki structure. What I was really after, though, was the content of the page.html file stored in each .page folder. I wrote a helper script that walks the Groups and Users folders, copies the export script (with a few modifications) into each wiki and blog folder, runs it, and exports all the data I wanted to CSV. The helper then copies the CSV files to the main export folder and deletes all the files that it created in the Wiki Server structure.
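The plist lookup that the export script performs (find the line holding a key, then take the value from the line after it) can be sketched with awk instead of head, tail and expr. This is only an illustration: plist_value is a hypothetical name, and the sketch assumes an XML plist in which each value sits on the line directly after its key.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts.
# Usage: plist_value KEY [PLIST_FILE]   (PLIST_FILE defaults to page.plist)
plist_value() {
    awk -v key="$1" '
        # On the line after the key: strip the opening and closing tags,
        # print what is left, and stop reading.
        found { gsub(/^[^>]*>|<[^<]*$/, ""); print; exit }
        # Remember that the requested key was just seen.
        index($0, "<key>" key "</key>") { found = 1 }
    ' "${2:-page.plist}"
}
```

Inside a .page folder, plist_value title would then print the page title, and plist_value createdDate the creation timestamp.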

Usage

To use these scripts, copy the folder and all three of the scripts inside it to the root level of your Server HD, so that it lives at /export. Each script has a variable you must set: the path to your Wiki deployment in run.sh, and the base URL in the two export scripts. Once those are set, you need to make the files executable. You can do this with:

chmod -R 700 /export

This should make the scripts executable. Once that is done, run the run.sh script with sudo to trigger the export. This is nowhere near perfect, so I have opened a GitHub repository for the changes that I have made and for the helper script that runs the exports recursively. The export also covers content in user blogs.

The one challenge I am having is keeping the encoding at UTF-8 when the script exports the page.html content, so that I don’t get any artifacts or odd characters.
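The cause of the odd characters is not pinned down here, so the following is only a sketch of two things that might help, assuming the page.html files are already valid UTF-8. Both function names are hypothetical. The first flattens a page into one comma-free line while treating the file as raw bytes, and the second starts the CSV with a UTF-8 byte order mark, which some spreadsheet applications use to detect the encoding.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original scripts.

# Flatten a page.html into a single line and replace commas with spaces.
flatten_page_html() {
    # LC_ALL=C makes tr work on raw bytes. CR, LF and ',' are single ASCII
    # bytes, and UTF-8 multi-byte sequences never contain ASCII bytes, so
    # accented characters pass through unchanged.
    LC_ALL=C tr '\r\n,' '   ' < "$1"
}

# Create a CSV file that begins with a UTF-8 byte order mark and a header.
# Usage: start_utf8_csv FILE HEADER_LINE
start_utf8_csv() {
    # \357\273\277 is the BOM (EF BB BF) written as portable octal escapes.
    printf '\357\273\277%s\n' "$2" > "$1"
}
```

In the export scripts this would correspond to building FILE_DATA from flatten_page_html page.html and creating wikipages.csv with start_utf8_csv. Unlike the echo-based version, it does not collapse runs of spaces.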

Here are the scripts

run.sh

#!/bin/bash
##### CONFIGURE HERE ########

# put your full path to your collaboration files
fullpath=/Wiki/wiki/Collaboration
##### END CONFIGURATION #####
# Create the export folder structure (-p avoids errors on a second run)
mkdir -p /export/users/blogs
mkdir -p /export/groups/blogs
mkdir -p /export/groups/wikis

# Note: looping over ls output assumes group and user names contain no spaces
for i in `ls "$fullpath/Groups"`
do
    cp /export/export-blog.sh "$fullpath/Groups/$i/weblog/"
    cp /export/export.sh "$fullpath/Groups/$i/wiki/"

    # Export Group Wikis
    cd "$fullpath/Groups/$i/wiki/"
    ./export.sh
    mkdir -p "/export/groups/wikis/$i"
    cp "$fullpath/Groups/$i/wiki/wikipages.csv" "/export/groups/wikis/$i/"
    rm "$fullpath/Groups/$i/wiki/wikipages.csv"
    rm "$fullpath/Groups/$i/wiki/export.sh"

    # Export Group Blogs
    cd "$fullpath/Groups/$i/weblog/"
    ./export-blog.sh
    mkdir -p "/export/groups/blogs/$i"
    cp "$fullpath/Groups/$i/weblog/wikipages.csv" "/export/groups/blogs/$i/"
    rm "$fullpath/Groups/$i/weblog/wikipages.csv"
    rm "$fullpath/Groups/$i/weblog/export-blog.sh"
done

for i in `ls "$fullpath/Users"`
do
    # Export User Blogs
    cp /export/export-blog.sh "$fullpath/Users/$i/weblog/"
    cd "$fullpath/Users/$i/weblog/"
    ./export-blog.sh
    mkdir -p "/export/users/blogs/$i"
    cp "$fullpath/Users/$i/weblog/wikipages.csv" "/export/users/blogs/$i/"
    rm "$fullpath/Users/$i/weblog/wikipages.csv"
    rm "$fullpath/Users/$i/weblog/export-blog.sh"
done
exit 0
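The copy, run, collect and clean up sequence that run.sh repeats for group wikis, group blogs and user blogs could be folded into one function. A rough sketch, where export_one and its three arguments are hypothetical names rather than part of the original scripts, and where the destination is assumed to be an absolute path:

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts.
# Usage: export_one SCRIPT SOURCE_DIR DEST_DIR
export_one() {
    local script="$1" src="$2" dest="$3"
    local name
    name=$(basename "$script")

    [ -d "$src" ] || return 0            # nothing to export for this folder
    cp "$script" "$src/" || return 1     # drop the export script into place
    ( cd "$src" && "./$name" )           # subshell keeps the caller's directory
    mkdir -p "$dest"
    if [ -f "$src/wikipages.csv" ]; then
        mv "$src/wikipages.csv" "$dest/" # collect the generated CSV
    fi
    rm -f "$src/$name"                   # remove the copied script again
}
```

Inside the Groups loop, the wiki export would then become a single call such as export_one /export/export.sh "$fullpath/Groups/$i/wiki" "/export/groups/wikis/$i".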

export.sh

#!/bin/bash -
#
# Script to extract data from an Apple WikiServer's data store by querying the
# filesystem itself. Creates a 'wikipages.csv' file that's readable by any
# spreadsheeting application, such as Numbers.app or Microsoft Excel.app.
#
# USAGE:   To use this script, change to the WikiServer's pages directory, then
#          just run this script. A file named wikipages.csv will be created in
#          your current directory. For instance:
#
#              cd /Library/Collaboration/Groups/mygroup/wiki  # dir to work in
#              wikipages2csv.sh                               # run the script
#              cp wikipages.csv ~/Desktop                     # save output
#
# WARNING: Since the WikiServer's files are only accessible as root, this script
#          must be run as root to function. Additionally, this is not extremely
#          well tested, so use at your own risk.

##### CONFIGURE HERE ########
# The prefix to prepend to generated links. NO SPACES!
WS_URI_PREFIX=https://my-server.example.com/groups/wiki/
##### END CONFIGURATION #####
# DO NOT EDIT PAST THIS LINE
#############################

WS_CSV_OUTFILE=wikipages.csv
WS_PAGE_IDS_FILE=`mktemp ws-ids.tmp.XXXXXX`

function extractPlistValueByKey () {
    head -n \
      $(expr 1 + `grep -n "<key>$1</key>" page.plist | cut -d ':' -f 1`) page.plist | \
        tail -n 1 | cut -d '>' -f 2 | cut -d '<' -f 1
}
function linkifyWikiServerTitle () {
    echo $1 | sed -e 's/ /_/g' -e 's/&/_/g' -e 's/>/_/g' -e 's/</_/g' -e 's/?//g'
}
function formatISO8601date () {
    echo $1 | sed -e 's/T/ /' -e 's/Z$//'
}
function csvQuote () {
    echo "$1" | grep -q ','
    if [ $? -eq 0 ]; then # if there are commas in the string
        echo '"'"$1"'"'   # quote the value
    else
        echo "$1"         # just output the value as it was received
    fi
}
PSTALLY=`ls -l | grep -v ^l | wc -l`
if [ $PSTALLY -gt 4 ] ; then
    ls -d [^w]*.page | \
      sed -e 's/^\([a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]\)\.page$/\1/' > $WS_PAGE_IDS_FILE
fi

echo "Title,ID,Date Created,Last Modified,URI,Content" > $WS_CSV_OUTFILE
while read id; do
    cd $id.page
    title="$(extractPlistValueByKey title)"
    created_date="$(formatISO8601date $(extractPlistValueByKey createdDate))"
    modified_date="$(formatISO8601date $(extractPlistValueByKey modifiedDate))"
    link=$WS_URI_PREFIX"$id"/`linkifyWikiServerTitle "$title"`.html
    FILE_DATA=`echo $( /bin/cat page.html ) | tr ',' ' '`
    cd ..
    echo `csvQuote "$title"`,$id,$created_date,$modified_date,`csvQuote "$link"`,"$FILE_DATA" >> $WS_CSV_OUTFILE
done < $WS_PAGE_IDS_FILE
rm $WS_PAGE_IDS_FILE
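The csvQuote function above only quotes values that contain commas, and the page content is written without any CSV quoting, so a double quote in a title or in page.html can confuse an importer. A common CSV convention is to wrap every field in double quotes and to double any embedded quotes. A minimal sketch of that convention, with csv_field as a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts.
# Prints its argument as one quoted CSV field (no trailing newline).
csv_field() {
    local escaped
    # Double every embedded double quote: "  becomes  ""
    escaped=$(printf '%s' "$1" | sed 's/"/""/g')
    # Wrap the whole value in double quotes.
    printf '"%s"' "$escaped"
}
```

A row could then be written with something like printf '%s,%s\n' "$(csv_field "$title")" "$(csv_field "$FILE_DATA")" >> "$WS_CSV_OUTFILE", which would also make it unnecessary to strip commas from the page content.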

export-blog.sh

#!/bin/bash -
#
# Script to extract data from an Apple WikiServer's data store by querying the
# filesystem itself. Creates a 'wikipages.csv' file that's readable by any
# spreadsheeting application, such as Numbers.app or Microsoft Excel.app.
#
# USAGE:   To use this script, change to the WikiServer's pages directory, then
#          just run this script. A file named wikipages.csv will be created in
#          your current directory. For instance:
#
#              cd /Library/Collaboration/Groups/mygroup/wiki  # dir to work in
#              wikipages2csv.sh                               # run the script
#              cp wikipages.csv ~/Desktop                     # save output
#
# WARNING: Since the WikiServer's files are only accessible as root, this script
#          must be run as root to function. Additionally, this is not extremely
#          well tested, so use at your own risk.

##### CONFIGURE HERE ########
# The prefix to prepend to generated links. NO SPACES!
WS_URI_PREFIX=https://my-server.example.com/groups/wiki/

##### END CONFIGURATION #####
# DO NOT EDIT PAST THIS LINE
#############################

WS_CSV_OUTFILE=wikipages.csv
WS_PAGE_IDS_FILE=`mktemp ws-ids.tmp.XXXXXX`

function extractPlistValueByKey () {
    head -n \
      $(expr 1 + `grep -n "<key>$1</key>" page.plist | cut -d ':' -f 1`) page.plist | \
        tail -n 1 | cut -d '>' -f 2 | cut -d '<' -f 1
}
function linkifyWikiServerTitle () {
    echo $1 | sed -e 's/ /_/g' -e 's/&/_/g' -e 's/>/_/g' -e 's/</_/g' -e 's/?//g'
}
function formatISO8601date () {
    echo $1 | sed -e 's/T/ /' -e 's/Z$//'
}
function csvQuote () {
    echo "$1" | grep -q ','
    if [ $? -eq 0 ]; then # if there are commas in the string
        echo '"'"$1"'"'   # quote the value
    else
        echo "$1"         # just output the value as it was received
    fi
}
ls -d [^w]*.page | \
  sed -e 's/^\([a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]\)\.page$/\1/' > $WS_PAGE_IDS_FILE
echo "Title,ID,Date Created,Last Modified,URI,Content" > $WS_CSV_OUTFILE
while read id; do
    cd $id.page
    title="$(extractPlistValueByKey title)"
    created_date="$(formatISO8601date $(extractPlistValueByKey createdDate))"
    modified_date="$(formatISO8601date $(extractPlistValueByKey modifiedDate))"
    link=$WS_URI_PREFIX"$id"/`linkifyWikiServerTitle "$title"`.html
    FILE_DATA=`echo $( /bin/cat page.html ) | tr ',' ' '`
    cd ..
    echo `csvQuote "$title"`,$id,$created_date,$modified_date,`csvQuote "$link"`,"$FILE_DATA" >> $WS_CSV_OUTFILE
done < $WS_PAGE_IDS_FILE
rm $WS_PAGE_IDS_FILE
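The ls and sed pipeline that builds the list of page IDs can also be written with a shell glob and parameter expansion, which avoids parsing ls output and needs no temporary file. This is a sketch under the assumption that every page lives in a directory named ID.page; list_page_ids is a hypothetical name, and unlike the sed expression it strips the .page suffix from every match, not only from five-character hexadecimal IDs.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts.
# Usage: list_page_ids [DIRECTORY]   (DIRECTORY defaults to the current one)
list_page_ids() {
    local dir="${1:-.}" page name
    for page in "$dir"/*.page; do
        [ -d "$page" ] || continue      # skips plain files and an unmatched glob
        name=${page##*/}                # drop the leading directory
        case $name in
            w*) continue ;;             # mirror the [^w] filter in the scripts
        esac
        printf '%s\n' "${name%.page}"   # drop the .page suffix
    done
}
```

The main loop could then read from it directly, for example with list_page_ids | while read id; do ... done.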
