Gotchas!
Internal links on a page that may be located anywhere on a site must be absolute paths.
Uploading everything to the site may be prohibitively slow. Check only the files that have changed.
So two questions might come to mind... why
This is what
We'll get into the details of that later, but first we're going to focus on the mechanics of
using
Development of the site takes place on your local desktop PC. In order to simplify development and take advantage
of
The first thing I did was to pull down the head version from CVS, which was 2.1.4-dev, because I like to live on the bleeding edge; however, I'm sure that 2.1.3 would be the sensible choice for most people out there. I had no issues with the build, but YMMV.
I'll assume you already have your Java build environment set up, including Ant. In that case you just need to run:
cd \cocoon-2.1
build.bat webapp
This builds a standard web application in the
cocoon-2.1\build\webapp folder. This web application is the Apache Cocoon default site,
containing all the samples and documentation. It makes a great starting point from which to build a custom
Now, given this existing web application, we could just jump in and run the
To start with, copy the entire web application. We'll strip it down later. I created a
whole new subdirectory CocoonGK alongside the
The first of the scripts we need allows us to run a local site dynamically:
\cocoon-2.1\cocoon servlet -c webapp -w temp\work. This calls the
cocoon.bat batch file. The servlet parameter indicates that we want to
run a live site. This site will be launched on port 8888, and so can be accessed at http://127.0.0.1:8888/.
You'll also need to ensure you have the correct environment variables set up.
SET COCOON_HOME=e:\cocoon-2.1
SET COCOON_LIB=e:\CocoonGK\webapp\WEB-INF\lib
SET JETTY_WEBAPP=e:\CocoonGK\webapp
The JETTY_WEBAPP variable in particular is required to run this live site.
After running this script you should now have an instance of Jetty running in the background. Head on over
to the localhost:8888 address to make sure it is up. You should see the default Cocoon site. At this point
you have a full
Next we'll pare this site down to its bare minimum.
The web app contains the following directory structure:
api
docs
resources
samples
stylesheets
WEB-INF
resources\icons
resources\images
resources\logos
resources\scripts
resources\styles
stylesheets\system
WEB-INF\classes
WEB-INF\db
WEB-INF\deli
WEB-INF\entities
WEB-INF\lib
WEB-INF\deli\config
WEB-INF\deli\legacyProfiles
WEB-INF\deli\config\vocab
WEB-INF\entities\catalog-demo
WEB-INF\entities\w3c
Of these, you can safely delete api, docs, samples, and the contents of
WEB-INF/classes.
You can also delete welcome.xml and welcome.xslt from the root of the web application.
You should keep the resources folder, especially as it contains the Cocoon-powered
range of logos that you may want to place on your site.
Finally we are left with the sitemap itself. I keep a copy of this file around as a reference for the common
pipeline components. Bear in mind, though, that the best resource for the parameters that a particular component
supports is often the
You can see the sitemap that I ended up with below. However, all you
really need to do to this file is remove the contents of the <map:pipeline/> elements.
Personally, I prefer to delete everything that I'm not using, then pull it back in from the handy reference
copy that we kept around a second ago.
The I18N transformer component needs custom configuration in the <map:components/> element,
but for the most part you won't need to modify this area.
Next up is the <map:resources/> element. This is a handy place to store commonly reused
pipeline fragments, although we've not used any here.
Following is the <map:views/> element. This element matters to us because
these views allow our pages to be displayed at any stage within the pipeline that generated them. For pages
that are driven by complex XML transforms, this makes debugging much easier. But the links view matters most: the CLI uses it, and thus the LinkSerializer, to extract all the obvious hyperlinks from
a page.
Now that you have a clean web application, I suggest creating a simple HTML file and a pipeline matcher
that just generates and serializes the file when the index.html URI is requested. Reload http://localhost:8888/index.html to make sure you can see it.
If all this has gone smoothly, you now have a clean site from which to build.
We briefly mentioned the LinkSerializer and links view. The
Each URI is retrieved twice, once using the
LinkSerializer and once in regular mode. The
regular copy is saved in a given folder. Once processing is complete, this folder contains every reachable
page.
If your site is simple enough and does not contain absolute URIs, you should be able to browse this folder using your favourite web browser. If you do have absolute URIs, you'll have to copy these files to a root folder or upload them to your site to see the static URIs in action.
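For readers who want to see the crawl in miniature, here is a toy Python model of it. It is not the CLI's code: SITE, links_view and regular_view are hypothetical stand-ins for the Cocoon pipeline and the ?cocoon-view=links request, and links are assumed to be site-relative URIs that need no further resolution. Each page is requested twice, once for its links and once for its content, and a page that nothing links to never makes it into the output.

```python
from collections import deque

# Hypothetical in-memory site standing in for Cocoon: each URI maps to
# (what the normal pipeline renders, what the links view would report).
SITE = {
    "index.html": ("<html>home</html>", ["bwnews.html", "bwgallery.html"]),
    "bwnews.html": ("<html>news</html>", ["index.html"]),
    "bwgallery.html": ("<html>gallery</html>", ["index.html", "missing.html"]),
    "orphan.html": ("<html>nobody links here</html>", []),
}


def links_view(uri):
    """Stand-in for requesting uri?cocoon-view=links: only the outgoing links."""
    return SITE[uri][1]


def regular_view(uri):
    """Stand-in for the normal request: the rendered page itself."""
    return SITE[uri][0]


def crawl(start_uris):
    """Breadth-first crawl from the start URIs, fetching each page twice."""
    saved = {}      # URI -> page content, i.e. what ends up in the -d folder
    broken = set()  # links pointing at pages the site cannot produce
    queue = deque(start_uris)
    while queue:
        uri = queue.popleft()
        if uri in saved or uri in broken:
            continue
        if uri not in SITE:
            broken.add(uri)
            continue
        # First request: ask the links view where this page points.
        for link in links_view(uri):
            if link not in saved:
                queue.append(link)
        # Second request: render and keep the page itself.
        saved[uri] = regular_view(uri)
    return saved, broken


if __name__ == "__main__":
    pages, broken = crawl(["index.html"])
    print(sorted(pages), sorted(broken))
```

Starting from index.html, the crawl saves the three linked pages, flags missing.html as broken, and never visits orphan.html, because no page links to it.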
Hint: if a page is not being traversed by the CLI, view the page that should link to it with the ?cocoon-view=links suffix added to the URL. If your link does not show up here, then you have
located your problem. Fixing it may require adding another entry URL to the
The following script fragment calls the
call \cocoon-2.1\cocoon cli -r true -c webapp -d site -w temp\work index.html
The -r parameter indicates that we would like to recurse through pages using the
link crawling mechanism. The -c parameter indicates the location of the web
application. The -d parameter indicates the destination in which to place the
generated site, and the -w parameter indicates where to place When the -w parameter to store this information. Again,
the simpler you keep things, the cleaner your CLI build will be.
This article and
Programmatically this means that their src attribute is modified to point at a new URI. This
transformation cannot be detected by the
This can be done in a couple of ways. Since
call \cocoon-2.1\cocoon cli -r true -c webapp -d site -w temp\work index.html allthumbs.html
del site\allthumbs.html
The allthumbs page you see here requires a sitemap entry to create, which you can see
in the sitemap source below.
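The trick behind allthumbs.html is a page that spells out, as plain src attributes, every image URI the JavaScript would otherwise build at run time, so the crawler requests each one. The real page comes from gallery2activeThumbs.xslt, which isn't shown here; the Python below is only a hypothetical sketch of the kind of markup such a page needs, using the thumb/ and graythumb/ prefixes from the sitemap matchers and made-up image paths.

```python
from html import escape


def all_thumbs_page(image_paths):
    """Build a throwaway HTML page whose <img> tags reference every thumbnail
    variant, so a link-following crawler requests (and saves) each one.

    image_paths are gallery-relative .jpg paths such as "piccies/cat.jpg"
    (hypothetical names). The thumb/ and graythumb/ prefixes mirror the
    sitemap matchers; this is not the author's XSLT.
    """
    tags = []
    for path in image_paths:
        for prefix in ("thumb", "graythumb"):
            url = f"{prefix}/{path}"
            tags.append(f'<img src="{escape(url)}" alt=""/>')
    return "<html><body>\n" + "\n".join(tags) + "\n</body></html>\n"


if __name__ == "__main__":
    print(all_thumbs_page(["piccies/cat.jpg", "piccies/dog.jpg"]))
```

Because the page is deleted straight after the crawl, its markup never needs to look nice; it only has to be reachable as a start URI and contain every URI you want generated.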
The use of cascading style sheets suffers much the same problem. In the case of
Watch out for renamed files. The
This section will be heavy on Windows command file scripting, so be prepared. The general technique used to upload
At this point we have run the CLI, and the generated site sits in the site directory.
We can now begin creating an FTP command file that will script the upload process using Windows FTP.
The >> syntax should be familiar to most of you; all it does is append the output
of a command to a file.
The first few commands we need are connection oriented:
del ftpCommands.txt 2> NUL
echo open remotesite >> ftpCommands.txt
echo user username password >> ftpCommands.txt
echo prompt off >> ftpCommands.txt
echo verbose off >> ftpCommands.txt
echo hash off >> ftpCommands.txt
echo bin >> ftpCommands.txt
When executed by Windows FTP, these commands connect to our server, log in, and switch to binary transfer mode.
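If you would rather generate the command file from a cross-platform script, the same header is easy to produce in Python. This is a sketch, not part of the original build: the function names are hypothetical, remotesite/username/password are the same placeholders as above, and the file is written with CRLF line endings, which is what cmd's echo produces.

```python
import os
import tempfile


def ftp_header(host, user, password):
    """Connection commands, in the order the batch script echoes them."""
    return [
        f"open {host}",
        # Explicit login; this works because ftp is started with -n (no auto-login).
        f"user {user} {password}",
        "prompt off",
        "verbose off",
        "hash off",
        # Binary mode, so images are not altered by ASCII line-ending conversion.
        "bin",
    ]


def write_ftp_script(path, lines):
    """Equivalent of `del ftpCommands.txt` followed by one `echo ... >>` per line."""
    # Mode "w" truncates any old file; newline="\r\n" writes CRLF line endings.
    with open(path, "w", newline="\r\n") as f:
        for line in lines:
            f.write(line + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "ftpCommands.txt")
        write_ftp_script(script, ftp_header("remotesite", "username", "password"))
        with open(script) as f:
            print(f.read())
```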
The next step is to build the directory structure into which the files will be placed. To do this, we use the Windows dir command (the UNIX equivalent is ls) to recursively list all the directories that need to be created on the server.
dir /ad /s /b site > templist.txt
for /F "delims=" %%L in (templist.txt) do (
    SET T=%%L
    SET R=!T:E:\CocoonGK\site\=!
    SET R=!R:\=/!
    echo mkdir !R! >> ftpCommands.txt
)
The output of the dir command is still not quite what we would like, which is where the for loop
comes in. For can iterate through each line of a file, and we then use delayed-expansion
environment variables to alter the text before appending it to our ftpCommands file as a make
directory command. You must enable delayed expansion with SETLOCAL ENABLEDELAYEDEXPANSION to
use this technique: inside a parenthesized block, %T% would be expanded only once, when the whole block is parsed, whereas !T! is expanded each time round the loop. !T:E:\CocoonGK\site\=! means replace occurrences of e:\CocoonGK\site\
in the environment variable T with whatever is between the = and the !.
In this case, nothing. So e:\CocoonGK\site\images becomes just images. Similarly,
!R:\=/! just replaces backslashes (the Windows directory separator) with forward slashes
(the UNIX directory separator).
The upshot of that snippet of code is a set of mkdir commands in our ftpCommands file that will create our entire directory structure on the server.
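The same directory-to-mkdir translation, sketched in Python for comparison (mkdir_commands is a hypothetical helper, not part of the original script). Two small differences from the batch version are deliberate: only a leading site prefix is stripped, whereas cmd's substitution removes that text wherever it occurs, and the list is sorted by depth so a parent folder is always created before its children. The prefix test ignores case, as cmd's substitution does.

```python
def mkdir_commands(dir_paths, site_root):
    """Turn absolute directory paths, as listed by `dir /ad /s /b site`,
    into FTP mkdir commands relative to the remote root."""
    prefix = site_root.rstrip("\\") + "\\"
    commands = []
    # Shallowest paths first, so a parent is created before its children.
    for path in sorted(dir_paths, key=lambda p: p.count("\\")):
        # Like !T:E:\CocoonGK\site\=!, ignoring case, but only at the start.
        if path.lower().startswith(prefix.lower()):
            path = path[len(prefix):]
        # Like !R:\=/!: Windows separators become UNIX ones.
        commands.append("mkdir " + path.replace("\\", "/"))
    return commands


if __name__ == "__main__":
    listing = [r"E:\CocoonGK\site\myfolder\images", r"E:\CocoonGK\site\myfolder"]
    for command in mkdir_commands(listing, r"e:\CocoonGK\site"):
        print(command)
```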
This process is repeated for the individual files. This time, however, we use xcopy instead of dir.
xcopy /s /D /I /L /Y site uploadedSite > templist.txt
for /F "delims=" %%L in (templist.txt) do (
    SET T=%%L
    if "!T:File=!" == "!T!" (
        SET R=!T:site\=!
        SET H=!R:\=/!
        echo put !R! !H! >> ftpCommands.txt
    )
)
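Xcopy has the useful ability to copy only those files that have changed (/D), and, even better, to
only display what it would do instead of actually doing it (/L). Taken in conjunction with a directory holding copies of the
files that were uploaded to the site last time (uploadedSite), this gives us exactly the list of changed files. Of course the output is not quite FTP compatible, so we use for again to manipulate the
individual file paths and output a put statement to our ftpCommands file. The if test skips the "N File(s)" summary line that xcopy prints at the end: removing "File" from a line changes it only if the line contains that word.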
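In Python, the xcopy-output-to-put step might look like this, assuming xcopy /L prints one path per line, each starting with site\, followed by an "N File(s)" summary. The helper name is hypothetical. It skips only lines ending in File(s); the batch test drops any line containing "File", and because cmd's substitution ignores case, a page named profile.html would be silently left out there.

```python
def put_commands(xcopy_lines, source_dir="site"):
    """Turn the file list printed by `xcopy /L` into FTP put commands."""
    prefix = source_dir.rstrip("\\") + "\\"
    commands = []
    for raw in xcopy_lines:
        line = raw.strip()
        # Skip blank lines and the closing "N File(s)" summary.
        if not line or line.endswith("File(s)"):
            continue
        # Like !T:site\=!, but only strips the leading source folder.
        if line.lower().startswith(prefix.lower()):
            line = line[len(prefix):]
        # Local path keeps backslashes; the remote path uses forward slashes (!R:\=/!).
        remote = line.replace("\\", "/")
        commands.append(f"put {line} {remote}")
    return commands


if __name__ == "__main__":
    output = ["site\\index.html", "site\\myfolder\\images\\myimage.jpg", "2 File(s)"]
    print("\n".join(put_commands(output)))
```

The put lines it produces have the same shape as the ones in the generated script shown next: a Windows-style local path followed by a UNIX-style remote path.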
Finally we finish off our script with a quit statement. Our final generated script then
looks something like this:
open remotesite
user username password
prompt off
verbose off
hash off
bin
mkdir myfolder
mkdir myfolder/images
put index.html index.html
put myfolder\images\myimage.jpg myfolder/images/myimage.jpg
quit
The following code executes that script, which actually connects to the server and performs the upload. As mentioned briefly in the FTP sidebar, I had to provide the
-w:65535 parameter to the FTP statement, or
PureFTPD at
At this point we have uploaded our entire generated site with a single Windows command file. Because sites
can get to be quite large, we really only want to upload changed files. As mentioned above, we handle this
by persuading xcopy to provide a list of the modified files. Next time round, we'll need an
updated list, and I accomplish this by maintaining an entire copy of the site locally with this command:
xcopy /s /D /I /Y site uploadedSite
Happily, the only difference between this command and the
last time we used xcopy is that this time we allow it to actually perform the copy by removing
the /L parameter.
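The xcopy /D trick boils down to comparing modification times against the last uploaded copy. Here is a Python sketch of both halves, listing what changed and then mirroring it into uploadedSite; changed_files and record_upload are hypothetical names. Like xcopy /D, the approach trusts file timestamps, so a file regenerated with identical content still counts as changed.

```python
import os
import shutil
import tempfile


def changed_files(site, uploaded):
    """Paths (relative to `site`) that are new, or newer than their copy in
    `uploaded`; roughly what `xcopy /s /D /L site uploadedSite` lists."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(site):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, site)
            dst = os.path.join(uploaded, rel)
            if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
                changed.append(rel)
    return sorted(changed)


def record_upload(site, uploaded, rel_paths):
    """Mirror the given files into `uploaded`, like the final xcopy without /L."""
    for rel in rel_paths:
        dst = os.path.join(uploaded, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        # copy2 also copies the modification time, so the next comparison is fair.
        shutil.copy2(os.path.join(site, rel), dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        site = os.path.join(root, "site")
        uploaded = os.path.join(root, "uploadedSite")
        os.makedirs(site)
        with open(os.path.join(site, "index.html"), "w") as f:
            f.write("<html/>")
        todo = changed_files(site, uploaded)
        print("to upload:", todo)
        record_upload(site, uploaded, todo)
        print("after recording:", changed_files(site, uploaded))
```

Recording only after a successful upload matters: if the FTP step fails, the next run still sees the files as changed and tries again.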
Try tweaking the -w buffer size parameter if FTP hangs.
So! Now we have a single-click method of deploying our
If you have any comments or suggestions about the article, the source, or the site, feel free to email Peter {peter@grumpykitty.biz}.
pushd site
ftp -w:65536 -n -s:..\ftpCommands.txt > ..\Logfiles\ftpLog.txt 2> ..\Logfiles\ftpErrLog.txt
popd
@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
REM Environment for the Cocoon command-line interface
SET COCOON_HOME=e:\cocoon-2.1
SET COCOON_LIB=e:\CocoonGK\webapp\WEB-INF\lib
REM -Xmx needs a unit suffix: 384m means 384 MB
SET JAVA_OPTS=-Xmx384m
REM Date for log messages; the substring offsets assume a US-style date format
SET DR=%DATE:~4,10%
SET DR=%DR:/=-%
cd /d e:\CocoonGK
mkdir site 2> nul
echo Rebuilding Grumpy Kitty on %DR% - this may take a few seconds...
REM Crawl the site into the site folder; allthumbs.html only pulls in the thumbnails and is deleted below
call \cocoon-2.1\cocoon cli -r true -c webapp -d site -w temp\work index.html allthumbs.html > Logfiles\cocoonLog.txt 2> Logfiles\cocoonErrLog.txt
REM Copy the webapp's images folder straight into the generated site
mkdir site\images 2> nul
xcopy webapp\images site\images /s /y /D >nul
del site\allthumbs.html
echo Generating files for upload to site as of %DR%
REM FTP connection commands
del ftpCommands.txt 2> NUL
echo open remotesite >> ftpCommands.txt
echo user username password >> ftpCommands.txt
echo prompt off >> ftpCommands.txt
echo verbose off >> ftpCommands.txt
echo hash off >> ftpCommands.txt
echo bin >> ftpCommands.txt
REM One mkdir per directory, with the local prefix stripped and slashes flipped
dir /ad /s /b site > templist.txt
for /F "delims=" %%L in (templist.txt) do (
    SET T=%%L
    SET R=!T:E:\CocoonGK\site\=!
    SET R=!R:\=/!
    echo mkdir !R! >> ftpCommands.txt
)
REM One put per file that is newer than its copy in uploadedSite
xcopy /s /D /I /L /Y site uploadedSite > templist.txt
for /F "delims=" %%L in (templist.txt) do (
    SET T=%%L
    if "!T:File=!" == "!T!" (
        SET R=!T:site\=!
        SET H=!R:\=/!
        echo put !R! !H! >> ftpCommands.txt
    )
)
echo quit >> ftpCommands.txt
del templist.txt
echo Performing Upload Now
pushd site
ftp -w:65536 -n -s:..\ftpCommands.txt > ..\Logfiles\ftpLog.txt 2> ..\Logfiles\ftpErrLog.txt
popd
REM Record what was uploaded so the next run only sends changes
echo The following files were uploaded to Grumpy Kitty
xcopy /s /D /I /Y site uploadedSite
del ftpCommands.txt
ENDLOCAL
:endb
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
<!-- =========================== Components ================================ -->
<map:components>
<map:generators default="file">
<map:generator label="content"
logger="sitemap.generator.xpathdirectory"
name="xpathdirectory"
src="org.apache.cocoon.generation.XPathDirectoryGenerator"/>
</map:generators>
<map:transformers default="xslt" >
<map:transformer name="i18n"
src="org.apache.cocoon.transformation.I18nTransformer">
<catalogues default="messages">
<catalogue id="categories"
name="categories"
location="file:///e:/cocoongk/sourcecode/WEB-INF/translations"/>
<catalogue id="messages"
name="messages"
location="file:///e:/cocoongk/sourcecode/WEB-INF/translations"/>
</catalogues>
<untranslated-text>untranslated</untranslated-text>
<cache-at-startup>true</cache-at-startup>
</map:transformer>
</map:transformers>
<map:serializers default="html" />
<map:readers default="resource" />
<map:matchers default="wildcard" />
<map:selectors default="browser" />
<map:actions />
<map:pipes default="caching" />
</map:components>
<!-- =========================== Views =================================== -->
<map:views>
<map:view from-position="last" name="links">
<map:serialize type="links"/>
</map:view>
<!-- Other views excluded, used for debugging -->
</map:views>
<!-- =========================== Pipelines ================================= -->
<map:pipelines>
<map:component-configurations>
<global-variables>
<gallery>../gallery</gallery>
<news>../news</news>
<stuff>../stuff</stuff>
<cocoonVersion>Cocoon 2.1.4-dev</cocoonVersion>
<copyright>(C) 2004 Grumpy Kitty Productions</copyright>
</global-variables>
</map:component-configurations>
<!-- main pipeline -->
<map:pipeline>
<!-- CSS Reader -->
<map:match pattern="css/*.css">
<map:read mime-type="text/css" src="css/{1}.css" />
</map:match>
<!-- Javascript Reader -->
<map:match pattern="js/*.js">
<map:read mime-type="text/javascript" src="js/{1}.js" />
</map:match>
<!-- Structural Images Reader -->
<map:match pattern="images/*.png">
<map:read mime-type="image/png" src="images/{1}.png" />
</map:match>
<!-- Structural Images Reader -->
<map:match pattern="images/*.ico">
<map:read src="images/{1}.ico" />
</map:match>
<!-- Added Fun Images Reader -->
<map:match pattern="*.jpg">
<map:read mime-type="image/jpeg" src="../otherImages/{1}.jpg" />
</map:match>
<!-- Gallery Thumbnail Matcher -->
<map:match pattern="thumb/**.jpg">
<map:read type="image" src="{global:gallery}/{1}.jpg">
<map:parameter name="height" value="75" />
</map:read>
</map:match>
<!-- Gallery Thumbnail Matcher (mouseover/out href has to
match rewritten extensions after offline generation)-->
<map:match pattern="thumb/**.jpg.jpeg">
<map:read src="cocoon:/thumb/{1}.jpg" />
</map:match>
<!-- Gallery Thumbnail Matcher (mouseover/out href has to
match rewritten extensions after offline generation)-->
<map:match pattern="graythumb/**.jpg.jpeg">
<map:read src="cocoon:/graythumb/{1}.jpg" />
</map:match>
<!-- Gallery Grayscale Thumbnail Matcher -->
<map:match pattern="graythumb/**.jpg">
<map:read type="image" src="{global:gallery}/{1}.jpg">
<map:parameter name="height" value="75" />
<map:parameter name="grayscale" value="true" />
<map:parameter name="scaleRed" value="0.8" />
<map:parameter name="scaleGreen" value="0.8" />
<map:parameter name="scaleBlue" value="1.0" />
</map:read>
</map:match>
<!-- Gallery Full Scale Matcher -->
<map:match pattern="full/**.jpg">
<map:read type="image" src="{global:gallery}/{1}.jpg" >
<map:parameter name="width" value="500" />
<map:parameter name="allow-enlarging" value="no" />
</map:read>
</map:match>
<!-- Gallery Frame Matcher ( Wraps a single image in HTML ) -->
<map:match pattern="frame/**/*.html">
<map:generate src="{global:gallery}/{1}" type="xpathdirectory">
<map:parameter name="depth" value="3"/>
<map:parameter name="sort" value="time"/>
<map:parameter name="include" value="{2}"/>
<map:parameter name="root" value="gallery$"/>
<map:parameter name="nsmapping" value="nsmappings.properties" />
<map:parameter name="nsmapping-reload" value="true" />
<map:parameter name="xpath" value="/gk:imageinfo"/>
<map:parameter name="exclude" value=".*\.bak"/>
</map:generate>
<map:transform src="dir2gallery.xslt" label="gallery" />
<map:transform src="gallery2frame.xslt" />
<map:transform type="i18n">
<map:parameter name="locale" value="en_US"/>
</map:transform>
<map:serialize/>
</map:match>
<!-- Gallery Listing Matcher ( Wraps a set of images in an HTML index ) -->
<map:match pattern="frame**/">
<map:generate src="{global:gallery}{1}" type="xpathdirectory">
<map:parameter name="depth" value="1"/>
<map:parameter name="sort" value="time"/>
<map:parameter name="include" value="{2}/*.(jpg|jpeg|xml)"/>
<map:parameter name="root" value="gallery$"/>
<map:parameter name="nsmapping" value="nsmappings.properties" />
<map:parameter name="nsmapping-reload" value="true" />
<map:parameter name="xpath" value="/gk:imageinfo"/>
<map:parameter name="exclude" value=".*\.bak"/>
</map:generate>
<map:transform src="dir2gallery.xslt" label="gallery" />
<map:transform src="gallery2category.xslt" label="category" />
<map:transform type="i18n">
<map:parameter name="locale" value="en_US"/>
</map:transform>
<map:serialize/>
</map:match>
<!-- Gallery Main Page - (Internal redirect into gallery handling code) -->
<map:match pattern="bwgallery.html">
<map:redirect-to uri="cocoon:/frame/newsitepiccies/" />
</map:match>
<!-- News Internal -->
<map:match pattern="inews">
<map:generate src="{global:news}" type="xpathdirectory" label="content">
<map:parameter name="sort" value="time"/>
<map:parameter name="nsmapping" value="nsmappings.properties" />
<map:parameter name="nsmapping-reload" value="true" />
<map:parameter name="xpath" value="/gk:post"/>
<map:parameter name="exclude" value=".*\.bak"/>
</map:generate>
<map:transform src="files2news.xslt" />
<map:serialize type="xml" />
</map:match>
<!-- Images Internal (Full gallery listing with imageinfo .xml files) -->
<map:match pattern="igallery">
<map:generate src="{global:gallery}" type="xpathdirectory">
<map:parameter name="depth" value="5"/>
<map:parameter name="sort" value="time"/>
<map:parameter name="nsmapping" value="nsmappings.properties" />
<map:parameter name="nsmapping-reload" value="true" />
<map:parameter name="xpath" value="/gk:imageinfo"/>
<map:parameter name="exclude" value=".*\.bak"/>
</map:generate>
<map:transform src="dir2gallery.xslt" />
<map:serialize type="xml"/>
</map:match>
<!-- Thumbnail Catalog for command line crawler-->
<map:match pattern="allthumbs.html">
<map:generate src="cocoon:/igallery" />
<map:transform src="gallery2activeThumbs.xslt" />
<map:serialize />
</map:match>
<!-- Images Recent (Order by date) -->
<map:match pattern="irecent2">
<map:generate src="cocoon:/igallery" />
<map:transform src="gallery2recent.xslt" />
<map:serialize type="xml"/>
</map:match>
<!-- Images Recent (Extract only initial 5) -->
<map:match pattern="irecent">
<map:generate src="cocoon:/irecent2" />
<map:transform src="recent2few.xslt" />
<map:serialize type="xml"/>
</map:match>
<!-- Index/Combo Page -->
<map:match pattern="index.html">
<map:aggregate element="blueweb"
prefix="gk"
ns="http://xml.grumpykitty.biz/gk/1.0"
label="agg">
<map:part src="cocoon://inews" />
<map:part src="cocoon://irecent" />
</map:aggregate>
<map:transform src="tohtml.xslt" />
<map:serialize/>
</map:match>
<!-- News Page -->
<map:match pattern="bwnews.html">
<map:aggregate element="blueweb"
prefix="gk"
ns="http://xml.grumpykitty.biz/gk/1.0"
label="agg">
<map:part src="cocoon://inews" />
</map:aggregate>
<map:transform src="news2html.xslt" />
<map:serialize/>
</map:match>
<!-- Bio Page -->
<!-- Links Page -->
<!-- Contact Page -->
<map:match pattern="bw*.html">
<map:generate src="{global:stuff}/{1}.xml" />
<map:transform src="tohtml.xslt">
<map:parameter name="cocoonVersion" value="{global:cocoonVersion}" />
<map:parameter name="copyright" value="{global:copyright}" />
<map:parameter name="date" value="{friendlyDate:date}" />
</map:transform>
<map:serialize/>
</map:match>
<!-- Error Handling Page -->
<map:handle-errors>
<map:select type="exception">
<map:when test="not-found">
<map:generate type="notifying"/>
<map:serialize type="xml" status-code="404"/>
</map:when>
<map:otherwise>
<map:generate type="notifying"/>
<map:serialize type="xml" status-code="500"/>
</map:otherwise>
</map:select>
</map:handle-errors>
</map:pipeline>
</map:pipelines>
</map:sitemap>