Problems transferring 'text/html' from Java to native applications

Introduction

Try copying some text as HTML to the Windows clipboard with JDK 1.3/1.4 (other versions not tested).

Some other applications don't accept the result as valid data. Known so far:
not working: Eudora 5.1 and OpenOffice Writer 1.0.1.
working: OpenOffice Calc 1.0.1; Word 2000, Dreamweaver 4 and XMLSpy 4 are reported to work as well.

If you put data on the clipboard with a RepresentationClass other than java.lang.String, it doesn't work with any application, and given the Java bug described below, it can't be expected to.

Windows html clipboard format

HTML is registered as a clipboard format in Windows under the name 'HTML Format'. The data itself is text, something like this:

Version:1.0
StartHTML:0000000000
EndHTML:0000000000
StartFragment:0000000000
EndFragment:0000000000
<!--StartFragment-->
...
<!--EndFragment-->

  • '...' is the HTML data you want to put on the clipboard.
  • The numbers '0000000000' are byte offsets from the start of the whole block. StartHTML and EndHTML may be -1 (see links), and Java sets them to -1.
  • All data must be UTF-8 encoded.
  • For more options and details, see the links.
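
To make the offset rules concrete, here is a minimal sketch that builds the envelope with real values instead of -1. It assumes a current JDK (StandardCharsets needs Java 7) and puts the fragment into a bare <html><body> context; the class name CfHtmlEnvelope is hypothetical and this is not the author's PEBHtmlSupport.java.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/**
 * Hypothetical helper (not from the article or the JDK): wraps an HTML
 * fragment in the Windows "HTML Format" envelope with real byte offsets.
 */
final class CfHtmlEnvelope {
    // Offsets are fixed-width, zero-padded numbers, so the header has the
    // same length no matter which values are filled in.
    private static final String HEADER =
            "Version:1.0\r\n"
            + "StartHTML:%010d\r\n"
            + "EndHTML:%010d\r\n"
            + "StartFragment:%010d\r\n"
            + "EndFragment:%010d\r\n";

    static byte[] wrap(String fragment) {
        String prefix = "<html><body><!--StartFragment-->";
        String suffix = "<!--EndFragment--></body></html>";

        // All offsets count UTF-8 bytes from the start of the whole block.
        int startHtml = utf8Length(String.format(Locale.ROOT, HEADER, 0, 0, 0, 0));
        int startFragment = startHtml + utf8Length(prefix);
        int endFragment = startFragment + utf8Length(fragment);
        int endHtml = endFragment + utf8Length(suffix);

        String header = String.format(Locale.ROOT, HEADER,
                startHtml, endHtml, startFragment, endFragment);
        return (header + prefix + fragment + suffix).getBytes(StandardCharsets.UTF_8);
    }

    private static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

Because every offset has a fixed width, the header length is known before the offsets are, so a single pass is enough.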

Links
  • HTML Clipboard Format (a nearly identical copy exists at another address)
  • 274326 - HOWTO: Add HTML Code to the Clipboard by Using Visual Basic (Sun, please take a look!)

Clipboard inspecting tool

I have written a small native Windows application that shows the clipboard contents in a nearly raw form.

Bugs in other applications

A quick test shows what is wrong with some of the target applications: they can't handle StartHTML and EndHTML set to -1. If you set them to real offsets, everything works fine. Oddly enough, Eudora doesn't even understand the HTML Format it produces itself.
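
A sketch of that kind of check, assuming you already have the raw bytes of an 'HTML Format' entry (for example a dump saved with a clipboard inspector). CfHtmlHeader is a hypothetical helper; it only reads the header fields and does not validate the offsets against the data.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reads the "Key:value" header of an HTML Format block. */
final class CfHtmlHeader {
    static Map<String, String> parse(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : text.split("\r\n|\r|\n")) {
            if (line.startsWith("<")) {
                break; // the header ends where the markup starts
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                // Split at the first colon only, so values like URLs stay intact.
                fields.put(line.substring(0, colon), line.substring(colon + 1).trim());
            }
        }
        return fields;
    }

    /** True if StartHTML or EndHTML is -1, which some target applications reject. */
    static boolean hasUnsetHtmlOffsets(byte[] data) {
        Map<String, String> f = parse(data);
        return "-1".equals(f.get("StartHTML")) || "-1".equals(f.get("EndHTML"));
    }
}
```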

Bug in Java

If you use a RepresentationClass other than java.lang.String, the HTML data is wrapped in the envelope twice, so applications see something odd.
Example result:

Version:0.9
StartHTML:-1
EndHTML:-1
StartFragment:0000000111
EndFragment:0000000417
<!--StartFragment-->
Version:0.9
StartHTML:-1
EndHTML:-1
StartFragment:0000000111
EndFragment:0000000286
<!--StartFragment-->
Version:0.9
StartHTML:00000000108
EndHTML:00000000168
StartFragment:00000000108
EndFragment:00000000168
<!--StartFragment--><b>Example text</b><!--EndFragment-->

Reason: in the code below, super.translateTransferable() is called first and HTMLSupport.convertToHTMLFormat() afterwards. The bug is in the superclass DataTransferer: if the RepresentationClass isn't String, the data is converted to a String and translateTransferable() is called recursively. That recursive call runs through the overriding method again, so the data is wrapped once on the inner return and once more on the outer return.

Java - Workaround

First, use only String as the RepresentationClass when putting HTML on the clipboard; this avoids the double wrapping.
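
A minimal sketch of such a Transferable, assuming AWT data transfer on a current JDK. The class name HtmlSelection is made up; the point is that the HTML flavor is declared with class=java.lang.String, so the conversion path in DataTransferer described above should not be taken. It does not fix the -1 offsets.

```java
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.awt.datatransfer.UnsupportedFlavorException;

/**
 * Hypothetical Transferable that offers HTML only with java.lang.String as
 * the representation class, plus a plain-text fallback.
 */
final class HtmlSelection implements Transferable {
    static final DataFlavor HTML_STRING_FLAVOR = createHtmlFlavor();

    private static DataFlavor createHtmlFlavor() {
        try {
            // The class parameter selects the representation class explicitly.
            return new DataFlavor("text/html;class=java.lang.String");
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e); // java.lang.String always exists
        }
    }

    private final String html;
    private final String plainText;

    HtmlSelection(String html, String plainText) {
        this.html = html;
        this.plainText = plainText;
    }

    @Override
    public DataFlavor[] getTransferDataFlavors() {
        return new DataFlavor[] { HTML_STRING_FLAVOR, DataFlavor.stringFlavor };
    }

    @Override
    public boolean isDataFlavorSupported(DataFlavor flavor) {
        return HTML_STRING_FLAVOR.equals(flavor) || DataFlavor.stringFlavor.equals(flavor);
    }

    @Override
    public Object getTransferData(DataFlavor flavor) throws UnsupportedFlavorException {
        if (HTML_STRING_FLAVOR.equals(flavor)) return html;
        if (DataFlavor.stringFlavor.equals(flavor)) return plainText;
        throw new UnsupportedFlavorException(flavor);
    }
}
```

Usage: Toolkit.getDefaultToolkit().getSystemClipboard().setContents(new HtmlSelection(html, text), null).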

For the interaction with other programs, I don't know a clean solution. Since some of the important methods in WClipboard are private, you have to access them via reflection, which makes you heavily dependent on the underlying implementation (see below).
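
The reflection part can be sketched generically. This assumes nothing about WClipboard itself: the private method name and parameter types you would pass are not given here (PEBClip.java contains the author's version), so any name used in a call is hypothetical. On JDK 9 and later, setAccessible() may also fail unless the package is opened, for example with --add-opens java.desktop/sun.awt.windows=ALL-UNNAMED.

```java
import java.lang.reflect.Method;

/**
 * Hypothetical helper: calls a private instance method by name via reflection.
 * Whatever name you pass ties your code to one JDK implementation.
 */
final class PrivateAccess {
    static Object invoke(Object target, String methodName,
                         Class<?>[] paramTypes, Object... args)
            throws ReflectiveOperationException {
        Method method = null;
        // getDeclaredMethod only looks at one class, so walk up the hierarchy.
        for (Class<?> c = target.getClass(); c != null && method == null; c = c.getSuperclass()) {
            try {
                method = c.getDeclaredMethod(methodName, paramTypes);
            } catch (NoSuchMethodException e) {
                // not declared in this class; try the superclass
            }
        }
        if (method == null) {
            throw new NoSuchMethodException(methodName);
        }
        // May throw InaccessibleObjectException (JDK 9+) for non-opened modules.
        method.setAccessible(true);
        return method.invoke(target, args);
    }
}
```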

Replacing the class HTMLSupport via the boot class path (see 'java.exe -X') probably works too, but that is only acceptable on your own computer. Alternatively, try to inject a replacement class with your own class loader.

SystemFlavorMap - mapping of native/DataFlavor

I tried to override the default native/DataFlavor mapping by playing around with:

SystemFlavorMap.getDefaultFlavorMap()
SystemFlavorMap.setNativesForFlavor(...)
SystemFlavorMap.addUnencodedNativeForFlavor(...)

but this doesn't help.
The reason lies in sun.awt.windows.WDataTransferer; here is the relevant snippet:

public byte[] translateTransferable(Transferable transferable,
                                    DataFlavor dataflavor,
                                    long nativeFormat) throws IOException {

    byte data[] = super.translateTransferable(transferable,
                                              dataflavor,
                                              nativeFormat);
    if (nativeFormat == CF_HTML) data = HTMLSupport.convertToHTMLFormat(data);
    return data;
}

The check is made only against the native format, so the data is always translated the standard way, even if one uses a different DataFlavor, e.g. 'text/myOwn', mapped to the native 'HTML Format'. Using a different native format is possible, but then other applications don't know about it.
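
For reference, the mapping attempt can be sketched like this, using the 'text/myOwn' flavor mentioned above. The SystemFlavorMap calls are real API; according to the WDataTransferer snippet, the CF_HTML envelope is applied regardless of the mapping. FlavorMapAttempt is a hypothetical name.

```java
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.FlavorMap;
import java.awt.datatransfer.SystemFlavorMap;

/** Sketch of the mapping attempt: route 'text/myOwn' to the native "HTML Format". */
final class FlavorMapAttempt {
    static final String NATIVE_HTML = "HTML Format";

    static DataFlavor myOwnFlavor() throws ClassNotFoundException {
        return new DataFlavor("text/myOwn;class=java.lang.String");
    }

    /** Returns false if the default map is not a SystemFlavorMap. */
    static boolean mapToNativeHtml(DataFlavor flavor) {
        FlavorMap map = SystemFlavorMap.getDefaultFlavorMap();
        if (!(map instanceof SystemFlavorMap)) {
            return false;
        }
        SystemFlavorMap systemMap = (SystemFlavorMap) map;
        // Make "HTML Format" the only native for this flavor, and register
        // the reverse direction as well.
        systemMap.setNativesForFlavor(flavor, new String[] { NATIVE_HTML });
        systemMap.addFlavorForUnencodedNative(NATIVE_HTML, flavor);
        // WDataTransferer still checks only nativeFormat == CF_HTML, so the
        // default envelope (with its -1 offsets) is still added.
        return true;
    }
}
```
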
Another disadvantage of the default enveloping is that one can't use the advanced features of the Windows clipboard format, which place extra information in the header (see Links above):

  • Offer a selection in the context of a whole page
  • Extra info like source URL
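
A sketch of using the header for both features, assuming you have the complete page as a string and the selected part occurs in it verbatim. SourceURL is one of the optional header fields described in the links; StartSelection/EndSelection are left out. CfHtmlWithContext is a hypothetical name, not code from the article.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/**
 * Hypothetical helper: marks a fragment inside a complete page and adds the
 * optional SourceURL header field.
 */
final class CfHtmlWithContext {
    private static final String OFFSETS =
            "Version:1.0\r\n"
            + "StartHTML:%010d\r\n"
            + "EndHTML:%010d\r\n"
            + "StartFragment:%010d\r\n"
            + "EndFragment:%010d\r\n";

    /** The first occurrence of {@code fragment} in {@code page} is marked. */
    static byte[] wrap(String page, String fragment, String sourceUrl) {
        int at = page.indexOf(fragment);
        if (at < 0) {
            throw new IllegalArgumentException("fragment not found in page");
        }
        String before = page.substring(0, at) + "<!--StartFragment-->";
        String after = "<!--EndFragment-->" + page.substring(at + fragment.length());
        // Appended without String.format, so '%' in the URL is harmless.
        String urlLine = "SourceURL:" + sourceUrl + "\r\n";

        // StartHTML/EndHTML now span the whole page, not just the fragment.
        int startHtml = utf8Length(String.format(Locale.ROOT, OFFSETS, 0, 0, 0, 0))
                + utf8Length(urlLine);
        int startFragment = startHtml + utf8Length(before);
        int endFragment = startFragment + utf8Length(fragment);
        int endHtml = endFragment + utf8Length(after);

        String header = String.format(Locale.ROOT, OFFSETS,
                startHtml, endHtml, startFragment, endFragment) + urlLine;
        return (header + before + fragment + after).getBytes(StandardCharsets.UTF_8);
    }

    private static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```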

Workaround to interact with buggy software

This accesses the native clipboard: Windows only, write only. It needs more work to become fool-proof; you may want to look into WClipboard. Since the class hacks a private method, it may break if the underlying implementation changes.
Some knowledge of the native formats is also needed, e.g. plain text needs a null character at the end.
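
The null-terminator detail can be shown in a few lines. This is a sketch under the assumption that the byte array goes straight to a native text format; which charset to use (an ANSI code page for CF_TEXT, UTF-16LE for CF_UNICODETEXT) comes from the Win32 clipboard formats, not from this article. NativeText is a hypothetical name.

```java
import java.nio.charset.Charset;

/** Hypothetical helper: encodes text for a native plain-text clipboard format. */
final class NativeText {
    /**
     * Appends a NUL character before encoding, so the terminator gets the
     * width the charset needs: one zero byte for an 8-bit code page, two
     * zero bytes for UTF-16LE.
     */
    static byte[] toNullTerminated(String text, Charset charset) {
        return (text + "\0").getBytes(charset);
    }
}
```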

To get access to the native windows clipboard: PEBClip.java
Transform html to Windows clipboard format: PEBHtmlSupport.java
Test class: Test.java

© July 2003 Peter Büttner