Calling an ARM/XScale assembly subroutine from C++?

Because PB5 does not support inline assembly for ARM/XScale, I want to write
an ARM assembly subroutine, assemble it with armasm.exe (shipped with PB5),
and call it from within a PB5 C++ project.

Is there a guide on how to call an ARM assembly routine from C++ code in
PB5? I need information on how to export and import functions, how arguments
and return values are passed, and the like.
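
To make the question concrete, here is a rough sketch of what I have in mind.
The routine name add_two, the macro USE_ASM_ROUTINE, and the helper
check_add_two are all made up for illustration. The assembly in the comment is
my untested guess at armasm syntax, and it assumes the usual ARM convention
that the first integer arguments arrive in r0 and r1 and the result goes back
in r0. I have not confirmed that PB5's toolchain behaves this way.

```cpp
#include <cassert>

// Hypothetical routine name. extern "C" turns off C++ name mangling, so the
// linker should look for the plain symbol "add_two" exported by the .s file.
extern "C" int add_two(int a, int b);

// Assembly side as I imagine it (armasm syntax, untested assumption):
//
//         AREA    |.text|, CODE, READONLY
//         EXPORT  add_two
// add_two
//         ADD     r0, r0, r1      ; a in r0, b in r1, sum returned in r0
//         MOV     pc, lr          ; return to the caller
//         END

#ifndef USE_ASM_ROUTINE
// Portable stand-in so the C++ side can be built and checked without the
// assembled object file. Define USE_ASM_ROUTINE (a made-up macro) to drop
// this and link against the assembly version instead.
extern "C" int add_two(int a, int b) { return a + b; }
#endif

// Small caller that exercises the routine the same way project code would.
int check_add_two() {
    int sum = add_two(2, 3);
    assert(sum == 5);
    return sum;
}
```

If this is roughly the right shape, what I still need to know is whether
EXPORT plus extern "C" is enough for the PB5 linker, and which registers the
routine must preserve.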

