Since I tend to forget these things, here’s a little tutorial on how to compile the Texas Instruments CMEM and SDMA kernel-modules for the beagleboard. I don’t like the codec-engine build process, therefore I’ll compile the kernel-modules by hand.
So what’s CMEM all about?
In a nutshell, CMEM is a kernel-module that allows you to allocate contiguous memory on the OMAP3 and map it into the address-space of a user-mode program so you can read and write to it.
CMEM also gives you the physical address of these memory-blocks.
This is important if you want to share some memory with the C64x+ DSP as the DSP has no idea what the memory manager of the Cortex-A8 is doing. It also allows linux user-mode programs to allocate memory that can be used with DMA.
Things you need:
- The sources of the linuxutils package from the TI website (registration is required but free). I’ve used release 2.24 which works fine with my 2.6.29-omap1 kernel image.
- The linux kernel-sources for the beagleboard. If you use OpenEmbedded and you have already compiled an image you’ll most likely find them at $OE_HOME/tmp/staging/beagleboard-angstrom-linux-gnueabi/kernel/.
- A cross-compiler toolchain for ARM. I still use the CodeSourcery 2007q3 Lite release. Works for me.
- A beagleboard. Although not strictly required, it makes perfect sense to have one.
How to compile CMEM:
- Untar the linuxutils package. Where you untar it is not important.
- Go into the CMEM subfolder. For the 2.24 release it’s the ./packages/ti/sdo/linuxutils/cmem/ folder.
- Take a look at the Rules.make file. Messy, ain’t it? Remove the write protection; chmod +w Rules.make will do that. You now have to adjust the paths in that file or, if you’re like me, delete it and write it from scratch. Here is my copy with everything not needed removed:
# path to your toolchain. Yes, you need to set it twice (don't ask...)
MVTOOL_PREFIX=/opt/CodeSourcery/bin/arm-none-linux-gnueabi-
UCTOOL_PREFIX=/opt/CodeSourcery/bin/arm-none-linux-gnueabi-
# path to the kernel-sources:
LINUXKERNEL_INSTALL_DIR=${OE_HOME}/tmp/staging/beagleboard-angstrom-linux-gnueabi/kernel
# some config things:
USE_UDEV=1
MAX_POOLS=128
- That’s it. If all paths are correct, “make release” should build the kernel module and some test applications.
How to test CMEM:
- Copy the kernel-module to the beagleboard. For the test I’ve just copied it into /home/root/. You’ll find the kernel-module at ./src/module/cmemk.ko
- On the board, check your U-Boot boot-parameters. Since CMEM manages physical memory you have to restrict the amount of memory managed by linux. To put aside some memory add the mem=80M directive to the bootargs. You can of course use a different setting if you want to, but the following examples assume 80M for the linux-kernel and the rest for DSP and CMEM.
- Boot the beagle and login as root.
- Load the kernel-module. Let’s keep things simple. We create a single 16 MB memory pool. To do so load the module like this:
/sbin/insmod cmemk.ko pools=1x1000000 phys_start=0x85000000 phys_end=0x86000000
If everything worked as expected you’ll find the following line in the kernel-log (type dmesg to get it):
cmem initialized 1 pools between 0x85000000 and 0x86000000
If not, well, CMEM will give you a bunch of hints in the kernel-log if it had problems during initialization. Most likely you’ve got the addresses wrong. As the start address you should pass 0x80000000 plus the size you’ve specified in the U-Boot bootargs (0x80000000 + 80 MB = 0x85000000 in this example). Add the sizes of all of your CMEM pools to the start address and use the result as the end address.
- While the module is loaded you’ll find a file under /proc/cmem with some statistics.
- If everything worked so far you can run some of the demo-applications like apitest. You’ll find apitest in the ./apps/apitest/ folder.
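The address arithmetic for phys_start and phys_end can be sketched in a few lines of shell, meant to be run on the host. This is only a sketch: it assumes the SDRAM starts at 0x80000000 as described above, mem=80M in the bootargs, and 16 MB in total for the CMEM pools. The variable names are made up for illustration.

```shell
#!/bin/sh
# Sketch: derive phys_start/phys_end for cmemk.ko from the mem= setting.
# All variable names are illustrative; the numbers are the ones used above.

ram_base=0x80000000      # start of SDRAM on the OMAP3
linux_mem_mb=80          # must match mem=80M in the U-Boot bootargs
cmem_total_mb=16         # total size of all CMEM pools

# CMEM starts right after the memory that linux manages ...
start=$(( ram_base + linux_mem_mb * 1024 * 1024 ))
# ... and ends after the combined size of all pools.
end=$(( start + cmem_total_mb * 1024 * 1024 ))

phys_start=$(printf '0x%x' "$start")
phys_end=$(printf '0x%x' "$end")

echo "phys_start=$phys_start phys_end=$phys_end"
```

With these numbers the script prints phys_start=0x85000000 phys_end=0x86000000, the same values passed to insmod above. If you change mem= in the bootargs or the pool sizes, adjust the two variables accordingly.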
Compile an ARM program that uses CMEM:
This is easy. Copy ./src/interface/cmem.h to a place where the cross-compiler will find it and add one of the cmem.a libraries to your project. Since I like to keep things simple I’ve just added the interface source to my project. It’s ./src/interface/cmem.c.
Now you can allocate contiguous memory and get the physical address of it. Big deal, eh? Honestly, like I said CMEM only makes sense if you want to make use of the C64x+ DSP or the SDMA of the OMAP3.
Hi,
I’m trying to use CMEM to do image processing (grayscale conversion) in the DSP of the OMAP3530. My setup is as follows:
-pools created with CMEM: pools=2x4001792,10x4096 phys_start=0x9ba00000 phys_end=0x9c1ac000 (the 2 big ones are for images)
On the GPP side:
-initialization of CMEM with CMEM_init() and dsp_mmu_map_cmem() (the function you provide for the MMU problems)
-images allocated via CMEM_alloc (which automatically chooses the big pools)
-physical addresses of images are obtained with CMEM_getPhys and sent to the DSP with MSGQ_put
-wait for the answer from the DSP
On the DSP side:
-receive addresses with MSGQ_get
-do the grayscale conversion from the color image buffer to the grayscale buffer
-send a done message to GPP with MSGQ_put
The problem is that the conversion takes a very long time. So I have removed the conversion and just put in a basic copy (all red data from the color image is copied to the grayscale image), and the problem is still the same. The same copy on the ARM is about 40 times faster…
Do you have any idea of what I am doing wrong?
Thanks for your help.
Comment by Guillaume — February 4, 2010 @ 11:31 pm
Hi Guillaume,
First off, congratulations on getting all the dsp-stuff working! I know it’s not easy.
Regarding your performance problems: Could you do some measurements please? I’m interested in how many megabytes per second you can process. I need the reads and writes to your image data arrays and I need to know if you do byte or larger accesses.
Also a look at the source-code of your conversion-function may be useful. My email address is on the “Impressum”-page.
I’m sure we can find out what’s wrong with your code…
Cheers,
Nils
Comment by Nils — February 5, 2010 @ 12:56 am
Thanks, you’re right, it’s not so easy to have all this working; but I think I still have a lot of work to do to achieve good performance…
First, I have to clarify my last comment. I realized that the copy on the ARM was faster because it was not working on a buffer allocated with CMEM. If I made a copy on a buffer allocated with CMEM, DSP is a little faster.
So I think the problem comes from CMEM, and I started to work only on the GPP side. The results are as follows (byte access):
-100 MByte write on a CMEM buffer takes 15 seconds and 204194 microseconds (6.577 MByte/s)
-100 MByte write on a classical buffer takes 1 second and 910461 microseconds (52.343 MByte/s)
Using CMEM buffer is very slow… Now I have to discover how to improve this…
Regards,
Guillaume
Comment by Guillaume — February 5, 2010 @ 12:39 pm
Hi Guillaume,
What you’re seeing here is the speed of memory with caches disabled. You can enable the caches using the allocation-parameters of the CMEM_alloc call. Once you’ve done it you’ll see the same speed as heap memory.
Once you’ve enabled your caches, make sure that all data has been written back from the cache before you pass a pointer to the DSP. Before you read data that has been written on the DSP, invalidate the memory region. And you have to do the same on the DSP as well, just with changed roles.
Cache manipulation functions for the ARM are in the cmem.h header-file. On the DSP you’ll find them in bcache.h (part of DspBios).
Cheers,
Nils
Comment by Nils — February 5, 2010 @ 6:27 pm
Hi Nils,
Have you tried CMEM from linuxutils 2.25? It seems to have problems with the ioctl. The CMEM_cacheWb function is not working properly for me… On linuxutils 2.24 it works perfectly well. The change log says:
“While this release fixes a few bugs (see immediately below), the biggest change is with the ioctl() command IDs – they have changed to incorporate a module-specific “magic” code so as to not conflict with other device driver ioctl() command IDs. This fact means that it is critical that the user update the sub-module user libraries in conjunction with their associated kernel modules, although a mismatch between kernel module and user library will be reported during the module’s “init” function (and the “init” will fail). ”
Regards,
Guillaume
Comment by Guillaume — March 1, 2010 @ 11:17 am
Guillaume, I would be very interested to see the new performance results with the cache-sync issue fixed. Is it possible you will come back and share your progress with us on this blog? I’m also constructing a DSP rig similar to yours (audio, not image) and your travel log, so to speak, is of immense interest!
Comment by Jay Vaughan — July 15, 2010 @ 1:17 pm