Re: Can I help?
Glad you contacted us. Would be great to have another contributor.
Your insights are correct: DFDL evolved by standardizing the capabilities of existing data integration tools, which focused on files that you would probably call "data set files", as opposed to image, document, archive, and similar file formats. We had no examples of data integration systems or tools that handled things like file offsets, so structures like the zip index you described were beyond the state of the art for declarative description.
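To make the file-offset point concrete, here is a minimal sketch in plain Python (not DFDL, and not anything Daffodil does today) of how a reader typically locates the zip index: it finds the end-of-central-directory record at the end of the file, then follows the offset stored there back to the index. The function names and sample data are made up for illustration; the record layout follows the published ZIP format.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"   # marks the end-of-central-directory record
EOCD_FIXED_SIZE = 22             # record size when the trailing comment is empty
MAX_COMMENT_SIZE = 0xFFFF        # comment length is a 16-bit field


def find_central_directory(data):
    """Return (entry_count, directory_size, directory_offset) for a ZIP image."""
    # The record sits at the end of the file, so search backwards from there.
    search_start = max(0, len(data) - EOCD_FIXED_SIZE - MAX_COMMENT_SIZE)
    pos = data.rfind(EOCD_SIGNATURE, search_start)
    if pos < 0 or pos + EOCD_FIXED_SIZE > len(data):
        raise ValueError("no end-of-central-directory record found")
    # Little-endian: signature, two disk numbers, two entry counts,
    # directory size, directory offset, comment length.
    fields = struct.unpack_from("<4sHHHHIIH", data, pos)
    entry_count, directory_size, directory_offset = fields[4], fields[5], fields[6]
    return entry_count, directory_size, directory_offset


def build_sample_zip():
    """Build a two-entry archive in memory (hypothetical sample data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("a.txt", "first")
        archive.writestr("b.txt", "second")
    return buffer.getvalue()


sample = build_sample_zip()
entry_count, directory_size, directory_offset = find_central_directory(sample)
# The offset read from the end of the file points back at the index.
index_signature = sample[directory_offset:directory_offset + 4]
```

This sketch ignores ZIP64 archives and assumes the signature bytes do not also occur inside the archive comment. The point is only that the parse position jumps to an offset that is itself read from the data, which is the part that is hard to express declaratively.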
But now we have Daffodil as open source, which is the perfect vehicle for prototyping and creating features that enable these sorts of data descriptions. Based on that experience, we can then feed ideas back to the DFDL standard for incorporation into a future version.
As for the specifics of how you can start to contribute, that is a longer discussion.
We have an ambition to label the JIRA tickets for Daffodil to identify good first projects for new contributors, .... The whole pool of JIRA tickets, which is our project-wide TODO list, is at
There are 450+ tickets open, so there is lots to work on.
I'll leave it at that for this message.
From: Russ Williams <russ@xxxxxxxxxxxxxxx>
Sent: Wednesday, May 23, 2018 5:54:14 AM
Subject: Can I help?
I stumbled across DFDL / Apache Daffodil yesterday while looking for a way to specify file formats in a machine-readable form. I was surprised that there hasn’t been a lot more effort in this space, given the importance for archival, and I’m very keen to see the project succeed. I’ve not done much Open Source work before, but I’m a commercial software engineer/architect with over two decades’ experience, so hopefully I could be of some use.
I’m particularly interested in the handling of large binary files, which I see from the wiki and JIRA (e.g. DAFFODIL-1735) is a key area of concern for you guys as well, but I’m a little concerned that the DFDL 1.0 spec seems to have been written with some XML-like assumptions of how parsing should work, rather than how various binary formats are actually parsed (e.g. ZIP, with a signature at the start, then an index at the end, doesn’t seem to fit the document model).
Are you looking for people to get involved with Daffodil? Is there anything I can help with to get started?