Subject: Re: [erlang-questions] List Question

Andrew, if you want to store the data in a format that is as compact as possible, I'd recommend storing the HL7 message itself as a binary and parsing it on demand. If you want to store the data pre-parsed, then I would store it as a list of segments, where each segment is represented by a nested tuple. That way you can reference the fields, components, etc., by their index in an O(1) operation, and you can still easily add or remove segments from a message.
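
A minimal sketch of that pre-parsed layout, assuming a fixed nesting of field, repeat, component and subcomponent tuples with binaries as leaves. The module name, the function names and the exact nesting are made up for illustration, and only the first three fields of the OBX test segment are shown:

```erlang
-module(hl7_tuple_sketch).
-export([example_segment/0, fetch/5, find_segment/2]).

%% Hypothetical layout: element 1 is the segment name, element N + 1 is
%% field N. Each field is a tuple of repeats, each repeat a tuple of
%% components, each component a tuple of subcomponents.
example_segment() ->
    {<<"OBX">>,
     {{{<<"17">>}}},                                          % field 1
     {{{<<"FT">>}}, {{<<"TEST">>}}},                          % field 2, two repeats
     {{{<<"8265-1">>}, {<<>>}, {<<"LN">>, <<"SUBCOMP">>}}}}.  % field 3

%% Every step is an element/2 call, so a lookup is constant time.
fetch(Segment, Field, Repeat, Component, Sub) ->
    Repeats = element(Field + 1, Segment),
    Components = element(Repeat, Repeats),
    Subcomponents = element(Component, Components),
    element(Sub, Subcomponents).

%% A message is a plain list of segment tuples, so segments can be
%% added or removed with ordinary list operations.
find_segment(Name, Message) ->
    lists:keyfind(Name, 1, Message).
```

For example, hl7_tuple_sketch:fetch(Seg, 3, 1, 3, 2) reads field 3, repeat 1, component 3, subcomponent 2 of the segment above.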

What I'm describing is similar to the intermediate format used by an HL7 parser I wrote for Elixir. You could probably use it as inspiration for what you need. I had also created another parser in Erlang that maps the segments to records, but part of it is in C using NIFs.
Let me know if you have any other questions.

On Mon, Aug 7, 2017 at 10:46 AM, Andrew McIntyre <[email protected]> wrote:
Hello Craig,

Thanks for your help.

I am trying to store the data as efficiently as possible. It's HL7
natively, and this is my test segment:

OBX|17|FT~TEST|8265-1^^LN&SUBCOMP|1&2&3&4|\H\Spot Image 2\N\||||||F

|~^& are delimiters. The hierarchy is only so deep, and I am using lists
of lists to provide a tree-like way to access the data, e.g. field 3,
repeat 1, component 2, subcomponent 1.

Parsed it looks like this:

  "\\H\\Spot Image 2\\N\\",[],[],[],[],[],"F"]]

As the format evolves over time the hierarchy can be extended, but
older clients can still read the value they are expecting if they
follow the rules, like reading the first value in the list when you
only expect one value to be there.

Currently a typical system might have 12 million of these records, so
I want to keep the Erlang representation as small as possible. Hence I
am reluctant to tag too much, as long as I know how to get the value of
interest. Maybe that is my non-Erlang background showing? Traversing
4 small lists by index should be fast, shouldn't it?

I guess I could store the strings as binaries in the lists, and then
is_binary should work? Is that the case? I gather that binaries are
more space efficient, especially on a 64-bit system.
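
A rough sketch of how binary leaves could make the lists-of-lists format unambiguous. With string leaves, "LN" is itself a list, so is_list cannot tell a value from a nested level. With binaries, is_binary marks the leaves. The module and function names here are hypothetical, and the example assumes a field is a list of repeats and a repeat is a list of components:

```erlang
-module(hl7_leaf_sketch).
-export([at/2, first_value/1]).

%% Follow a path of 1-based indexes through nested lists.
at([], Node) ->
    Node;
at([I | Rest], Node) when is_list(Node), I >= 1, I =< length(Node) ->
    at(Rest, lists:nth(I, Node));
%% A leaf found where a deeper level was expected is treated as the
%% first (and only) value of that level.
at([1 | Rest], Leaf) when is_binary(Leaf) ->
    at(Rest, Leaf);
at(_Path, _Node) ->
    undefined.

%% What an older client would do: keep taking the first element until
%% a leaf is reached.
first_value(Leaf) when is_binary(Leaf) -> Leaf;
first_value([First | _]) -> first_value(First);
first_value([]) -> <<>>.
```

With Field3 = [[<<"8265-1">>, [], [<<"LN">>, <<"SUBCOMP">>]]], the call hl7_leaf_sketch:at([1, 3, 2], Field3) walks repeat 1, component 3, subcomponent 2. On the space question, the memory table in the Erlang Efficiency Guide lists strings at two words per character, which is why binaries tend to be smaller for anything but very short values.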

Monday, August 7, 2017, 10:53:11 PM, you wrote:

z> On Monday, 2017-08-07 22:29:31 you wrote:
>> Hello zxq9,
>> Thanks. Unfortunately I do not know the value of the string that will
>> be there. It's an extensible hierarchy that can be several lists deep,
>> or not. I might need to revise the data structure.

z> In this case it can be useful to consider a way of tagging values.

z> Imagine we want to represent a directory tree structure and have a
z> descent-first traversal function recurse over it while creating the
z> tree. Two things can happen: there is a flat list of new
z> directories that need to be created, and there is the
z> possibility that the tree extends deeper at each node.

z> The naive version would look like what you have:

z> ["top_dir_1",
z>  "top_dir_2",
z>  ["next_level_1",
z>   "next_level_2"]]

z> This leaves a bit to be desired, not only because of the problem
z> you have pointed out that makes it difficult to know what is deep
z> and what is shallow, but also because we don't really have a good
z> way to represent a full tree (what would be the name of a directory containing other directories?).

z> So consider instead something like this:

z> [{"top_dir_1", []},
z>  {"top_dir_2", []},
z>  {"top_dir_3",
z>   [{"next_level_1", []},
z>    {"next_level_2", []}]}]

z> Now we have a representation of each directory's name AND its contents.

z> We can traverse this laterally AND in depth without any ambiguity
z> or need for carrying around a record of where we have been (by
z> using depth recursion and tail-call recursion):

z> make_tree([{Dir, Contents} | Rest]) ->
z>     ok =
z>         case filelib:is_dir(Dir) of
z>             true ->
z>                 ok;
z>             false ->
z>                 ok = log(info, "Creating dir: ~p", [Dir]),
z>                 file:make_dir(Dir)
z>         end,
z>     ok = file:set_cwd(Dir),
z>     ok = make_tree(Contents),
z>     ok = file:set_cwd(".."),
z>     make_tree(Rest);
z> make_tree([]) ->
z>     ok.

z> Not so bad.

z> In your case we could represent things perhaps a bit better by
z> separating the types and tagging them. Instead of just "FT" and
z> whatever other string labels you might want, you could either use
z> atoms (totally unambiguous) or tuples as we have in the example
z> above (also totally unambiguous). I prefer tuples, though, because they are easier to read.

z> [{value, "foo"},
z>  {tree,
z>   [{value, "bar"},
z>    {value, "foo"}]},
z>  {value, "baz"}]

z> So then we do something like:

z> traverse([{value, Value} | Rest]) ->
z>    ok = do_thing(Value),
z>    traverse(Rest);
z> traverse([{tree, Contents} | Rest]) ->
z>    ok = traverse(Contents),
z>    traverse(Rest);
z> traverse([]) ->
z>    ok.
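
A self-contained variant of the tagged traversal above, as a sketch only. Instead of calling do_thing/1 for its side effect, it collects every tagged value in depth-first order, which makes the behaviour easy to check in the shell. The module and function names are invented for this example:

```erlang
-module(tagged_tree_sketch).
-export([values/1]).

%% Return all {value, V} payloads of a tagged tree, depth first.
values(Tree) ->
    lists:reverse(values(Tree, [])).

%% The accumulator holds the values seen so far in reverse order.
values([{value, Value} | Rest], Acc) ->
    values(Rest, [Value | Acc]);
values([{tree, Contents} | Rest], Acc) ->
    %% Descend first, then carry on with the siblings.
    values(Rest, values(Contents, Acc));
values([], Acc) ->
    Acc.
```

Because every element carries a tag, no clause needs a guard, and a string value can never be mistaken for a nested level.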

z> Anyway, don't be afraid of varying your value types to say exactly
z> what you mean. If your strings like "FT" only have meaning within
z> your system, consider NOT USING STRINGS, and using atoms instead. That makes it even easier:

z> [foo,
z>  bar,
z>  [foo,
z>   bar],
z>  foo]

z> So then we can do:

z> traverse([foo | Rest]) ->
z>     ok = do_foo(),
z>     traverse(Rest);
z> traverse([bar | Rest]) ->
z>     ok = do_bar(),
z>     traverse(Rest);
z> traverse([Value | Rest]) when is_list(Value) ->
z>     ok = traverse(Value),
z>     traverse(Rest);
z> traverse([]) ->
z>     ok.

z> And of course, you can leave out the guard if you would rather match
z> on a list shape in the listy clause there, but that is a minor detail.
z> The point is to make your data types MEAN SOMETHING REASONABLE
z> within your system. Use atoms when your values are meaningful only
z> within your system. Strings are for the birds.
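
To illustrate the remark about matching on a list shape instead of using a guard, here is a small sketch over the atom tree from above. It counts the foo and bar atoms rather than calling do_foo/0 and do_bar/0. The module and function names are hypothetical:

```erlang
-module(atom_tree_sketch).
-export([count/1]).

%% Return {NumberOfFoo, NumberOfBar} for a nested list of atoms.
count(Tree) ->
    count(Tree, {0, 0}).

count([foo | Rest], {Foo, Bar}) ->
    count(Rest, {Foo + 1, Bar});
count([bar | Rest], {Foo, Bar}) ->
    count(Rest, {Foo, Bar + 1});
%% The two list shapes are matched directly, so is_list/1 is not needed.
count([[] | Rest], Acc) ->
    count(Rest, Acc);
count([[_ | _] = Sub | Rest], Acc) ->
    count(Rest, count(Sub, Acc));
count([], Acc) ->
    Acc.
```

This only works because the leaves are atoms. With string leaves the [_ | _] pattern would also match the values themselves.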

z> -Craig
z> _______________________________________________
z> erlang-questions mailing list
z> [email protected]

Best regards,
 Andrew                             mailto:[email protected]

sent from a real computer
