From chris at sencjw.com Sat Dec 1 21:22:53 2018
From: chris at sencjw.com (Chris Wilson)
Date: Sat, 1 Dec 2018 15:22:53 -0600
Subject: Databases and the specter of scaling
Message-ID: <20181201212253.GA6794@sencjw.com>

Hey Friends,

I thought that this might be a practical question best answered by this group. It goes like this: I'm seen as the local "hey this guy *really* likes databases guy" around work and so I'm often fielding questions about how to write or improve queries. One thing that often comes up is that people will hesitate on writing the obvious query for fear that it will become slow as the size of the data grows. I don't have the code on hand, but I'll sketch it out:

orders:
- id
- code
- ...other stuff?...

items:
- id
- product
- price
- order_id (FK orders(id))

And you've probably already guessed it, but the question is how to write "order total." To me at least, I'd write the most straightforward query I can think of:

select sum(price)
from orders
join items on orders.id = items.order_id
where orders.id = 1;

My coworker tested it at least up to orders with 5,000 items and it was still having execution times in the handful of milliseconds range (as I'd expected).

So my overall question is how to quell the general suspicion (?) or nebulous concerns that lots of devs have with regards to PORDBMSs (plain old relational dbs)?

When do you think issues like the above take on real weight? I know the true answer is "it depends" but I always like to ask -- what do we know, while ruling out "it depends?"

Thanks!

--
Chris

From joe at begriffs.com Sat Dec 1 23:23:08 2018
From: joe at begriffs.com (Joe Nelson)
Date: Sat, 1 Dec 2018 17:23:08 -0600
Subject: Databases and the specter of scaling
In-Reply-To: <20181201212253.GA6794@sencjw.com>
References: <20181201212253.GA6794@sencjw.com>
Message-ID: <20181201232308.GA14484@begriffs.com>

Chris Wilson wrote:
> orders:
> - id
> - code
> - ...other stuff?...
>
> items:
> - id
> - product
> - price
> - order_id (FK orders(id))
>
> And you've probably already guessed it, but the question is how to write
> "order total."

I know it's just an example, but I'd model it slightly differently, with a line_items join table between orders and items so that the relationship can be many-to-many. Also I'll call items "products."

create table products (
  product_id bigserial primary key,
  descrip text,
  price numeric(6, 2)
);

create table orders (
  order_id bigserial primary key,
  product_id bigint references products,
  ordered_at timestamptz default now()
);
create index order_product on orders (product_id);

create table line_items (
  line_item_id bigserial primary key,
  qty integer,
  product_id bigint references products,
  order_id bigint references orders
);
create index li_product on line_items (product_id);
create index li_order on line_items (order_id);

-- 100 random products
insert into products (descrip, price)
select md5(i::text), random()*100.0
from generate_series(1,100) as i;

-- for each product, 100 orders
insert into orders (product_id)
select i%100+1
from generate_series(0,100*100-1) as i;

-- for each order, 10 line items
insert into line_items (product_id, order_id, qty)
select i%100+1, i%(100*100)+1, round(random()*4)+1
from generate_series(0,100*100*10-1) as i;

> So my overall question is how to quell the general suspicion (?) or
> nebulous concerns that lots of devs have with regards to PORDBMSs (plain
> old relational dbs)?

Your example is so solidly within the relational db wheelhouse that I'm trying to imagine who you work with who is even scared by this. RDBMSes like Postgres have a sophisticated query planning engine that outstrips whatever procedural drudgery a developer would need for e.g. a document db like Mongo.

> When do you think issues like the above take on real weight? I know the
> true answer is "it depends" but I always like to ask -- what do we
> know, while ruling out "it depends?"

The key is to ask the database what its query plan is for the query in question and then tune any bottlenecks. Because I included indices on columns used in the join, the plan is efficient. (The primary keys create btree indices too.)

We can use the explain command to learn about a query. How about the query to find the total spent on a product across all orders?

explain
select sum(qty*price)
from line_items
join products using (product_id)
where line_items.product_id = 1;

                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Aggregate  (cost=835.27..835.28 rows=1 width=32)
   ->  Nested Loop  (cost=28.92..826.92 rows=1113 width=10)
         ->  Seq Scan on products  (cost=0.00..2.25 rows=1 width=14)
               Filter: (product_id = 1)
         ->  Bitmap Heap Scan on line_items  (cost=28.92..813.54 rows=1113 width=12)
               Recheck Cond: (product_id = 1)
               ->  Bitmap Index Scan on li_product  (cost=0.00..28.64 rows=1113 width=0)
                     Index Cond: (product_id = 1)

This is hitting the li_product index and using a bitmap heap scan to fetch the matching line_items rows for the join. When I run the query without explain (and with \timing on in psql) it shows 2.689 ms. Relational databases eat this stuff for breakfast.

If you hit a point where there are so many queries happening that even this kind of stuff goes slow, then you can partition your data in a multi-tenant scenario, where say each storefront goes on its own machine, and queries are routed to machines based on e.g. a store_id filter in the queries. That's one of the things that Citus does:

https://docs.citusdata.com/en/v8.0/use_cases/multi_tenant.html

I don't know the scale you'll be operating under, but I'm guessing that a single well tuned Postgres server on good hardware can handle your needs. (Plus setting up a hot standby server with postgres logical replication for immediate failover if the main db goes down.)

From joe at begriffs.com Sun Dec 2 00:22:16 2018
From: joe at begriffs.com (Joe Nelson)
Date: Sat, 1 Dec 2018 18:22:16 -0600
Subject: Databases and the specter of scaling
In-Reply-To: <20181201232308.GA14484@begriffs.com>
References: <20181201212253.GA6794@sencjw.com> <20181201232308.GA14484@begriffs.com>
Message-ID: <20181202002216.GB14484@begriffs.com>

> When do you think issues like the above take on real weight? I know the
> true answer is "it depends" but I always like to ask -- what do we
> know, while ruling out "it depends?"

Another note: to evaluate a database for your application you'll want to determine what your workload will look like and pick a benchmark to mimic it. The Transaction Processing Performance Council has released a number of benchmarks used to compare database performance.

https://www.tpc.org/information/benchmarks.asp

You can run benchmarks on Postgres with pgbench. By default pgbench runs a benchmark based on TPC-B. You can also feed it customized statements of your own to test in parallel.

https://www.postgresql.org/docs/current/pgbench.html

From salo at saloits.com Sun Dec 2 06:20:42 2018
From: salo at saloits.com (Timothy J. Salo)
Date: Sun, 2 Dec 2018 00:20:42 -0600
Subject: Databases and the specter of scaling
In-Reply-To: <20181201212253.GA6794@sencjw.com>
References: <20181201212253.GA6794@sencjw.com>
Message-ID: <2f0e543b-027d-7dbb-b50e-3062fad5511b@saloits.com>

> One thing that often comes up is that people will hesitate on
> writing the obvious query for fear that it will become slow as the size
> of the data grows.

I'm not a database guy, but I understand that the performance of a query is strongly influenced by how a query relates to the structure of the database being queried.

In particular, a query will perform much better, and scale much better, if the database can access data using an index.
If, on the other hand, the database doesn't have an index for the data being queried, the database may have to search all of the rows of the database (with the expected effect on performance).

You can add indexes to a database to improve the performance of queries that operate on those columns.

Rather than try to explain what database indices are, I suggest that you look at, for example, the Wiki page for "database index". You might also Google [database index].

Everyone, in my opinion, should take a database class, or two. And, an algorithms class or two or three.

I suppose that it is worth asserting that there is more to databases than just SQL: you should really understand how database management systems operate under the covers. And, maybe a smidgen about relational algebra.

Having said that, you have just about exhausted my knowledge of databases.

From salo at saloits.com Sun Dec 2 06:33:06 2018
From: salo at saloits.com (Timothy J. Salo)
Date: Sun, 2 Dec 2018 00:33:06 -0600
Subject: Databases and the specter of scaling
In-Reply-To: <2f0e543b-027d-7dbb-b50e-3062fad5511b@saloits.com>
References: <20181201212253.GA6794@sencjw.com> <2f0e543b-027d-7dbb-b50e-3062fad5511b@saloits.com>
Message-ID: <797c0145-57b4-36df-14ea-67b6ef64c216@saloits.com>

> Everyone, in my opinion, should take a database class, or two. And, an
> algorithms class or two or three.

In the interim, if one is really interested in this topic, one might consider buying a used database textbook. Specifically, a database textbook, not a book on SQL, and not a book on how to administer a database (although you might find both useful). I am talking about a textbook that talks about database technology, the stuff that goes on under the covers of a database, and some of the theory on which these technologies are built.

Or, you might be able to find a textbook online.

No, I don't have any recommendations.
But, you might start by looking at a used copy of an older edition of whatever the UofM is using in its computer science database course.

If one is on campus, one could also try asking someone who is teaching a database course. They might be able to recommend used textbooks. Or, if you are really lucky, they might have or know of a free copy of a textbook.

-tjs

From nompelis at nobelware.com Sun Dec 2 21:04:27 2018
From: nompelis at nobelware.com (Ioannis Nompelis)
Date: Sun, 2 Dec 2018 21:04:27 +0000
Subject: Databases and the specter of scaling
In-Reply-To: <797c0145-57b4-36df-14ea-67b6ef64c216@saloits.com>
References: <20181201212253.GA6794@sencjw.com> <2f0e543b-027d-7dbb-b50e-3062fad5511b@saloits.com> <797c0145-57b4-36df-14ea-67b6ef64c216@saloits.com>
Message-ID: <20181202210427.GA28842@nobelware.com>

I am with Tim on most of this; there is a lot to learn by simply studying the very essentials. Having said that, I have never studied databases (never had to or had a strong interest), but I feel intrigued right now... especially after I read Joe's highly technical response!

But I want to take this in another, tangential, direction. I do not know if it is my training and experience in HPC, the algorithms that are involved in numerical simulation and HPC, or what, but when I see things that involve potentially O(N) operations of any kind, I immediately look for scalable solutions. But that is not to say I do not "try" things the naive way at first, I just do it on a small N that is representative.

So, having said that, I will just offer this piece of advice: perform the query, but limit the results to the "first few" or limit the search space to a "known to be small" size. This gives you a baseline. Yeah, it is the obvious and a non-answer, but you get the idea. Never fear the crash-test in the sandbox; learn to love it.

Not sure if this helps, but it is what I do.

From nompelis at nobelware.com Sun Dec 2 21:07:59 2018
From: nompelis at nobelware.com (Ioannis Nompelis)
Date: Sun, 2 Dec 2018 21:07:59 +0000
Subject: Databases and the specter of scaling
In-Reply-To: <20181202210427.GA28842@nobelware.com>
References: <20181201212253.GA6794@sencjw.com> <2f0e543b-027d-7dbb-b50e-3062fad5511b@saloits.com> <797c0145-57b4-36df-14ea-67b6ef64c216@saloits.com> <20181202210427.GA28842@nobelware.com>
Message-ID: <20181202210759.GA29549@nobelware.com>

Also, I nominate Joe to pair with Chris on this, and write the weblog post for us to read!

From joe at begriffs.com Sun Dec 2 22:17:07 2018
From: joe at begriffs.com (Joe Nelson)
Date: Sun, 2 Dec 2018 16:17:07 -0600
Subject: Databases and the specter of scaling
In-Reply-To: <20181202210759.GA29549@nobelware.com>
References: <20181201212253.GA6794@sencjw.com> <2f0e543b-027d-7dbb-b50e-3062fad5511b@saloits.com> <797c0145-57b4-36df-14ea-67b6ef64c216@saloits.com> <20181202210427.GA28842@nobelware.com> <20181202210759.GA29549@nobelware.com>
Message-ID: <20181202221707.GA78596@begriffs.com>

> Also, I nominate Joe to pair with Chris on this, and write the weblog
> post for us to read!

As far as opportunities to learn, I would like to fully understand the output from the EXPLAIN command. This would be a good place to start:

https://www.postgresql.org/docs/current/using-explain.html

I know some things about it in general, but ideally I should understand every node type, its performance characteristics, why it was chosen, and how to adjust the system or query to get a different result.

That would be a worthwhile blog post. Anyone want to pair on doing some experiments?

From joe at begriffs.com Sat Dec 15 05:19:47 2018
From: joe at begriffs.com (Joe Nelson)
Date: Fri, 14 Dec 2018 23:19:47 -0600
Subject: Who wants to help me profile my simple program?
Message-ID: <20181215051947.GA34468@begriffs.com>

In order to learn more about the usage and performance of C's standard I/O library I created this project:

https://github.com/begriffs/randln

Its function is simple, just print a random line from a file. (Kind of like the "fortune" command.) It can actually accomplish this in several ways. The readme lists each way it can use to find a random line, along with the pros and cons I've discovered.

There are three things I want to do, hopefully with one of you. Anyone want to pair program with me?

1. Add proper error checking. Load the program in the debugger, pause it at various lines then in another terminal mess with the file it's reading to see what kind of interesting errors we can produce.
2. Are there other line-finding methods using portable C? Are there better methods available using POSIX?
3. Profile the current line-finding methods and see how they compare with one another, and try to make them faster.

I've already made some fun experiments with profiling. One way is to use gprof to examine how often the functions get called. Another way is actually watching the I/O done by the kernel using ktrace.

First I start the debugger and enable tracing for its I/O and that of its children.

ktrace -i -ti gdb randln

I put a breakpoint in my line-finding function and start the program. In another terminal I find the PID of randln launched by gdb, and start tailing the ktrace dump, showing the first snippet of each block read from file descriptors:

kdump -m 20 -l -p

Now as I step through the debugger I can see the exact consequences of what the program is doing. Really fun stuff.

If you want to pair with me we could either meet in person or SSH into our shared group machine and work there.

--
Joe Nelson https://begriffs.com

From nompelis at nobelware.com Sat Dec 15 19:30:16 2018
From: nompelis at nobelware.com (Ioannis Nompelis)
Date: Sat, 15 Dec 2018 19:30:16 +0000
Subject: Who wants to help me profile my simple program?
In-Reply-To: <20181215051947.GA34468@begriffs.com>
References: <20181215051947.GA34468@begriffs.com>
Message-ID: <20181215193016.GA10219@nobelware.com>

I have done profiling before, but not with gprof, and it was useful in some ways. But I was only interested in raw computational efficiency for numerical work, and never concerned with raw I/O.

Two things I have to say.

1. I think all of what you are doing is "buffered I/O" which can have some potential disadvantages should one try to optimize on speed. Un-buffered I/O should also be considered, and you do have to check for '\n' just as you do to detect line-breaks.

2. Run all this on a ramdrive and see how that works. You can use /dev/shm (under Linux, for sure) as your sandbox, even if you are reading from a file that is on a physical drive.

On that realm, had I the time to play on this with you, I would totally go crazy on the media... Drop a file in /dev/shm that is to become a "block device". Mount it via loopback at a mount-point: 'mount ...... ..... -o loop' as usual. Drop the text-file in that directory and run the tests there to see what I/O activity the kernel is trapping.

From chris at sencjw.com Sat Dec 15 19:44:10 2018
From: chris at sencjw.com (Chris Wilson)
Date: Sat, 15 Dec 2018 13:44:10 -0600
Subject: Who wants to help me profile my simple program?
In-Reply-To: <20181215051947.GA34468@begriffs.com>
References: <20181215051947.GA34468@begriffs.com>
Message-ID:

> On Dec 14, 2018, at 11:19 PM, Joe Nelson wrote:
>
> There are three things I want to do, hopefully with one of you. Anyone
> want to pair program with me?

Ooh, ooh. Pick me! When are you available, this weekend?

-- Chris

From joe at begriffs.com Sun Dec 16 03:48:00 2018
From: joe at begriffs.com (Joe Nelson)
Date: Sat, 15 Dec 2018 21:48:00 -0600
Subject: Who wants to help me profile my simple program?
In-Reply-To:
References: <20181215051947.GA34468@begriffs.com>
Message-ID: <20181216034800.GB34468@begriffs.com>

> Ooh, ooh. Pick me! When are you available, this weekend?

Sounds great, I'll give you a call on Sunday. I was busy with some family stuff today.

From joe at begriffs.com Sun Dec 16 03:54:37 2018
From: joe at begriffs.com (Joe Nelson)
Date: Sat, 15 Dec 2018 21:54:37 -0600
Subject: Who wants to help me profile my simple program?
In-Reply-To: <20181215193016.GA10219@nobelware.com>
References: <20181215051947.GA34468@begriffs.com> <20181215193016.GA10219@nobelware.com>
Message-ID: <20181216035437.GC34468@begriffs.com>

> 1. I think all of what you are doing is "buffered I/O"

Right, by default the C standard library chooses block buffering for file access, line buffering for the terminal, and unbuffered writes to stderr. Might be interesting to see whether I could use setvbuf() to work more efficiently for this situation. Which mode and/or buffer size do you suspect would help?

> 2. Run all this on a ramdrive and see how that works.

Do you think that would reveal something new about the program performance, or are you just curious to see what kernel calls get traced?

From nompelis at nobelware.com Sun Dec 16 16:11:02 2018
From: nompelis at nobelware.com (Ioannis Nompelis)
Date: Sun, 16 Dec 2018 16:11:02 +0000
Subject: Who wants to help me profile my simple program?
In-Reply-To: <20181216035437.GC34468@begriffs.com>
References: <20181215051947.GA34468@begriffs.com> <20181215193016.GA10219@nobelware.com> <20181216035437.GC34468@begriffs.com>
Message-ID: <20181216161102.GA5551@nobelware.com>

Use a relatively small buffer; something that is larger than the largest line you expect to find in the file, of course, to increase efficiency, but keep changing it and observe the statistics that come out. I'd go for two tests, unbuffered and fully buffered with the specified buffer I mentioned. Line buffering would be ambiguous enough for a performance test to not be interesting or make the test conclusive. #MeThinks

>
> > 2. Run all this on a ramdrive and see how that works.
>
> Do you think that would reveal something new about the program
> performance, or are you just curious to see what kernel calls get
> traced?

Both. Any performance metric that does not involve the hardware, by design should not be adversely affected by the hardware. What I am saying is if you are gauging performance of the kernel I/O functions or of the stdlib, you do not want to have "spikes" of long times be thrown at you from the low-level hardware operations.

But if you let the kernel do what it wants, you will notice that for sufficiently small files queried, everything will appear as if it is coming out of the memory. This is because the kernel will swallow the whole file in the "buffers" and you will be effectively skipping the hardware. I was monitoring performance of I/O a few years ago on a supercomputer and noticed that I had very erratic and unrealistic read times. As it turns out, when I started to monitor the memory usage, I noticed that the buffers filled up the whole free memory of the system and repeated reads of a 32GB ASCII file on one process were zapping fast compared to the first time.

So, I am not sure if the kernel will actually allow for entirely unbuffered I/O. This is something I never went after because I started using a lot of raw unbuffered I/O through HDF5.

From nompelis at nobelware.com Mon Dec 17 14:10:01 2018
From: nompelis at nobelware.com (Ioannis Nompelis)
Date: Mon, 17 Dec 2018 14:10:01 +0000
Subject: Who wants to help me profile my simple program?
In-Reply-To: <20181216161102.GA5551@nobelware.com>
References: <20181215051947.GA34468@begriffs.com> <20181215193016.GA10219@nobelware.com> <20181216035437.GC34468@begriffs.com> <20181216161102.GA5551@nobelware.com>
Message-ID: <20181217141001.GA21812@nobelware.com>

I see you guys spent 3 hours yesterday hacking around.

How was the remote connection pairing experience? Did you voice-chat too?

What did you end up learning?
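
The buffering experiment suggested earlier in this thread (unbuffered versus fully buffered with a chosen buffer size) could be sketched roughly as below. This is only a minimal sketch under assumptions: the function names count_lines and seconds_for are hypothetical and are not taken from randln, and the only library calls used are standard C (fopen, setvbuf, getc, ferror, clock). Note that clock() reports processor time rather than wall-clock time, so it will not show time spent waiting on the disk.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: count the lines in `path` while reading through a
 * stream configured with the given buffering mode.
 * mode must be _IONBF (unbuffered) or _IOFBF (fully buffered);
 * bufsize is only used for _IOFBF and must then be greater than zero.
 * Returns the number of lines, or -1 on any error. */
long count_lines(const char *path, int mode, size_t bufsize)
{
    FILE *fp;
    char *buf = NULL;
    long lines = 0;
    int c, last = '\n';

    assert(mode == _IONBF || mode == _IOFBF);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    if (mode == _IOFBF) {
        buf = malloc(bufsize);
        if (buf == NULL) {
            fclose(fp);
            return -1;
        }
    }

    /* setvbuf has to be called before any other operation on the stream */
    if (setvbuf(fp, buf, mode, mode == _IOFBF ? bufsize : 0) != 0) {
        fclose(fp);
        free(buf);
        return -1;
    }

    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            lines++;
        last = c;
    }
    if (last != '\n')
        lines++;                /* last line had no trailing newline */

    if (ferror(fp))
        lines = -1;             /* must be checked before fclose */

    fclose(fp);
    free(buf);                  /* the buffer must outlive the stream */
    return lines;
}

/* Hypothetical helper: processor time in seconds used by one pass.
 * The line count (or -1) is stored through `lines`. */
double seconds_for(const char *path, int mode, size_t bufsize, long *lines)
{
    clock_t start = clock();
    *lines = count_lines(path, mode, bufsize);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

One way to use it would be to call seconds_for() on the same file with _IONBF and then with _IOFBF at a few buffer sizes, repeating each run several times. As noted above, after the first pass the kernel will likely serve the file from memory, so the first run and the later runs should be compared separately.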

From joe at begriffs.com Tue Dec 18 03:07:47 2018
From: joe at begriffs.com (Joe Nelson)
Date: Mon, 17 Dec 2018 21:07:47 -0600
Subject: Who wants to help me profile my simple program?
In-Reply-To: <20181217141001.GA21812@nobelware.com>
References: <20181215051947.GA34468@begriffs.com> <20181215193016.GA10219@nobelware.com> <20181216035437.GC34468@begriffs.com> <20181216161102.GA5551@nobelware.com> <20181217141001.GA21812@nobelware.com>
Message-ID: <20181218030747.GA27726@begriffs.com>

Ioannis Nompelis wrote:
> I see you guys spent 3 hours yesterday hacking around.

Yeah we did -- how did you find out about the login sessions btw? Unix must keep a log somewhere?

> How was the remote connection pairing experience? Did you voice-chat too?

It was great! We shared tmux. Had to use the `-S` option to point to a socket in /tmp which I gave 777 access so Chris could access it too. I have a nice tmux configuration in my home folder that enables us to even use our mice to select/resize panels over SSH somehow, which is a bit magical.

We would have used the murmur server on nobelware except I had already called Chris using a SIP -> PSTN trunk so my phone call was already using my headphones and mic, and he was using headphones with his smartphone so we didn't have to change anything.

> What did you end up learning?

A lot. One of the more dumb things is that by virtue of naming a variable "threshold," our reasoning was subliminally guided to making a bug. We were comparing whether a value was above the threshold. After seeing the problem we found that we ought to be checking if a value was *below* the threshold so we named it "limbo" instead (for a limbo bar) and it somehow made things easier to reason about. :)

Other than that, I learned that rand() sucks, and about ways I should be gathering entropy for srand().
Because this program is supposed to be super portable (C89, no reliance on POSIX or /dev/random or anything), there are not a lot of changing values we have access to. Some fascinating comments in this Reddit thread where I asked for help:

https://www.reddit.com/r/C_Programming/comments/a6x93c/code_review_four_ways_to_print_a_random_line_from/

Also, there's apparently a single-pass algorithm to pick a random line from a stream with a uniform distribution! I have a function that does it with a Poisson distribution, but it can be improved and people use this brainteaser for interviews apparently.

From nompelis at nobelware.com Tue Dec 18 15:22:45 2018
From: nompelis at nobelware.com (Ioannis Nompelis)
Date: Tue, 18 Dec 2018 15:22:45 +0000
Subject: Who wants to help me profile my simple program?
In-Reply-To: <20181218030747.GA27726@begriffs.com>
References: <20181215051947.GA34468@begriffs.com> <20181215193016.GA10219@nobelware.com> <20181216035437.GC34468@begriffs.com> <20181216161102.GA5551@nobelware.com> <20181217141001.GA21812@nobelware.com> <20181218030747.GA27726@begriffs.com>
Message-ID: <20181218152245.GA12257@nobelware.com>

Sounds great! Wish I had the time to be part of this, or an observer even.

> Yeah we did -- how did you find out about the login sessions btw? Unix
> must keep a log somewhere?
>

I did a 'last | more' to get the contents of the /var/log/wtmp and saw you both log in at 18:xx and stay there for 3+ hours each.

> > How was the remote connection pairing experience? Did you voice-chat too?
>
> It was great! We shared tmux. Had to use the `-S` option to point to a
> socket in /tmp which I gave 777 access so Chris could access it too.
> I have a nice tmux configuration in my home folder that enables us to
> even use our mice to select/resize panels over SSH somehow, which is a
> bit magical.
>

I have used tmux before. I did not have to play any games like that, but I do not remember much from that stunt.
I did not know about the mice sharing and would appreciate a link to info (we leverage on each other's knowledge).

Will check out the reddit later. Please send links for everything else; I want to see more of the ideas behind the probabilities behind the "line drawing" in the context of it being programmed. Did not know about the brainteaser for interviews.

Will most certainly name a variable "limbo_bar" in honour of your efforts!

From nompelis at nobelware.com Tue Dec 18 15:30:49 2018
From: nompelis at nobelware.com (Ioannis Nompelis)
Date: Tue, 18 Dec 2018 15:30:49 +0000
Subject: our website
Message-ID: <20181218153049.GA12909@nobelware.com>

I had been thinking about this for a while, and Dave, Joe and I had been discussing this: I would like to beautify our website on frostbyte.

I wanted to get rid of the hideous white background (which draws a lot of power and fatigues my eyes) first. I also wanted to get a green or orange retro terminal look on it. And, of course, I wanted some content and quick-links.

One of the concerns is "do our users want to be disclosing of the affiliation and do they want their websites/weblogs linked?" We can talk about that if you like.

As a starting point, I offer some hand-written CSS as a template. I took the courtesy to add some of our users' pages on it. I welcome ideas. But more importantly, if anyone wants to have a take at it, incorporating our mailing list archives, wiki, adding more content, etc, please come forward!

Temporary link: https://nobelware.com/~nompelis/HCHD/

From nompelis at nobelware.com Tue Dec 18 15:50:01 2018
From: nompelis at nobelware.com (Ioannis Nompelis)
Date: Tue, 18 Dec 2018 15:50:01 +0000
Subject: Who wants to help me profile my simple program?
In-Reply-To: <20181218152245.GA12257@nobelware.com>
References: <20181215051947.GA34468@begriffs.com> <20181215193016.GA10219@nobelware.com> <20181216035437.GC34468@begriffs.com> <20181216161102.GA5551@nobelware.com> <20181217141001.GA21812@nobelware.com> <20181218030747.GA27726@begriffs.com> <20181218152245.GA12257@nobelware.com>
Message-ID: <20181218155001.GA13682@nobelware.com>

There are some crazy comments in that reddit... You can see opinions that are stylistic, straight up compulsive, and ingenious! I got sucked in...

I never thought it was a good idea to attempt to rewind() stdin. I did some extra googling and found out that one can rewind stdin in some cases and when it is coming from a file. Dangerous... stay away.

The interview question is still cryptic to me. I will stay away for now because I will get sucked in.

There is more good stuff in those answers and comments. I enjoyed the talk on gathering entropy. The obvious non-lazy parts require getting entropy from the OS (obviously). It is good to have functionality for this, and apparently it does exist. At any rate, I'd add my own salt and obscurity in there, at the danger of making things worse, but with test-able means.

From dave.bucklin at gmail.com Tue Dec 18 18:56:18 2018
From: dave.bucklin at gmail.com (Dave Bucklin)
Date: Tue, 18 Dec 2018 12:56:18 -0600
Subject: our website
In-Reply-To: <20181218153049.GA12909@nobelware.com>
References: <20181218153049.GA12909@nobelware.com>
Message-ID: <20181218185618.c6v4vb2jj4xg6235@19a6.tech>

On Tue, Dec 18, 2018 at 03:30:49PM +0000, Ioannis Nompelis wrote:
> I had been thinking about this for a while, and Dave, Joe and I had been
> discussing this: I would like to beautify our website on frostbyte.

I support this. I think it's more of an "ask for forgiveness" type of situation. As long as you back up what's there now, we can go back to it if need be. My only request is that the "about us" copy be available somewhere... which should be accomplished easily.

Thanks, Yo.

From nompelis at nobelware.com Tue Dec 18 20:29:35 2018
From: nompelis at nobelware.com (Ioannis Nompelis)
Date: Tue, 18 Dec 2018 20:29:35 +0000
Subject: our website
In-Reply-To: <9C959320E4DE136C.5c7fe4ab-669a-4eef-b98d-9155221686d3@mail.outlook.com>
References: <20181218153049.GA12909@nobelware.com> <9C959320E4DE136C.5c7fe4ab-669a-4eef-b98d-9155221686d3@mail.outlook.com>
Message-ID: <20181218202935.GA23935@nobelware.com>

>
> It's a good look. I like it, sort of like an old webring. I wouldn't mind having my site linked, except that there's nothing there interesting for anyone who would be looking.
>

We link for the sake of linking!

Somebody get the files and the CSS and start hacking. Ask for my help if you need it. Dave/Joe can point us to the filesystem location where documents should go. Perhaps we can Git the website structure too.

From joe at begriffs.com Sat Dec 29 01:23:07 2018
From: joe at begriffs.com (Joe Nelson)
Date: Fri, 28 Dec 2018 19:23:07 -0600
Subject: Ideas for frostbyte 2019?
Message-ID: <20181229012307.GA10892@begriffs.com>

Any new year's resolutions? What's everyone working on?

Here are some ideas for things to try in 2019:

* Recurring hack night at a cafe. We could develop a known place and time, like meet every Thursday 6-9 at Code Blu Cafe. Obviously not everyone would attend every meeting, but you might run into a subset of people each time. It would be a fun routine to socialize and maybe meet new people there by accident too.
* Buddy system. Share weekly goals and hold each other accountable.
* Presentations? I've got some nice recording gear and can take videos of your presentations that you could put on your own site or archive on the frostbyte site.
* Book club. Subsets of us interested in particular things (Unicode, the GNU debugger, POSIX, etc) could form book clubs.
* Mentorship. Find local programming bootcamps and teach a foundational class.
Help students get interested in systems programming or learn how the underlying technologies (networking, databases, encryption) actually work rather than whatever framework abstraction they are being taught. From joe at begriffs.com Sat Dec 29 01:35:22 2018 From: joe at begriffs.com (Joe Nelson) Date: Fri, 28 Dec 2018 19:35:22 -0600 Subject: Forming an IRC network In-Reply-To: <20181025053813.GA48953@begriffs.com> References: <20181025053813.GA48953@begriffs.com> Message-ID: <20181229013522.GB10892@begriffs.com> Following up on this old thread... Rather than forming an IRC network of our own, we can of course join existing channels. Here are ones I find useful (on freenode.net): ##C / lots of knowledgeable people in this one ##C-offtopic / backchannel where odd conversations happen ##posix / coding against and using the standard ##workingset / questions about compilers, linkers, make etc #tcmaker / hack factory people ##fcc-msp / free computing club MSP #postgresql / the database #postgresql-lounge / backchannel #openbsd / unix wizards For best results you might want to register your nick [0] and request a cloak [1]. 0: https://freenode.net/kb/answer/registration 1: https://freenode.net/kb/answer/cloaks From dave.bucklin at gmail.com Sat Dec 29 01:54:32 2018 From: dave.bucklin at gmail.com (Dave Bucklin) Date: Fri, 28 Dec 2018 19:54:32 -0600 Subject: Ideas for frostbyte 2019? In-Reply-To: <20181229012307.GA10892@begriffs.com> References: <20181229012307.GA10892@begriffs.com> Message-ID: <20181229015432.qny5yh4wzzkpy7vc@19a6.tech> On Fri, Dec 28, 2018 at 07:23:07PM -0600, Joe Nelson wrote: > Any new year's resolutions? What's everyone working on? I just moved, so I'm unpacking and getting settled in. I'm thinking about going after a certification this year, maybe Salesforce, so I'll be putting time into that. > * Recurring hack night at a cafe. We could develop a known place and > time, like meet every Thursday 6-9 at Code Blu Cafe. 
> Obviously not everyone would attend every meeting, but you might run into a subset of
> people each time. It would be a fun routine to socialize and maybe meet
> new people there by accident too.

I would show up for this.

From nompelis at nobelware.com Sat Dec 29 19:25:15 2018
From: nompelis at nobelware.com (Ioannis Nompelis)
Date: Sat, 29 Dec 2018 19:25:15 +0000
Subject: Ideas for frostbyte 2019?
In-Reply-To: <20181229012307.GA10892@begriffs.com>
References: <20181229012307.GA10892@begriffs.com>
Message-ID: <20181229192515.GA20445@nobelware.com>

Good ideas Joe.

I really like the mentorship one. And I would add to this that I would really like some of our expertise here to be put in a more presentable form and be used to teach others. Think of all items one has on their blog, the things they care about, the things they learned recently, the things they are learning or want/need to learn. Providing examples with some narrative and background information is what some bloggers do, including members of this list. I can totally see how these can be put in a form that can become teaching material. (Just throwing it out there.) Let's put it on our website on Frostbyte.

I'd go to the recurring hack-night. Short goals, tinkering and hacking are what we are about. Never been to Code Blu. It is a short ride from the U, from my house, and from the Hack Factory. I say we try it.

I like the idea of recording presentations. I have some more thoughts on this. I like the Numberphile format. I also like to have narrative while the screen is the only thing shown in the video. I like the presentations to be short, but not "lightning talks." Presentations should be teaching tools, even for us by us. I'd host them on Frostbyte.

>
> Any new year's resolutions? What's everyone working on?
>

-1. I want the Frostbyte website to be launched and have some linked content. I can work on this. Will coordinate with Joe/Dave for filesystem access.

0.
Will continue pushing forward my personal projects: VR, multi-user CAD-like engine, visualization software, financial data analysis software.

1. I recently sat down and learned the essentials of SSL using OpenSSL. I had this discussion with Sam, who had great advice for his age! Simply, I found that there were a lot of crappy tutorials, with no breadth or depth, and lots of confusing language. I found some hatred for OpenSSL in the process, and I developed some hatred of my own. I found that OpenSSL's documentation is a little worse than "crap" (and I am actually trying to be kind to the makers of this free software). I ended up coding a client-server demo that works great, and I will be applying it to the threaded worker-slave daemon code I had discussed earlier. The goal is to make secure client-server operation for my multi-user software, and to provide HTTPS verification for the software, and to keep user statistics via HTTPS on my server over PHP/MySQL. I am making the functionality generic enough so that others can use it and build on top of it.

2. I want to learn more about UTF-8 in 2019, and locales for Linux/BSD. I want to build a generalization of creating locales for all software I will be making (i.e. throwing an error message in the appropriate language). I think UTF-8 was a great idea, invented by some of the greatest minds in computing.

Sorry for the long brain-dump; this is exciting.

IN

From nompelis at nobelware.com Sat Dec 29 19:28:08 2018
From: nompelis at nobelware.com (Ioannis Nompelis)
Date: Sat, 29 Dec 2018 19:28:08 +0000
Subject: Forming an IRC network
In-Reply-To: <20181229013522.GB10892@begriffs.com>
References: <20181025053813.GA48953@begriffs.com> <20181229013522.GB10892@begriffs.com>
Message-ID: <20181229192808.GB20445@nobelware.com>

Joe, is our IRC node connected to Freenode?

I just joined freenode's chat from the browser in #tcmaker and the browser chat works great.
From joe at begriffs.com Sat Dec 29 20:45:54 2018
From: joe at begriffs.com (Joe Nelson)
Date: Sat, 29 Dec 2018 14:45:54 -0600
Subject: Forming an IRC network
In-Reply-To: <20181229192808.GB20445@nobelware.com>
References: <20181025053813.GA48953@begriffs.com> <20181229013522.GB10892@begriffs.com> <20181229192808.GB20445@nobelware.com>
Message-ID: <20181229204554.GA52682@begriffs.com>

> Joe, is our IRC node connected to Freenode?

No, I just connect my IRC client directly to Freenode. While it's possible to run our own server in the Freenode network [0], I'm not sure what the benefits would be.

> I just joined freenode's chat from the browser in #tcmaker and the
> browser chat works great.

Sure that works. Although having a registered nick and password, I prefer to connect with a native client just to keep my password etc on my own machine. HexChat for X11 and LimeChat for Mac are both pretty good.

0: https://freenode.net/support#applying-to-host-a-server

From joe at begriffs.com Sun Dec 30 05:50:22 2018
From: joe at begriffs.com (Joe Nelson)
Date: Sat, 29 Dec 2018 23:50:22 -0600
Subject: Ideas for frostbyte 2019?
In-Reply-To: <20181229192515.GA20445@nobelware.com>
References: <20181229012307.GA10892@begriffs.com> <20181229192515.GA20445@nobelware.com>
Message-ID: <20181230055022.GC52682@begriffs.com>

Ioannis Nompelis wrote:
> 1. I recently sat down and learned the most essentials of SSL using
> OpenSSL.

A book I've been meaning to read is "Implementing SSL/TLS" by Joshua Davies. It has great reviews, and walks through building real cryptosystems in C, step by step. Seems like it's more general than OpenSSL per se, and I could follow along and run the code.

> 2. I want to learn more about UTF-8 in 2019, and locales for Linux/BSD.

Me too, we're on the same page. :)

I want to learn how C's multi-byte, wide characters and locales work together and how they support UTF-8, as well as the design of UTF-8 itself.
For the latter there's this book: "Unicode Demystified: A Practical Programmer's Guide to the Encoding Standard" by Richard Gillam. I'd like to build small utilities to manipulate unicode based on what I learn in that book.

From nompelis at nobelware.com Sun Dec 30 23:58:16 2018
From: nompelis at nobelware.com (Ioannis Nompelis)
Date: Sun, 30 Dec 2018 23:58:16 +0000
Subject: Ideas for frostbyte 2019?
In-Reply-To: <20181230055022.GC52682@begriffs.com>
References: <20181229012307.GA10892@begriffs.com> <20181229192515.GA20445@nobelware.com> <20181230055022.GC52682@begriffs.com>
Message-ID: <20181230235816.GA31761@nobelware.com>

>
> A book I've been meaning to read is "Implementing SSL/TLS" by Joshua
> Davies. It has great reviews, and walks through building real
> cryptosystems in C, step by step. Seems like it's more general than
> OpenSSL per se, and I could follow along and run the code.
>

I need to get a hold of that book. I always wanted to build my own crypto system, in the sense of a secure means of communication. Now, to be clear, I do not want to invent ciphers. As they say, "there are two types of people who build ciphers; the really smart ones and the really dumb ones." I fear being in the second category, as I am certainly not in the first...

> > 2. I want to learn more about UTF-8 in 2019, and locales for Linux/BSD.
>
> Me too, we're on the same page. :)

Let's do it.

UTF-8 is an extension of ASCII, a brilliant idea of that one guru... cannot recall his name, who made some unices. It is an extension of ASCII, as in, if your software does not support UTF-8, it will still show the traditional readable ASCII part properly.

Anyway, I first need to make my Slackware terminals properly display UTF-8 so that I do not have garbage on my terminal when I read emails. That should be easy. Then, we learn to program with UTF-8. You let me know how you want to go about it.
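
The "extension of ASCII" property can be seen at the byte level. Below is a minimal sketch, not taken from any message in this thread: the names utf8_seq_len and utf8_count are made up for illustration, and the code assumes its input is already valid UTF-8 (it does not reject overlong or out-of-range sequences). Bytes below 0x80 are plain ASCII and mean the same thing in both encodings; every byte of a multi-byte character has its high bit set, so ASCII-only software never mistakes one for a letter.

```c
#include <stddef.h>

/* Classify one byte of a UTF-8 stream by its high bits.
   Returns 0 for a continuation byte (10xxxxxx), otherwise the total
   length in bytes of the sequence this byte starts (1 to 4), or -1
   for a byte with the five high bits set (0xF8..0xFF), which never
   appears in UTF-8.  No further validation is attempted. */
int utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: plain ASCII      */
    if ((b & 0xC0) == 0x80) return 0;  /* 10xxxxxx: continuation     */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx: 2-byte sequence  */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx: 3-byte sequence  */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx: 4-byte sequence  */
    return -1;
}

/* Count code points in a NUL-terminated string, assuming valid UTF-8:
   every byte that is not a continuation byte starts a new code point. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For a pure ASCII string utf8_count gives the same answer as strlen, which is the backward compatibility described above; for text with non-ASCII characters the two differ because some characters occupy several bytes.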
From nompelis at nobelware.com Mon Dec 31 18:46:31 2018
From: nompelis at nobelware.com (Ioannis Nompelis)
Date: Mon, 31 Dec 2018 18:46:31 +0000
Subject: Ideas for frostbyte 2019?
In-Reply-To: <20181229192515.GA20445@nobelware.com>
References: <20181229012307.GA10892@begriffs.com> <20181229192515.GA20445@nobelware.com>
Message-ID: <20181231184631.GA6860@nobelware.com>

Please allow me to augment my bucket list with things I just remembered.

>
> 0. Will continue pushing forward my personal projects: VR, multi-user CAD-like
> engine, visualization software, financial data analysis software.
>
> 1. I recently sat down and learned the essentials of SSL using OpenSSL.
> I had this discussion with Sam, who had great advice for his age! Simply, I
> found that there were a lot of crappy tutorials, with no breadth or depth, and
> lots of confusing language. I found some hatred for OpenSSL in the process,
> and I developed some hatred of my own. I found that OpenSSL's documentation
> is a little worse than "crap" (and I am actually trying to be kind to the
> makers of this free software). I ended up coding a client-server demo that
> works great, and I will be applying it to the threaded worker-slave daemon
> code I had discussed earlier. The goal is to make secure client-server
> operation for my multi-user software, and to provide HTTPS verification for
> the software, and to keep user statistics via HTTPS on my server over PHP/MySQL.
> I am making the functionality generic enough so that others can use it and
> build on top of it.
>
> 2. I want to learn more about UTF-8 in 2019, and locales for Linux/BSD. I want
> to build a generalization of creating locales for all software I will be
> making (i.e. throwing an error message in the appropriate language). I think
> UTF-8 was a great idea, invented by some of the greatest minds in computing.
>

3. Make a regular Linux distro run like an embedded system, preserving root filesystem integrity and killing all temporary data.
   Do the same with a BSD version. Do it on x86/x86_64 and ARM.

From joe at begriffs.com Mon Dec 31 19:15:29 2018
From: joe at begriffs.com (Joe Nelson)
Date: Mon, 31 Dec 2018 13:15:29 -0600
Subject: Unicode [was Ideas for frostbyte 2019?]
In-Reply-To: <20181230235816.GA31761@nobelware.com>
References: <20181229012307.GA10892@begriffs.com> <20181229192515.GA20445@nobelware.com> <20181230055022.GC52682@begriffs.com> <20181230235816.GA31761@nobelware.com>
Message-ID: <20181231191529.GA83089@begriffs.com>

> Let's do it.

Cool! I started researching how to use C with international text, and there are historical complications.

All the way back in C89 the committee distinguished "multibyte" and "wide" characters, and provided some conversion functions in stdlib.h. The idea was to be more general than any particular encoding system. Multibyte characters take a variable number of char to encode each value (i.e. each codepoint), and as you scan through a multibyte string there is "shift state" to track whether the current char is the start, continuation, or end of a character.

The general idea was that the network and filesystems would continue to work in terms of bytes, delivering multibyte characters, and the program would convert them into wide characters for internal representation. Wide characters, wchar_t, are each supposed to hold an entire codepoint (while being somewhat wasteful for ASCII). The 1995 amendment to the standard (C95) added wctype.h to give us those nice classification functions like iswspace(), iswgraph() etc for wide character text processing.

The way that multibyte converts to wide character is determined by the locale, specifically LC_CTYPE. Your program can do

    setlocale(LC_CTYPE, "en_US.UTF-8");

to use the now ubiquitous UTF-8 multibyte encoding. Or it can inherit the $LC_CTYPE environment variable by doing

    setlocale(LC_CTYPE, "");

If you never call setlocale then it defaults to "C" which means just ASCII.
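
The multibyte-to-wide conversion described above can be tried with the standard mbsrtowcs() function. The following is a minimal sketch under stated assumptions: the helper names use_utf8_locale and count_wide_chars are hypothetical, and the "en_US.UTF-8" locale from the message may not be installed on every system, which is why the first helper reports whether setlocale succeeded.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: switch LC_CTYPE to the UTF-8 locale named in
   the message above.  Returns 1 on success, 0 if that locale is not
   available on this system. */
int use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Hypothetical helper: count how many wide characters the multibyte
   string mbs converts to under the current LC_CTYPE locale.
   Returns (size_t)-1 if the bytes are not valid in that encoding. */
size_t count_wide_chars(const char *mbs)
{
    mbstate_t state;
    const char *p = mbs;

    /* Start in the initial shift state. */
    memset(&state, 0, sizeof state);

    /* With a null destination, mbsrtowcs stores nothing and just
       returns the number of wide characters, excluding the
       terminating null. */
    return mbsrtowcs(NULL, &p, 0, &state);
}
```

A caller would typically run use_utf8_locale() once at startup and then pass strings read from files or the network to count_wide_chars(). ASCII text gives the same count in any locale; on a system where the UTF-8 locale is available, a character that takes two bytes on the wire still counts as one wide character.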
This whole philosophy sounds good, except that vendors implemented it too soon and locked themselves into shortsighted choices. Microsoft chose wchar_t to be only two bytes long, because at the time the Unicode consortium (well actually the contemporaneous European ISO committee) was endorsing the Universal Coded Character Set UCS-2. Shortly afterward the four byte UCS-4 was developed (ISO 10646) and that's what today's Unicode uses.

So you can't portably rely on wchar_t. One school of thought says forget it, Microsoft is brain damaged yet again, don't worry about them. C99 specifies a macro, __STDC_ISO_10646__, that is defined only when wchar_t values are ISO 10646 code points, and we can blow up if it's not:

    #ifndef __STDC_ISO_10646__
    #error "Your wide characters suck."
    #endif

Some systems do continue to use wchar_t judiciously. For instance, utilities ported to OpenBSD will use wide characters if they need more advanced text processing. Here's an interesting presentation:

https://www.openbsd.org/papers/eurobsdcon2016-utf8.pdf

Another school of thought says use UTF-8 everywhere, even in program memory. There's even a manifesto called, appropriately enough, https://utf8everywhere.org

Going this route we would need a third-party library for UTF-8 text processing. Given the intricacy of internationalization, a dedicated library might do better than the wctype functions anyway. A popular one appears to be ICU4C. There are also lighter weight ones like utf8proc.

In terms of our own learning process, we could learn to use one of these libraries to make fun little utilities. I was thinking we could write our own simplified library too, but perhaps such an undertaking is merely tedious and not instructive.

I found, while experimenting a little with printing unicode strings, that by default vim saves files in the latin1 encoding. In your .vimrc you should

    set encoding=utf-8

Vim does honor the $LANG environment variable, but that variable seems too heavy handed.
I prefer to set this in my .kshrc:

    LC_CTYPE=en_US.UTF-8
    export LC_CTYPE

Anybody have things to point out that I missed? I'm new to this stuff.

From joe at begriffs.com Mon Dec 31 20:29:30 2018
From: joe at begriffs.com (Joe Nelson)
Date: Mon, 31 Dec 2018 14:29:30 -0600
Subject: Static site (with m4?)
Message-ID: <20181231202930.GC83089@begriffs.com>

Ioannis is putting together a new frostbyte site and for convenience he wants to keep various parts of the webpage in their own files and then assemble them into the final page. For instance the main menu might live in its own file, the header in another etc.

What's the cleanest way to do this? He was thinking of using PHP, but to me that feels pretty heavy for what is essentially a static site. Also PHP requires some work to harden its vulnerabilities and keep it up to date.

I remember Dave has extolled the virtues of a simple macro language to assemble files. What do you think Dave, is this a case for M4? I found someone else who built a site this way:

https://linuxgazette.net/issue22/using_m4.html

From nompelis at nobelware.com Mon Dec 31 21:18:13 2018
From: nompelis at nobelware.com (Ioannis Nompelis)
Date: Mon, 31 Dec 2018 21:18:13 +0000
Subject: Static site (with m4?)
In-Reply-To: <20181231202930.GC83089@begriffs.com>
References: <20181231202930.GC83089@begriffs.com>
Message-ID: <20181231211813.GA12319@nobelware.com>

We can do m4.

I do not know of any PHP vulnerabilities that do not involve an SQL backend. Do you?

PHP will allow us to do a lot more if we do want to.

From louis at goessling.com Mon Dec 31 21:26:09 2018
From: louis at goessling.com (Louis)
Date: Mon, 31 Dec 2018 15:26:09 -0600
Subject: Static site (with m4?)
In-Reply-To: <20181231211813.GA12319@nobelware.com>
References: <20181231202930.GC83089@begriffs.com> <20181231211813.GA12319@nobelware.com>
Message-ID:

I'd rather see m4 consigned to history, personally.
My 2c would be to use a "real" scripting language if we intend to have dynamic content, or use something like zola < https://github.com/getzola/zola > if what we want is to write our pages in markdown and then preprocess + serve statically. We use zola for the ACM websites < https://acm.umn.edu/ > and it works well.

As far as scripting is concerned, I think we're safe as long as we don't take any user input. Did we intend to?

On Mon, Dec 31, 2018 at 3:18 PM Ioannis Nompelis wrote:
>
> We can do m4.
>
> I do not know of any PHP vulnerabilities that do not involve an SQL backend.
> Do you?
>
> PHP will allow us to do a lot more if we do want to.
>

From joe at begriffs.com Mon Dec 31 22:53:22 2018
From: joe at begriffs.com (Joe Nelson)
Date: Mon, 31 Dec 2018 16:53:22 -0600
Subject: Static site (with m4?)
In-Reply-To:
References: <20181231202930.GC83089@begriffs.com> <20181231211813.GA12319@nobelware.com>
Message-ID: <20181231225322.GE83089@begriffs.com>

> I'd rather see m4 consigned to history, personally.

I'm not personally attached to m4 (haven't actually used it), but I am curious if you can point out some of its problems. Are you thinking specifically about the gotchas on the page I linked?

> My 2c would be to ... use something like zola <
> https://github.com/getzola/zola > if what we want is to write our
> pages in markdown and then preprocess + serve statically.

Yeah I use markdown for my own site as well, works nicely. The zola repo has a list of similar projects. One of them -- Hugo -- seems to have about the same functionality, while also having an obsd package making it easier to install: http://ports.su/www/hugo

Ultimately Ioannis is taking the initiative to improve the site so I'll defer to his choice, but thought I'd ask the wisdom of the list first.