Bertrand Florat technical articlesSome stories about programming, devops, system, security...2023-12-27T00:00:00+00:00https://florat.net/Bertrand FloratBeyond Murphy's Law2023-12-27T00:00:00+00:00https://florat.net/beyond-murphyandapos-law/<img src="https://florat.net/assets/images/blog-tech/article-36.webp" alt="Upside-down house" width="500">
<p>This article has also been published at <a href="https://dzone.com/articles/beyond-murphy-law">DZone</a>.</p>
<p>Murphy's Law (<strong>"Anything that can go wrong will go wrong, and at the worst possible time."</strong>) is a well-known adage, especially in engineering circles. However, its implications are often misunderstood, especially by the general public. It's not just about the universe conspiring against our systems; <strong>it's about recognizing and preparing for potential failures</strong>.</p>
<p>Many view Murphy's Law as a blend of magic and reality. As Site Reliability Engineers (SREs), we often ponder its true nature. Is it merely a psychological bias, where we emphasize failures and overlook our unnoticed successes? Psychology has identified several related biases, including Confirmation and Selection biases. The human brain tends to focus more on improbable failures than successes. Moreover, our grasp of probabilities is often flawed – the Law of Truly Large Numbers suggests that coincidences are, ironically, quite common.</p>
<p>However, in any complex system, a multitude of possible states exist, many of which can lead to failure. While safety measures make a transition from a functioning state to a failure state less likely, over time, it's more probable for a system to fail than not.</p>
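<p>To make this concrete with a back-of-the-envelope calculation (the figures below are illustrative assumptions, not measured statistics): if a system has an independent probability <em>p</em> of failing on any given day, the probability of at least one failure over <em>n</em> days grows quickly:</p>
<pre>
P(at least one failure over n days) = 1 - (1 - p)^n
e.g., p = 1% per day, n = 365  =>  1 - 0.99^365 ≈ 0.97
</pre>
<p>Even a "99% reliable" system is thus almost certain to fail at least once within a year.</p>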
<p>The real lesson from Murphy's Law isn't just about the omnipresence of misfortune in engineering but also how we respond to it: through redundancies, high availability systems, quality processes, testing, retries, observability, and logging. Murphy's Law makes our job more challenging and interesting!</p>
<p>Today, however, I'd like to discuss complementary or reciprocal aspects of Murphy's Law that I've often observed while working on large systems:</p>
<h2>Complementary Observations to Murphy's Law</h2>
<h3>The Worst Possible Time Complement</h3>
<p>Often overlooked, this aspect highlights the 'magic' of Murphy's Law. Complex systems do fail, but not so frequently that we forget them. In our experience, a significant number of failures (about one-third) occur at the worst possible times, such as during important demos.</p>
<p>For instance, over the past two months, we had a couple of important demos. In the first demo, the web application failed due to a session expiration issue, which rarely occurs. In the second, a regression embedded in a merge request caused a crash right during the demo. These were the only significant demos we had in that period, and both encountered failures. This phenomenon is often referred to as the 'Demo Effect'</p>
<h3>The Impossibility Complement</h3>
<p>Murphy's law states that everything that <strong>can go wrong</strong> will indeed go wrong. While this is obviously a tautology, I would add that even what <strong>can't go wrong actually does</strong>.</p>
<p>Developers often note that even systems deemed infallible can fail, a sentiment captured by the classic "it works on my machine". The causes are numerous: insufficient testing datasets, unrealistic loads, lack of robustness tests, disregard for the dev-prod parity principle, or failure to test in a highly concurrent environment.</p>
<p>This can occur with both functional and technical issues:</p>
<ul>
<li>
<p>In a recent issue involving a current project, we had a serious technical crash when we received an invalid date ('29 February' in a non-leap year) from a partner (the date was typed as a string by end-users without any validation on their side). Our business analysts had explicitly advised against testing date validity, assuming such an error "couldn't happen".</p>
</li>
<li>
<p>Another incident involved a technical glitch (system out of memory), despite our belief that it was impossible after configuring our Java Virtual Machines to utilize all available memory at startup. In theory, Java limits memory usage. Yet, our Java application was terminated by the Linux kernel's 'oom-killer' to prevent a complete server freeze. This was possible because our program ran on a Virtual Machine managed by ESXi, which could perform 'ballooning,' a mechanism to force VMs to swap memory to disk. This process was largely unknown to developers, integrators, and most operators, proving challenging to understand.</p>
</li>
</ul>
<p>The lesson learned: the importance of adopting highly defensive programming and creating robust systems cannot be overstated.</p>
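<p>As a minimal sketch of such defensive programming (illustrative code, not the project's actual implementation), strict date parsing in Java rejects impossible dates like '29 February' in a non-leap year instead of silently adjusting them:</p>
<pre>
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

public final class SafeDates {
    // 'uuuu' plus STRICT rejects impossible dates such as 2023-02-29,
    // which SMART (the default resolver) would silently turn into 2023-02-28.
    private static final DateTimeFormatter STRICT_ISO = DateTimeFormatter
            .ofPattern("uuuu-MM-dd")
            .withResolverStyle(ResolverStyle.STRICT);

    public static LocalDate parseOrNull(String raw) {
        if (raw == null) {
            return null;
        }
        try {
            return LocalDate.parse(raw, STRICT_ISO);
        } catch (DateTimeParseException e) {
            return null; // caller decides: reject, log, quarantine...
        }
    }
}
</pre>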
<h3>The Conjunction of Events Complement</h3>
<p>The combination of events leading to a breakdown can be truly astonishing.</p>
<p>For example, I once inadvertently caused a major breakdown in a large application responsible for sending electronic payrolls to 5 million people, coinciding with its production release day. The day before, I conducted additional benchmarks (using JMeter) on the email sending system within the development environment. Our development servers, like others in the organization, were configured to route emails through a production relay, which then sent them to the final server in the cloud. Several days prior, I had set the development server to use a mock server since my benchmark simulated email traffic peaks of several hundred thousand emails per hour. However, the day after my benchmarking, when I was off work, my boss called to inquire if I had made any special changes to email sending, as the entire system was jammed at the final mail server.</p>
<p>Here’s what had happened:</p>
<ul>
<li>An automated Infrastructure as Code (IAC) tool overwrote my development server configuration, causing it to send emails to the actual relay instead of the mock server;</li>
<li>The relay, recognized by the cloud provider, had its IP address changed a few days earlier;</li>
<li>The whitelist on the cloud side hadn't been updated, and a throttling system blocked the final server;</li>
<li>The operations team responsible for this configuration was unavailable to address the issue.</li>
</ul>
<h3>The Squadron Complement</h3>
<p>Problems often cluster, complicating resolution efforts. These range from simultaneous issues exacerbating a situation to misleading issues that divert us from the real problem.</p>
<p>I can categorize these issues into two types:</p>
<ul>
<li>
<p><strong>The Simple Additional Issue</strong>: This typically occurs at the worst possible moment, such as during another breakdown, adding more work or slowing down repairs. For instance, in a current project I'm involved with, due to legacy reasons, certain specific characters inputted into one application can cause another application to crash, necessitating data cleanup. This issue arises roughly once every 3 or 4 months, often triggered by user instructions. Notably, several instances of this issue have coincided with much more severe system breakdowns.</p>
</li>
<li>
<p><strong>The Deceitful Additional Issue</strong>: These issues, when combined with others, significantly complicate post-mortem analysis and can mislead the investigation. A recent example was an application bug in a Spring batch job that remained obscured due to a connection issue with the state-storing database, caused by intermittent firewall outages.</p>
</li>
</ul>
<h3>The Camouflage Complement</h3>
<p>We apply the ITIL framework's problem/incident dichotomy to classify issues: a problem can generate one or more incidents.</p>
<p>When an incident occurs, it's crucial to conduct a thorough analysis by carefully examining logs to figure out whether this is merely a new incident of a known problem or an entirely new problem. We often identify incidents that appear similar to others, possibly occurring on the same day and exhibiting comparable effects, but stemming from different causes. This is particularly true when incorrect error-catching practices are in place, such as using overly broad catch(Exception) statements in Java, which can either trap too many exceptions or, worse, obscure the root cause.</p>
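<p>A minimal self-contained sketch of that anti-pattern and its narrow-catch alternative (class and method names are invented for illustration):</p>
<pre>
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class PayloadImporter {
    // Anti-pattern: one broad catch makes unrelated failures look identical
    // and drops the root cause from the logs.
    String importBroad(Path path) {
        try {
            return Files.readString(path);
        } catch (Exception e) {
            return null; // root cause silently lost
        }
    }

    // Better: catch narrowly, keep the cause, and let unexpected
    // exceptions bubble up so they surface as what they really are.
    String importNarrow(Path path) {
        try {
            return Files.readString(path);
        } catch (IOException e) {
            throw new IllegalStateException("cannot read " + path, e);
        }
    }
}
</pre>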
<h3>The Over-Accident Complement</h3>
<p>Like chain reactions in traffic accidents, one incident in IT can lead to others, sometimes with more severe consequences.</p>
<p>I can recall at least three recent examples illustrating our challenges:</p>
<ul>
<li>
<p><strong>Maintenance Page Caching Issue</strong>: Following a system failure, we activated a maintenance page, redirecting all API and frontend calls to this page. Unfortunately, this page lacked proper cache configuration. Consequently, when a few users made XHR calls precisely at the time the maintenance page was set up, it was cached in their browsers for the entire session. Even after maintenance ended and the web application frontend resumed normal operation, the API calls continued to retrieve the HTML maintenance page instead of the expected JSON response due to this browser caching (see the sketch after this list).</p>
</li>
<li>
<p><strong>Debug Verbosity Issue</strong>: To debug data sent by external clients, we store payloads into a database. To maintain a reasonable database size, we limited the stored payload sizes. However, during an issue with a partner organization, we temporarily increased the payload size limit for analysis purposes. This change was inadvertently overlooked, leading to enormous database growth and nearly causing a complete application crash due to disk space saturation.</p>
</li>
<li>
<p><strong>API Gateway Timeout Handling</strong>: Our API gateway was configured to replay POST calls that ended in timeouts due to network or system issues. This setup inadvertently led to catastrophic duplicate transactions. The gateway reissued requests that timed out, not realizing these transactions were still processing and would eventually complete successfully. This resulted in a conflict between robustness and data integrity requirements.</p>
</li>
</ul>
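<p>For the first incident above, the missing piece was anti-caching headers on the maintenance response. A hedged sketch of what such a servlet filter might look like (illustrative code, not our actual configuration):</p>
<pre>
import java.io.IOException;
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletResponse;

// Active only while maintenance mode is on: no-store ensures browsers
// never keep the maintenance response for the rest of the session.
public class MaintenanceFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletResponse response = (HttpServletResponse) res;
        response.setHeader("Cache-Control", "no-store, max-age=0");
        response.setHeader("Retry-After", "3600"); // hint: come back in an hour
        response.setStatus(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
        response.setContentType("text/plain");
        response.getWriter().write("Maintenance in progress");
    }
}
</pre>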
<h3>The Heisenbug Complement</h3>
<p>A 'heisenbug' is a type of software bug that seems to alter or vanish when one attempts to study it. This term humorously references the Heisenberg Uncertainty Principle in quantum mechanics, which posits that the more precisely a particle's position is determined, the less precisely its momentum can be known, and vice versa.</p>
<p>Heisenbugs commonly arise from race conditions under high loads or other factors that render the bug's behavior unpredictable and difficult to replicate in different conditions or when using debugging tools. Their elusive nature makes them particularly challenging to fix, as the process of debugging or introducing diagnostic code can change the execution environment, causing the bug to disappear.</p>
<p>I've encountered such issues in various scenarios. For instance, while using a profiler, I observed it inadvertently slowing down threads to such an extent that it hid the race conditions.</p>
<p>On another occasion, I demonstrated to a perplexed developer how simple it was to reproduce a race condition on non-thread-safe resources with just two or three threads running simultaneously. However, he was unable to replicate it in a single-threaded environment.</p>
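<p>A minimal sketch of that demonstration (simplified from memory, not the project's code): a non-thread-safe counter that loses updates with two or three threads but behaves perfectly with one — or under a profiler that slows threads down enough:</p>
<pre>
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class RaceDemo {
    private static int counter = 0; // not thread-safe: ++ is read-modify-write

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        for (int t = 0; t < 3; t++) {
            pool.submit(() -> {
                for (int i = 0; i < 100_000; i++) {
                    counter++; // two threads may read the same value: one increment is lost
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        // Almost always prints less than 300000 with 3 threads;
        // single-threaded, the "bug" is unreproducible.
        System.out.println(counter);
    }
}
</pre>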
<h3>The UFO Issue Complement</h3>
<p>A significant number of issues are neither fixed nor fully understood. I'm not referring to bugs that are understood but deemed too costly to fix in light of their severity or frequency. Rather, I'm talking about those perplexing issues whose occurrence is extremely rare, sometimes happening only once.</p>
<p>Occasionally, we half-jokingly attribute such cases to Single-Event Upsets caused by cosmic particles.</p>
<p>For example, in our current application that generates and sends PDFs to end-users through various components, we encountered a peculiar issue a few months ago. A user reported, with a screenshot as evidence, a PDF where most characters appeared as gibberish symbols instead of letters. Despite thorough investigations, we were stumped and ultimately had to abandon our efforts to resolve it due to a complete lack of clues.</p>
<h3>The Non-Existing Issue Complement</h3>
<p>One particularly challenging type of issue arises when it seems like something is wrong, but in reality, there is no actual bug. These non-existent bugs are the most difficult to resolve! The misconception of a problem can come from various factors including: looking in the wrong place (such as the incorrect environment or server), misinterpreting functional requirements, or receiving incorrect inputs from end-users or partner organizations.</p>
<p>For example, we recently had to address an issue where our system rejected an uploaded image. The partner organization assured us that the image should be accepted, claiming it was in PNG format. However, upon closer examination (that took us several staff-days), we discovered that our system's rejection was justified: the file was not actually a PNG.</p>
<h3>The False Hope Complement</h3>
<p>I often find Murphy's Law to be quite cruel. You spend many hours working on an issue, and everything seems to indicate that it is resolved, with the problem no longer reproducible. However, once the solution is deployed in production, the problem reoccurs. This is especially common with issues related to heavy loads or concurrency.</p>
<h3>The Anti-Murphy's Reciprocal</h3>
<p>In every organization I've worked for, I've noticed a peculiar phenomenon, which I'd call 'Anti-Murphy's Law'. Initially, during an application's build and early maintenance phases, Murphy's Law seems to apply. However, after several more years, a contrary phenomenon emerges: even subpar software appears not only immune to Murphy's Law but also more robust than expected. Many legacy applications run glitch-free for years, often with less observability and fewer robustness features, yet they still function effectively. The better the design of an application, the quicker it reaches this state, but even poorly designed ones eventually get there.</p>
<p>I have only some leads to explain this strange phenomenon:</p>
<ul>
<li>
<p>Over time, users become familiar with the software's weaknesses and learn to avoid them by not using certain features, waiting longer, or using the software during specific hours.</p>
</li>
<li>
<p>Legacy applications are often so difficult to update that they experience very few regressions.</p>
</li>
<li>
<p>Such applications rarely have their technical environment (like the OS or database) altered, to avoid complications.</p>
</li>
<li>
<p>Eventually, everything that could go wrong has already occurred and been either fixed or worked around: it's as if Murphy's Law has given up.</p>
</li>
</ul>
<p>However, don't misunderstand me: I'm not advocating for the retention of such applications. Despite appearing immune to issues, they are challenging to update and increasingly fail to meet end-user requirements over time. Concurrently, they become more vulnerable to security risks.</p>
<h2>Conclusion</h2>
<p>Rather than adopting a pessimistic view of Murphy's Law, we should be thankful for it. It drives engineers to enhance their craft, compelling them to devise a multitude of solutions to counteract potential issues. These solutions include robustness, high availability, fail-over systems, redundancy, replays, integrity checking systems, anti-fragility, backups and restores, observability, and comprehensive logging.</p>
<p>In conclusion, addressing a final query: can Murphy's Law turn against itself? A recent incident with a partner organization sheds light on this. They mistakenly sent us data and relied on a misconfiguration in their own API Gateway to prevent this erroneous transmission. However, by sheer coincidence, the API Gateway had been corrected in the meantime, thwarting their reliance on this error. Thus, the answer appears to be a resounding <strong>NO</strong>.</p>
Top Mistakes Made by Product Owners in Agile Projects2023-09-01T00:00:00+00:00https://florat.net/top-mistakes-made-by-product-owners-in-agile-projects/<img src="https://florat.net/assets/images/blog-tech/article-34.jpg" alt="Upside-down house" width="500">
<p>As a Product Owner (PO), your role is crucial in steering an agile project towards success. However, it's equally important to be aware of the pitfalls that can lead to failure. In this blog post, we'll explore the actions that should be avoided to ensure your agile project stays on track and delivers valuable outcomes. It's worth noting that the GIGO (Garbage In - Garbage Out) effect is a significant factor: no good product can come from bad design.</p>
<p>This article has also been published at <a href="https://dzone.com/articles/top-mistakes-made-by-product-owners-in-agile-proje">DZone</a>.</p>
<h2>On Agile and Business Design Skills</h2>
<h3>Lack of Design Methodology Awareness</h3>
<p>One of the initial steps towards failure is disregarding design
methodologies such as <a href="https://www.agilealliance.org/resources/books/user-story-mapping/">Story
Mapping</a>, <a href="https://www.eventstorming.com/">Event Storming</a>, Impact Mapping, or Behavior-Driven Development. Treating these methodologies as trivial or underestimating their complexity or power can hinder your project's progress. Instead, take the time to learn, practice, and seek coaching in these techniques to create well-defined business requirements.</p>
<p>For example, I once worked on a project where the PO practiced Story Mapping without even involving the end-users...</p>
<h3>Ignoring Domain Knowledge</h3>
<p>Neglecting to understand your business domain can be detrimental. Avoid skipping internal training sessions, Massive Open Online Courses (MOOCs), and field observation workshops. Read domain reference books and, more generally, embrace domain knowledge to make informed decisions that resonate with both end-users and stakeholders.</p>
<p>To continue with the previous example, the PO, who was new to the project's domain (despite having basic knowledge of it), missed an entire use case with serious architectural implications due to this lack of domain skills, requiring significant software changes after only a few months.</p>
<h3>Disregarding End-User Feedback</h3>
<p>Overestimating your understanding and undervaluing end-user feedback
can lead to the <a href="https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect">Dunning-Kruger effect</a>. Embrace humility and actively involve end-users in the decision-making process to create solutions that truly meet their needs. Failure to consider real-world user constraints and work processes can lead to impractical designs. Analyze actual and operational user experiences, collect feedback, and adjust your approach accordingly. Don't imagine their requirements and issues but ask actual users who deal with real-world complexity all the time.</p>
<p>For instance, a PO I worked with ignored or postponed many obvious GUI issues raised by end-users, rendering the application nearly unusable. These UX issues included the absence of basic filters on screens, making it impossible for users to find their ongoing tasks. Yet these issues were relatively simple to fix. Conversely, this PO pushed unrequested features and even features rejected by most end-users, such as complex GUI locking options. Furthermore, any attempt to set up tools to collect end-user feedback was dismissed.</p>
<h2>Team Dynamics</h2>
<h3>Centralized Decision-Making</h3>
<p>Isolating decision-making authority within your hands without consulting IT or other designers can stifle creativity and collaboration. Instead, foster open communication and involve team members in shaping the project's direction. The three pillars of empirical process control, as defined in Scrum, are <strong>Transparency</strong>, Inspection, and Adaptation. The essence of an agile team is continuous improvement, which becomes challenging when a lack of trust hinders the identification of real issues.</p>
<p>Some POs unfortunately adopt a "divide and rule" approach, which keeps knowledge and power in their sole hands. I have observed instances where POs withheld information or even released incorrect information to both end-users and developers, and actively prevented any exchange between them.</p>
<h3>Geographical Disconnection</h3>
<p>Geographically separating end-users, designers, testers, the PO, and developers can hinder communication. Leverage modern collaboration tools, but don't rely solely on them. Balance digital tools with face-to-face interactions to maintain strong team connections and enable osmotic communication, which has proven highly efficient in keeping everyone informed and involved.</p>
<p>The worst case I had to deal with was a project where developers were centralized in the same building as the end-users, while the PO and design team were located in another city. Most workshops were done remotely between the two cities. In the end, the design output was very poor. It improved drastically once some designers were finally co-located with the end-users (and developers) and were able to conduct <em>in situ</em> formal and informal workshops.</p>
<h2>Planning and Execution</h2>
<h3>Over-Optimism and Lack of Contingency Plans</h3>
<p>Hope should not be your strategy. Don't oversell features to end-users. Being overly optimistic and neglecting backup plans can lead to missed deadlines and unexpected challenges. Develop robust contingency plans (Plan B) to navigate uncertainties effectively. Avoid promising unsustainable plans to stakeholders: after two or three delays, they may lose trust in the project.</p>
<p>I worked on a project where, over a 1.5-year timeline, the PO announced the main release to stakeholders every two months without consulting the development team. As you can imagine, the effect on the project's image was devastating.</p>
<h3>Inadequate Stakeholder Engagement</h3>
<p>Excluding business stakeholders from demos and delaying critical communications can lead to misunderstandings and misaligned expectations. Regularly engage stakeholders to maintain transparency and gather valuable feedback.</p>
<p>As an illustration, in a previous project, we conducted regular sprint demos; however, we failed to invite end-users to most sessions. Consequently, significant ergonomic issues went unnoticed, resulting in a substantial loss of time. Additionally, within the same project, the Product Owner (PO) organized meetings with end-users mainly to present solutions via fully completed mockups, rather than facilitating discussions to precisely identify operational requirements, which inhibited their feedback.</p>
<h3>Embracing Waterfall Practices</h3>
<p>Thinking in terms of a waterfall approach, rather than embracing iterative development, can hinder progress, especially on a project meant to be managed with agile methodologies. Minimize misunderstandings by providing regular updates to stakeholders. Break features into increments, leverage Proof of Concepts (POC), and prioritize the creation of Minimal Viable Products (MVP) to validate assumptions and ensure steady progress.</p>
<p>As an example, I recently met with end-users for whom a one-year coding 'tunnel' had produced a first application version that was almost unusable and worse than the 20-year-old application we were supposed to rewrite. With re-established communication and end-user involvement, this was fixed in a few months.</p>
<h3>Producing Too Much Waste</h3>
<p>As a designer, avoid creating a large stock of User Stories (US) that will only be implemented months or years later. Doing so works against the Lean principle of fighting the overproduction <em>muda</em> (waste): you produce many specifications at the worst moment (when you know the least about the actual business requirements), and this work will most likely be thrown away.</p>
<p>I had an experience where a PO and their design team wrote US up to a year before they were actually coded and then left them almost unmaintained. As expected, most of them were thrown away or, even worse, caused various flaws and misunderstandings among the development team when finally planned for the next sprint. Most backlog refinements and explanations had to be redone. User stories should be refined to a detailed state only one or two sprints before being coded. However, it's a good practice to fill the backlog sandbox with generally outlined features. The rule of thumb is straightforward: user stories should be detailed as close to the coding stage as possible. When they are fully detailed, they are <em>ready</em> for coding. Otherwise, you are likely to waste time and resources.</p>
<h3>Volatile Objectives</h3>
<p>Try to set consistent objectives at each sprint. Avoid context switching among developers: it leads them to start many different features but never finish any.</p>
<p>To provide an example, in a project where the Product Owner (PO) interacted with multiple partners, priorities were altered every two or three sprints mainly due to political considerations. This was often done to appease the most frustrated partners who were awaiting certain features (often promised with unrealistic deadlines).</p>
<h3>Lack of Planning Flexibility</h3>
<p>Utilize the DevOps methodology toolkit, including tools such as feature flags, dark deployments, and canary testing, to facilitate more streamlined planning and deployment processes.</p>
<p>As an architect, I once had a tough time convincing a PO to use a canary-testing deployment strategy to learn fast and release early while greatly limiting risks. After a resounding failure when opening the application to the entire population, we finally used canary testing and discovered performance and other critical issues with a limited set of volunteer end-users. It is now a key part of the project management toolkit we use extensively.</p>
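<p>Among those tools, a feature flag can be as simple as a guarded branch. Here is a minimal sketch (all names and the bucketing rule are invented for illustration; real projects would typically read the rollout percentage from configuration or a dedicated flag service):</p>
<pre>
// Minimal feature-flag sketch: the rollout percentage changes the
// canary cohort size without requiring a new release.
class FeatureFlags {
    private final int rolloutPercent; // e.g. 5 => 5% canary cohort

    FeatureFlags(int rolloutPercent) {
        this.rolloutPercent = rolloutPercent;
    }

    boolean isEnabledFor(String flag, String userId) {
        // Deterministic bucketing: the same user always lands in the same cohort
        int bucket = Math.floorMod((flag + ":" + userId).hashCode(), 100);
        return bucket < rolloutPercent;
    }
}

class ReportService {
    private final FeatureFlags flags = new FeatureFlags(5); // 5% canary

    String generatePdf(String userId) {
        if (flags.isEnabledFor("new-pdf-engine", userId)) {
            return renderWithNewEngine(userId);  // canary cohort only
        }
        return renderWithLegacyEngine(userId);   // everyone else
    }

    private String renderWithNewEngine(String userId) { return "new PDF"; }
    private String renderWithLegacyEngine(String userId) { return "legacy PDF"; }
}
</pre>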
<h3>Extended Delays Between Deployments</h3>
<p>Even if a product is built incrementally within 2 or 3-week timeframes, many large projects (including all those I've been a part of) tend to wait for several iterations before deploying the software in production. This presents a challenge because each iteration should ideally deliver some form of value, even if it's relatively small, to end-users. This approach aligns with the mantra famously advocated by Linus Torvalds: 'Release early, release often.'</p>
<p>Some Product Owners (POs) are hesitant to push iterations into production, often for misguided reasons. These concerns can include fears of introducing bugs (indicating a lack of automated and acceptance testing), incomplete iterations (highlighting issues with user story estimation or development team velocity), a desire to provide end-users with a more extensive set of features in one go, thinking they'll appreciate it, or an attempt to simplify the user learning curve (revealing potential user experience (UX) shortcomings). In my experience, this hesitation tends to result in the accumulation of various issues, such as bugs or performance problems.</p>
<h2>Design Considerations</h2>
<h3>Solution-First Mentality</h3>
<p>Prioritizing solutions over understanding the business needs can lead to misguided decisions. Focus on the "Why" before diving into the "How" to create solutions that truly address user requirements.</p>
<p>As a bad practice, I've seen User Stories including technical content (like SQL queries) or presenting detailed technical operations or screens as business rules.</p>
<h3>Oversized User Stories</h3>
<p>Designing large, complex user stories instead of breaking them into manageable increments can lead to confusion and delays. Embrace smaller, more focused user stories to facilitate smoother development, predictability in planning, and testing. Inexperienced Product Owners (POs) often find it challenging to break down features into small, manageable User Stories (US). This is a sort of art, and there are <a href="https://www.agilealliance.org/glossary/split/">numerous ways</a> to accomplish it depending on the context. However, it's important to remember that each story should deliver value to end-users.</p>
<p>As an example, in a previous project, the Product Owner (PO) struggled to effectively divide stories or engaged in purely technical splitting, such as creating one User Story (US) for the frontend and another for the backend portion of a substantial feature. Consequently, 50% of the time, this resulted in incomplete User Stories that required rescheduling for the subsequent sprint.</p>
<h3>Neglecting Expertise</h3>
<p>Avoiding consultation with experts such as UX designers, accessibility specialists, and legal advisors can result in suboptimal solutions. Leverage their insights to create more effective and user-friendly designs.</p>
<p>As a case in point, I've observed multiple projects where the lack of a proper user experience (UX) led to inadequately designed graphical user interfaces (GUIs), incurring substantial costs for rectification at a later stage. In specific instances, certain projects demanded legal expertise, particularly in matters of data privacy. Moreover, I encountered a situation where a Product Owner (PO) failed to involve legal specialists, resulting in the final product omitting crucial legal notices or even necessitating significant architectural revisions.</p>
<h3>Ignoring Performance Considerations</h3>
<p>Neglecting performance constraints, such as displaying excessive data on screens without filters, can negatively impact user experience. Prioritize efficient design to ensure optimal system performance.</p>
<p>I once worked on a large project where the Product Owner (PO) requested the computation of a Gantt chart involving tens of thousands of tasks spanning over 5 years. Ironically, in 99.9% of cases, a single week was sufficient. This unnecessarily intricate requirement significantly complicated the design process and resulted in the product becoming nearly unusable due to its excessive slowness.</p>
<h3>Using the Wrong Words</h3>
<p>Failing to establish a shared business language and glossary can create
confusion between technical and business teams. Embrace the Ubiquitous
Language (UL) Domain-Driven Design principle to enhance communication
and clarity.</p>
<p>I once worked on a project where the PO and designers didn't set up any glossary of business terms, used custom vocabulary instead of the business one, and used fuzzy or interchangeable synonyms even for the terms they had coined themselves. This created many issues and much confusion within the team and among end-users, and even led to duplicated work.</p>
<h3>Postponing Legal and Regulatory Considerations</h3>
<p>Late discovery of legal, accessibility, or regulatory requirements can lead to costly revisions. Incorporate these considerations early to avoid setbacks during development.</p>
<p>I observed a significantly large project where the Social Security number had to be eliminated later on. This led to the need for additional transformation tools since this constraint was not taken into account from the beginning.</p>
<h3>Code Considerations Interferences</h3>
<p>Refine business requirements and don't interfere with code organization, which often has its own constraints. For instance, asking the development team to always enforce the reuse (DRY) principle through very generic interfaces comes from a good intention but may greatly overcomplicate the code (which violates the KISS principle).</p>
<p>In a recent project, a Product Owner (PO) who had a background in development frequently complicated the design by explicitly instructing developers to extend existing endpoints or SQL queries instead of creating entirely new ones, which would have been simpler. Many developers followed the instructions outlined in the User Stories (US) without fully grasping the potential drawbacks in the actual implementation. This occasionally resulted in convoluted code and wasted time rather than achieving efficiency gains.</p>
<h2>Acceptance Testing</h2>
<h3>Neglecting Alternate Paths</h3>
<p>Focusing solely on nominal cases (“happy paths”) and ignoring real-world scenarios can result in very incomplete testing. Ensure that all possible paths, including corner cases, are thoroughly tested to deliver a robust solution.</p>
<p>In a prior project, a multitude of bugs and crashes surfaced exclusively during the production phase due to testing being limited to nominal scenarios. This led to team disorganization as urgent hotfixes had to be written immediately, tarnishing the project's reputation and incurring substantial costs.</p>
<h3>Missing Acceptance Criteria</h3>
<p>Leverage the <a href="https://www.agilealliance.org/glossary/three-amigos/">Three Amigos principle</a> to involve cross-functional team members in creating comprehensive acceptance criteria. Incorporate examples in user stories to clarify expectations and ensure consistent understanding. Example mapping is a great workshop to achieve this. Writing down examples ensures several things: first, that you have at least one realistic case for the requirement, proving it is not imaginary; second, listing different cases is a powerful way to make alternate paths emerge and enumerate them more exhaustively (see the previous point); lastly, examples are among the best shared-understanding material you can give to developers.</p>
<p>By way of illustration, when designers began documenting real-life scenarios using Behavior-Driven Development (BDD) executable specifications, numerous alternate paths emerged naturally. This led to a reduction in production issues (as discussed in the previous section) and a gradual slowdown in their occurrence.</p>
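<p>For illustration, here is what examples turned into executable specifications can look like with plain JUnit 5 (a deliberately simplified sketch: the business rule and all values are invented, and a full BDD stack such as Cucumber or Spock would express the same examples in business language):</p>
<pre>
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

class ShippingFeeSpec {

    // Invented rule under test: 5 EUR domestic, 12 EUR cross-border,
    // free above 100 EUR or for empty orders.
    static double computeFee(double amount, String country) {
        if (amount == 0.0 || amount >= 100.0) {
            return 0.0;
        }
        return "FR".equals(country) ? 5.0 : 12.0;
    }

    // Each row is one concrete example agreed upon in a Three Amigos /
    // example mapping session: nominal cases first, then alternate paths.
    @ParameterizedTest(name = "an order of {0} EUR from {1} ships for {2} EUR")
    @CsvSource({
        "50.00,  FR, 5.00",   // nominal domestic order
        "120.00, FR, 0.00",   // free shipping above 100 EUR
        "50.00,  DE, 12.00",  // cross-border order
        "0.00,   FR, 0.00"    // alternate path: empty order
    })
    void shippingFeeExamples(double amount, String country, double expected) {
        assertEquals(expected, computeFee(amount, country), 0.001);
    }
}
</pre>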
<h3>Lack of Professional Testing Expertise</h3>
<p>Incorporating professional testers and testing tools enhances defect detection and overall quality. Invest in thorough testing to identify issues early, ensuring a smoother user experience. Not using tools also makes it more difficult for external stakeholders to figure out what has actually been tested. Conducting rigorous testing is indeed a genuine skill.</p>
<p>In a previous project, I witnessed testers utilizing basic spreadsheets to record and track testing scenarios. This approach made it difficult to accurately determine what had been tested and what hadn't. Consequently, the Product Owner (PO) had to validate releases without a clear understanding of the testing coverage. Tools like the Open Source <a href="https://www.squashtest.com/product-squash-tm?lang=en">SquashTM</a> are excellent for specifying test requirements and monitoring acceptance-test coverage. Furthermore, the testers were not testing professionals but rather designers, which frequently resulted in challenges when trying to obtain detailed bug reports. These reports lacked precision, omitting crucial information such as the exact time, logs, scenarios, and datasets necessary for effective issue reproduction.</p>
<h2>Take-Away Summary</h2>
<table>
<thead>
<tr>
<th>Symptom<br></th>
<th>Possible Causes and Solutions</th>
</tr>
</thead>
<tbody>
<tr>
<td><b>A solution that is not aligned with end-users' needs.</b><br></td>
<td><b>Ineffective Workshops with End-Users:</b>
<br>
- If workshops are conducted remotely, consider organizing them onsite.<br>
- Ensure you are familiar with agile design methods like Story Mapping.<br><br><br>
<b>Insufficient Attention to End-Users' Needs:</b><br>
- Make sure to understand the genuine needs and concerns of end-users, and avoid relying solely on personal intuitions or managerial opinions.<br>
- Gather end-users' feedback early and frequently.<br>
- Utilize appropriate domain-specific terminology (Ubiquitous Language).<br><br>
</td>
</tr>
<tr>
<td><b>Limited Trust from End-Users and/or Development Team.</b><br></td>
<td><b>Centralized Decision-Making:</b><br>
- Foster open communication and involve team members in shaping the project's direction.<br>
- Enhance transparency through increased communication and information sharing.<br>
<br><br><b>Unrealistic Timelines:</b> <br>
- Remember that "Hope is not a strategy"; avoid excessive optimism.<br>
- Aim for consistent objectives in each sprint and establish a clear trajectory.<br>
- Employ tools that enhance schedule flexibility and ensure secure production releases, such as canary testing.<br>
</td>
</tr>
<tr>
<td><b>Design Overhead.</b><br></td>
<td><b>User story overproduction:</b><br>
- Minimize <i>muda</i> (waste) and refine user stories only when necessary, just before they are coded.
<br><br><b>Challenges in Designer-Development Team Communication:</b><br>
- Encourage regular physical presence of both design and development teams in the same location, ideally several days a week, to enhance direct and osmotic communication.
<br>
- Focus on describing the 'why' rather than the 'how'. Leave technical specifications to the development team. For instance, when designing a database model, you might create the Conceptual Data Model, but ensure the team knows it's not the Physical Data Model.<br>
</td>
</tr>
<tr>
<td><b>Discovery of Numerous Production Bugs.</b><br></td>
<td>
<b>Incomplete Acceptance Testing:</b><br>
- Develop acceptance tests simultaneously with the user stories and in collaboration with future testers.
<br>
- Conduct tests in a professional and traceable manner, involving trained testers who use appropriate tools.
<br>
- Test not only the 'happy paths' but also as many alternative paths as
possible.
<br><br>
<b>Lack of Automation:</b><br>
- Implement automated tests, especially unit tests, and equally important, executable specifications (Behavior-Driven Development) derived from the acceptance tests outlined in the user stories. Explore tools like <a href="https://spockframework.org/">Spock</a>.<br>
</td>
</tr>
</tbody>
</table>
<h2>Conclusion</h2>
<p>By avoiding these common pitfalls, you can significantly increase the chances of a successful agile project. Remember, effective collaboration, clear communication, and a user-centric mindset are key to delivering valuable outcomes. A Product Owner (PO) is a role, not merely a job. It necessitates training, support, and a readiness to continuously challenge our assumptions.</p>
<p>It's worth noting that a project can fail even with good design when blueprints and good coding practices are not followed, but this is an entirely different topic. However, due to the GIGO effect, no good product can ever be released from a bad design phase.</p>
Make Your Jobs More Robust with Automatic Safety Switches2023-08-28T00:00:00+00:00https://florat.net/make-your-jobs-more-robust-with-automatic-safety-switches/<img src="https://florat.net/assets/images/blog-tech/article-35.jpg" alt="Upside-down house" width="500">
<p>This article has also been published at <a href="https://dzone.com/articles/make-your-jobs-more-robust-with-automatic-safety-s">DZone</a>.</p>
<p>In this article, I'll refer to a 'job' as a batch processing program, as defined in <a href="https://jcp.org/en/jsr/detail?id=352">JSR 352</a>. A job can be written in any language but is scheduled periodically to automatically process bulk data, in contrast to interactive processing (CLI or GUI) for end-users. Error handling in jobs differs significantly from interactive processing. For instance, in the latter case, backend calls might not be retried as a human can respond to errors, while jobs need robust error recovery due to their automated nature. Moreover, jobs often possess higher privileges and can potentially damage extensive data.</p>
<p>Consider a scenario: What if a job fails due to a backend or dependency component issue? If a job is scheduled hourly and faces a major downtime just minutes before execution, what should be done?</p>
<p>Based on my experience with various large projects, implementing automatic safety switches for handling technical errors is a best practice.</p>
<h2>Enhancing Failure Handling with Automatic Safety Switches</h2>
<p>When a technical error occurs (e.g., timeout, storage shortage, database failure), the job should attempt several retries (as per best practices outlined below) and halt immediately at the current processing step. It's advisable to record the current step position, allowing for intelligent restarts once the system is operational again.</p>
<p>Only human intervention, after thorough analysis and resolution, should reset the switch. While in a disabled state, any attempt to schedule the job should log that it's inactive and cannot initiate. This is also the opportune moment to create a <a href="https://sre.google/sre-book/postmortem-culture/">post-mortem</a> report, valuable for future failure analysis and potential adjustments to code or configuration for improved robustness (e.g., adjusting timeouts, adding retries, or enhancing input controls).</p>
<p>The switch can then be removed, enabling the job to recommence or complete outstanding steps (if supported) during the next scheduled run. Alternatively, immediate execution can be forced to prevent prolonged downtime delays, especially if job frequency is low. Delaying a job's execution excessively can lead to end-user latency and potential accumulation of such delays, eventually overwhelming the job's capacity.</p>
<h2>Rationale for Automatic Safety Switches</h2>
<ul>
<li>
<p><strong>Prevention of Data Corruption</strong>: They can avert significant data corruption resulting from bugs by halting activity during unexpected states.</p>
</li>
<li>
<p><strong>Error Log Management</strong>: They help prevent system flooding with repetitive error logs (such as database access error stack traces). Uncontrolled log volumes might also exacerbate issues like filesystems filling.</p>
</li>
<li>
<p><strong>Facilitating System Repair</strong>: A system without an automatic safety switch significantly complicates the diagnostic and fixing process. Human operators cannot make decisions with clarity since the system remains enabled and could potentially jam again as soon as it's scheduled.</p>
</li>
<li>
<p><strong>Resource Exhaustion Mitigation</strong>: Continuing periodic jobs during technical errors caused by resource exhaustion (memory, CPU, storage, network bandwidth, etc.) worsens the situation. Automatic safety switches act as circuit breakers, stopping jobs and freeing up resources. After resolving the root problem, operators can restart jobs sequentially and securely.</p>
</li>
<li>
<p><strong>Security Enhancement</strong>: Many attacks, including brute force attacks, SQL injections, or Server Side Injection (SSI), involve injecting malicious data into a system. Such data might be processed later by jobs, potentially triggering technical errors. Stopping the job improves security by forcing human or team analysis of the data. Similarly, halting a job after a timeout can help foil a resource exhaustion-type attack, such as a ReDOS (Regular Expression Denial of Service).</p>
</li>
<li>
<p><strong>Promoting System Analysis</strong>: Organizations that overlook job robustness often allow failed jobs to run in subsequent schedules, adopting a risky approach. Automatic safety switches necessitate human intervention, detecting every failure. This encourages systematic analysis, post-mortem documentation, and long-term improvements.</p>
</li>
<li>
<p><strong>Preventing Excessive Costs</strong>: Implementing a throttling mechanism that pauses operations upon hitting predetermined thresholds, along with an automated safety feature that requires analysis, can protect organizations from incurring significant additional costs due to bugs or intentional attacks when interacting with external systems that incur charges.</p>
</li>
<li>
<p><strong>Code Reuse</strong>: Besides emergency handling, the code written for this purpose can be repurposed to disable a job without altering the scheduling. This is similar to the <code>suspend: true</code> attribute in Kubernetes CronJobs. In a recent project, we utilized this functionality to conveniently initiate job maintenance. By setting the stop flag, the maintenance script then awaits the completion of all jobs.</p>
</li>
</ul>
<h2>Implementing Effective Safety Switches</h2>
<ul>
<li>
<p><strong>Simple Implementation</strong>: The most straightforward approach involves each job, during scheduling, checking for a persistent <code>stop</code> flag. If present, the job exits with a log. The flag can be implemented, for example, through a file, a database record, or a REST API result. For robustness, a <code>stop</code> file per job is preferable, containing metadata like the reason for stopping and the date. This flag is set on technical errors and removed only at a human operator's initiative (using commands like <code>rm</code> or more advanced methods such as a shell script). See the sketch after this list.</p>
</li>
<li>
<p><strong>Coupling with Retrying Mechanism</strong>: Safety switches must work alongside a robust retry solution. Jobs shouldn't halt and require human intervention at the first sign of intermittent issues like database connection saturation or occasional timeouts due to backups slowing down the SAN. Effective systems, such as the Spring Retry library, incorporate exponential backoff with jitter. For instance, setting 10 tries, including the initial call, results in retries spaced exponentially apart (a 1-second interval, then 2 seconds, and so on). This entire process spans 10 to 15 minutes before failing if the root cause isn't resolved within that timeframe. Jitter introduces small random intervals to avoid retry storms where all jobs retry simultaneously (also illustrated in the sketch after this list).</p>
</li>
<li>
<p><strong>Ensure Exclusive Job Launches</strong>: Like any batch processing solution, guarantee that jobs are mutually exclusive—ensuring a new job isn't launched while a previous instance is still running.</p>
</li>
<li>
<p><strong>Business Error Handling</strong>: Business errors (e.g., poorly formatted data) shouldn't trigger safety switches, unless the code lacks defensive measures and unexpected errors arise. In such cases, it's a code bug and qualifies as a technical error, warranting the safety switch trigger and requiring hotfix deployment or data correction.</p>
</li>
<li>
<p><strong>Facilitate Smooth Restarts</strong>: When possible, allow seamless restarts using batch checkpoints, storing the current step, processing data context, or even the presently processed item.</p>
</li>
<li>
<p><strong>Monitoring and Alerting</strong>: Ensure that monitoring and alerting systems are aware of job stoppage triggered by automatic safety switches. For example, email alerts could be sent or jobs could be highlighted in red within a monitoring system.</p>
</li>
<li>
<p><strong>Semi-automatic Restarts</strong>: While we always advocate for thorough system analysis during production issues, there are moments when having jobs halted for human intervention isn't practical, especially during weekends. A middle-ground solution between routine automatic job restarts and a complete halt is to authorize an automatic restart after a predetermined period. In our scenario, we've set up a mechanism to remove the stop flag after 8 hours. This allows the job to try restarting if no human intervention has addressed the issue by then. This approach retains some benefits of an automatic safety switch, such as preventing data corruption or log overflow, but it comes with drawbacks: for instance, it might bypass the systematic analysis and the resulting continuous improvement. Hence, we believe this solution should be implemented judiciously.</p>
</li>
</ul>
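<p>A hedged sketch combining the first two bullets above (paths, retry counts, and class names are illustrative, not a reference implementation; in a Spring project, the hand-rolled retry loop would typically be replaced by Spring Retry):</p>
<pre>
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;

// The scheduler calls run(). A technical error engages the safety switch;
// only a human removes the stop file (e.g., with `rm`) after analysis.
public class NightlyExportJob {
    private static final Path STOP_FLAG = Path.of("/var/run/jobs/nightly-export.stop");

    public void run() {
        if (Files.exists(STOP_FLAG)) {
            System.err.println("Safety switch engaged, job disabled: " + STOP_FLAG);
            return; // natural hook for monitoring/alerting as well
        }
        try {
            processWithRetries();
        } catch (Exception technicalError) {
            engageSafetySwitch(technicalError);
        }
    }

    // Exponential backoff with jitter: 1s, 2s, 4s... plus a random slice
    // to avoid retry storms where all jobs retry at the same instant.
    private void processWithRetries() throws InterruptedException {
        long delayMs = 1_000;
        for (int attempt = 1; ; attempt++) {
            try {
                doOneStep();
                return;
            } catch (TransientBackendException e) {
                if (attempt >= 10) {
                    throw e; // retries exhausted: treat as a real outage
                }
                Thread.sleep(delayMs + (long) (Math.random() * 500));
                delayMs *= 2;
            }
        }
    }

    private void engageSafetySwitch(Exception cause) {
        try {
            Files.createDirectories(STOP_FLAG.getParent());
            Files.writeString(STOP_FLAG, "stopped at " + Instant.now() + ", cause: " + cause);
        } catch (IOException io) {
            throw new UncheckedIOException(io);
        }
    }

    private void doOneStep() { /* bulk processing step goes here */ }

    static class TransientBackendException extends RuntimeException {}
}
</pre>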
<h2>Conclusion</h2>
<p>Automatic safety switches prove invaluable in handling unexpected technical errors. They significantly reduce the risk of data corruption, empower operators to address issues thoughtfully, and foster a culture of post-mortems and robustness improvements. However, their effectiveness hinges on not being overly sensitive, as excessive interventions can burden operators. Thus, coupling these switches with well-designed retry mechanisms is crucial.</p>
Datasets staticity level2023-07-16T00:00:00+00:00https://florat.net/datasets-staticity-level/<h1>Datasets staticity level</h1>
<p>[Article also published <a href="https://dzone.com/articles/datasets-staticity-levels">on DZone</a>.]</p>
<p>A common challenge when designing applications is determining the most suitable implementation based on the frequency of data changes. Should a status be stored in a table to easily expand the workflow? Should a list of countries be embedded in the code or stored in a table? Should we be able to adjust the thread pool size based on the targeted platform?</p>
<p>In a current large project, we categorize datasets based on their staticity level, ranging from very static to more volatile:</p>
<h2>Level 1 : Very static datasets</h2>
<p>Changes to this type of data always involve business rules and impact the code. A typical example is the list of states in a workflow (STARTED, IN_PROGRESS, WAITING, DONE, etc.). The indicative size of this dataset is usually between 2 and 20 entries.</p>
<p>From a technical perspective, it is often implemented as an enumeration (a finite list of literal values like Enumerated Types in PostgreSQL, enums in Java, or TypeScript, for instance). Alternatively, it can be managed as constants or a list of constants.</p>
<p>You can use the following litmus test: "Does any item from this list need to be included in an 'if' statement in the code?".</p>
<p>Changing this type of data requires a new release and/or a Data Definition Language (DDL) change and is not easily administrable.</p>
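<p>A minimal sketch in Java, reusing the workflow states above (the transition rules themselves are invented for the example). The litmus test in action: the values appear in branching logic, so any change to the list requires a new release:</p>
<pre>
public enum WorkflowState {
    STARTED, IN_PROGRESS, WAITING, DONE;

    // The dataset leaks into 'if' statements: this is precisely what
    // makes it a Level 1 (very static) dataset.
    public boolean canTransitionTo(WorkflowState target) {
        if (this == DONE) {
            return false; // terminal state, no way out
        }
        if (this == WAITING && target == DONE) {
            return false; // must resume through IN_PROGRESS first
        }
        return true;
    }
}
</pre>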
<h2>Level 2: Rarely changing datasets</h2>
<p>Think of datasets like a list of countries/states or a list of currencies. These datasets rarely exceed a few tens of entries. We refer to them as "nomenclatures".</p>
<p>From a technical standpoint, they can be managed using a configuration file (JSON/YAML/CSV/properties, etc.) or within a database (a table if using a relational database like PostgreSQL, a document or a list of documents if using a NoSQL Document database like MongoDB, etc.).</p>
<p>It is often a good idea to provide an administration GUI that allows adding, changing, or removing entries of this kind if your budget permits.</p>
<p>These lists are often required to initiate the use of an application, even if the data may change later on. Therefore, it is advisable to package the application with a minimal dataset before its first use. For example, a Liquibase configuration can be released with the application to create a minimal set of countries in the database if it doesn't exist yet. However, be careful to use an idempotent "create if not exists" scheme to avoid conflicting with preexisting data.</p>
<p>Depending on the packaging and technologies used, a change in this type of data may or may not require a new release. If your application includes a mechanism for embedding a minimal dataset (such as a configuration file or a Liquibase or SQL script executed automatically), it will likely require a new release. While this may initially be seen as a constraint, it ensures that your application is self-contained and always operational from its deployment, which is often worthwhile.</p>
<p>When storing nomenclatures in a database, a common strategy is to create a table for each nomenclature (e.g., a table for currencies, a table for countries). If, like us, your application requires a more flexible approach, you can use a single NOMENCLATURE table for each microservice and differentiate the nomenclatures using a simple column (e.g., a NOMENCLATURE name). All nomenclatures are then consolidated in a single technical table, and it is straightforward to retrieve a specific nomenclature using a WHERE clause on the nomenclature name. If you want to maintain an ordering, you can further enhance this approach by assigning an ordinal value to each nomenclature entry.</p>
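<p>As a sketch of this single-table approach (a JPA entity with invented names; the real schema may differ):</p>
<pre>
import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.Table;

// One technical table holds every nomenclature of the microservice;
// a WHERE clause on nomenclatureName retrieves a single list, and
// ordinal preserves the display ordering within that list.
@Entity
@Table(name = "NOMENCLATURE")
public class NomenclatureEntry {
    @Id
    private Long id;

    @Column(name = "NOMENCLATURE_NAME") // e.g. "COUNTRY", "CURRENCY"
    private String nomenclatureName;

    private String code;    // e.g. "FR", "EUR"
    private String label;   // e.g. "France", "Euro"
    private int ordinal;    // display order within one nomenclature
}
</pre>
<p>A JPQL query such as <code>SELECT n FROM NomenclatureEntry n WHERE n.nomenclatureName = 'COUNTRY' ORDER BY n.ordinal</code> then retrieves one complete, ordered nomenclature.</p>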
<h2>Level 3: Volatile datasets</h2>
<p>Most applications persist large amounts of data, which we refer to as "volatile data". This type of data can involve an unlimited number of records managed by an application, such as user profiles, addresses, or chat discussions.</p>
<p>A change, addition, or removal of a record in this kind of dataset should never require a new release (although backups are still necessary). The code is generally designed to handle such changes in a generic manner rather than on a case-by-case basis.</p>
<p>This type of data is typically not administrable through code changes but is managed through regular front/back-office GUIs or batch programs.</p>
<h2>Summary</h2>
<p>Choosing the appropriate level of staticity is crucial to ensure the maintainability and modifiability of an application and can help avoid potential pitfalls. Using an incorrect solution to handle a particular staticity level can lead to unnecessary integration and release tasks or make the application less maintainable.</p>
<table>
<thead>
<tr>
<th>Level</th>
<th>Change frequency</th>
<th>Indicative size</th>
<th>Administrable?</th>
<th>Change requires a new release?</th>
<th>Technical solution examples</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>low</td>
<td>2-20</td>
<td>no</td>
<td>yes</td>
<td>List of constants, Java enum, Enumerated PostgreSQL type</td>
</tr>
<tr>
<td>2</td>
<td>medium</td>
<td>10-100</td>
<td>yes</td>
<td>Depends on chosen solution</td>
<td>Nomenclature table, configuration file</td>
</tr>
<tr>
<td>3</td>
<td>high</td>
<td>> 100</td>
<td>no</td>
<td>no</td>
<td>Regular database records</td>
</tr>
</tbody>
</table>
Make great architecture diagrams with C4 and Plantuml (2/2)2022-08-10T00:00:00+00:00https://florat.net/make-great-architecture-diagrams-with-c4-and-plantuml-(22)/<pre>
Best practices (see course material)
- numbering
- use characters
- minimize constraints
- use Lay_Distance(x, y, size) inside a zone
- avoid Lay_U...
- use sprites
skinparam linetype polyline
Infrastructure:
https://c4model.com/#DeploymentDiagram
https://sarafian.github.io/tips/2021/03/11/plantuml-tips-tricks-1.html
adding hidden lines: a -[hidden]- b
extending the length of a line: a --- b (more dashes, longer line)
specifying preferred direction of lines (a -left- b)
swapping association ends (a -- b → b -- a)
changing the order of definitions (the order does matter... sometimes)
adding empty nodes with background/border colors set to Transparent
https://crashedmind.github.io/PlantUMLHitchhikersGuide/index.html
https://crashedmind.github.io/plantuml.github.io/
a -[norank]-> b
replace several arrows with a single one at the VM level
the '-' in Lay_: customize it; from time to time, try without it
colored/solid/dashed lines
BackendClient -[hidden]- Logging
AddRelTag("backup", $textColor="orange", $lineColor="orange", $lineStyle = DashedLine())
separate static/dynamic diagrams + imports
1) play with the layout
LAYOUT_LEFT_RIGHT()
2) play with Rel_U...
' to the API and queues
Rel_U(rece_requetehubee_batch, api_hubee, "HTTPS")
one single change at a time
number the links
</pre>Architecture as Code with C4 and Plantuml2022-06-10T00:00:00+00:00https://florat.net/architecture-as-code-with-c4-and-plantuml/<h1>Architecture as Code with C4 and Plantuml</h1>
<img src="https://florat.net/assets/images/blog-tech/28-diag-4.svg" alt="Illustration">
<p>(This article has also been <a href="https://dzone.com/articles/architecture-as-code-with-c4-and-plantuml">published</a> at DZone)</p>
<h2>Introduction</h2>
<p>I'm lucky enough to currently work on a large microservices-based project as a solution architect. I'm responsible for designing different architecture views, each targeting a very different audience, hence different concerns:</p>
<ul>
<li>The <strong>application view</strong> dealing with modules and data streams between them (targeting product stakeholders and developers)</li>
<li>The <strong>software view</strong> (design patterns, database design rules, choice of programming languages, libraries...) that developers should rely upon;</li>
<li>The <strong>infrastructure view</strong> (middleware, databases, network connections, storage, operations...) providing useful information for integrators and DevOps engineers;</li>
<li>The <strong>sizing view</strong> dealing with performance;</li>
<li>The <strong>security view</strong>, which is mainly transversal.</li>
</ul>
<hr>
<p><strong>ℹ️ NOTE</strong></p>
<p>We use this <a href="https://github.com/bflorat/architecture-document-template">Open Source Template</a> to document our architecture.</p>
<hr>
<p>Our current project architecture is fairly complex because of the number of modules (tens of jobs, API and GUI modules), because of the large number of external partners and because of its integration with a large legacy information system.</p>
<p>At this time, we have to maintain more than one hundred architecture diagrams. Following a <a href="https://leanpub.com/livingdocumentation">living documentation</a> approach, we adapt and augment diagrams, text and tables several times a day. As we will see later, it's often a collaborative process taking advantage of several great tools.</p>
<h3>The Sample Application</h3>
<p>We illustrate this article with a fictional <em>AllMyData</em> microservices application. This is a .gov web application enabling any company to get all the information about it known to public administrations.</p>
<p>We can split our feature "Deliver Companies Data" into two main call chains:</p>
<ul>
<li>The first call chain is made of the GUI requests that create requests in the system.</li>
<li>The second one is made of a job launched periodically and consuming new requests. It gathers data about the company both from a local repository and from another administration IS (Information System), produces a PDF report and sends an e-mail to the company's original requester.</li>
</ul>
<h2>The C4 Model</h2>
<p>We use the <a href="https://c4model.com/">C4 model</a> to represent our architecture. It is beyond the scope of this tooling article to describe it in depth, but I invite you to have a look at this very pragmatic approach. I find it very natural for designing complex architectures. It leverages the UML2 standard and provides a great dichotomy between high-level concerns and code-level ones.</p>
<hr>
<p><strong>ℹ️ NOTE</strong></p>
<p><a href="https://www.opengroup.org/archimate-forum">Archimate</a> could be another good fit for us but is probably overkill in our context of very low modelization adoption and knowledge. Also, we like the C4 KISS/low-tech approach that takes many human psychological criteria into account. Note that some Archimate tools support C4 diagrams using some mapping between concepts. I am not sure it is a good idea to mix both, though.</p>
<hr>
<p>In our context, we currently use three main C4 diagram types (note that C4 and UML2 contain others not listed here):</p>
<ul>
<li><strong>System landscape diagrams</strong> provide a very high-level view of the system. We use them to describe the general application architecture.</li>
</ul>
<img src="https://florat.net/assets/images/blog-tech/28-diag-5.png" alt="System landscape sample" width="600">
<ul>
<li><strong>Container diagrams</strong> are used to describe the middleware, databases, and many other technical components as well as data streams between them. They are similar to UML2 deployment diagrams but more natural in my opinion. In the application view, we mainly display modules and databases and in the infrastructure view, we drill down into technical devices like reverse proxies, load balancers, cluster details, etc. We also use C4 dynamic diagrams, very similar to container diagrams but including call numbering.</li>
</ul>
<img src="https://florat.net/assets/images/blog-tech/28-diag-6.png" alt="Container diagram sample">
<ul>
<li><strong>Various UML2 diagrams</strong> (sequence, activity, classes). We use them sparingly and only to express a pattern or something especially important or complex, but certainly not for ordinary code.</li>
</ul>
<p><img src="https://github.com/bflorat/architecture-document-template/raw/master/diagrams/roles.svg" alt=""></p>
<hr>
<p><strong>ℹ️ NOTE</strong></p>
<p>I'm quite reluctant to use the C4 <em>container</em> term because of the risk of confusion with Docker/OCI containers (as pointed out by Simon Brown, the C4 creator). In our organization, we prefer to call them <em>deployable units</em>; the C4 model encourages terminology adaptation. A C4 container is basically a separate deployable process. The <a href="https://c4model.com/#ContainerDiagram">C4 documentation</a> states: "Essentially, a container is a separately runnable/deployable unit (e.g. a separate process space) that executes code or stores data".</p>
<p>In the C4 model, a <em>container</em> can contain one or more software <em>components</em>. This concept doesn't refer to infrastructure components but to large pieces of code (like a set of Java classes). We barely use C4 components in our architecture document because we don't really need to go into that level of detail (our hexagonal architecture makes things simple to design and understand just by reading the code, and our agile approach makes us prefer limiting the design documentation we have to maintain).</p>
<hr>
<h2>Plantuml</h2>
<p><a href="https://plantuml.com/en/">Plantuml</a> is an impressive tool that instantly generates diagrams from a very simple textual DSL (Domain-Specific Language).</p>
<p>For instance, this very short text:</p>
<pre><code>@startuml
[Browser] -> [API Foo]: HTTPS
@enduml
</code></pre>
<p>...is enough to produce this diagram:</p>
<img src="https://florat.net/assets/images/blog-tech/28-diag-7.png" alt="Plantuml diagram sample">
<p>Plantuml comes with hundreds of features and syntax goodies, sometimes undocumented and evolving very quickly. I suggest <a href="https://plantuml-documentation.readthedocs.io/en/latest/">this website</a> as a clear and exhaustive reference.</p>
<p>Check out some real-world examples <a href="https://real-world-plantuml.com/">here</a>.</p>
<h3>Plantuml Combined With C4</h3>
<p>Plantuml component diagrams can be customized as C4 diagrams using <a href="https://github.com/plantuml-stdlib/C4-PlantUML">this extension library</a>.</p>
<p>Just import it at the top of your Plantuml diagrams and use C4 macros:</p>
<pre><code>@startuml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
!include <tupadr3/devicons2/chrome>
!include <tupadr3/devicons2/java>
!include <tupadr3/devicons2/postgresql>
LAYOUT_LEFT_RIGHT()
Container(browser, "Browser","Firefox or Chrome", $sprite="chrome")
Container(api_a, "API A","Spring Boot", $sprite="java")
ContainerDb(db_a, "Database A","Postgresql", $sprite="postgresql")
Rel(browser,api_a,"HTTPS")
Rel_R(api_a,db_a,"pg")
@enduml
</code></pre>
<p>is exported as:</p>
<img src="https://florat.net/assets/images/blog-tech/28-diag-4.svg" alt="Plantuml diagram sample">
<hr>
<p><strong>ℹ️ NOTE</strong></p>
<ul>
<li>
<p>Always export diagrams in SVG format to allow unlimited zooming. This is appreciable when dealing with large diagrams.</p>
</li>
<li>
<p>We use the latest online version here, but you may prefer to use a statically downloaded version in air-gapped mode.</p>
</li>
</ul>
<hr>
<h3>Diagrams Factorization</h3>
<p>A great thing about Plantuml is the factorization capabilities using the <code>!include</code> and <code>!includesub</code> <a href="https://plantuml.com/en/preprocessing">preprocessor</a> directives.</p>
<p>It is possible to include local or remote diagrams (i.e., starting with <code>@startuml</code> and ending with the <code>@enduml</code> directive). For instance, C4 macros are included using this instruction:</p>
<pre><code>!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
</code></pre>
<p>More interestingly, it is also possible to import diagram fragments (i.e., starting with <code>!startsub</code> and ending with the <code>!endsub</code> directive):</p>
<p>File <code>fragments.iuml</code>:</p>
<pre><code>!startsub dmz
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
!include <tupadr3/devicons2/chrome>
!include <tupadr3/devicons2/java>
Container(browser, "Browser","Firefox or Chrome", $sprite="chrome")
Container(api_a, "API A","Spring Boot", $sprite="java")
!endsub
!startsub intranet
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
!include <tupadr3/devicons2/postgresql>
ContainerDb(db_a, "Database A","Postgresql", $sprite="postgresql")
!endsub
!startsub extranet
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
!include <tupadr3/devicons2/postgresql>
ContainerDb(db_b, "Database B","Postgresql", $sprite="postgresql")
!endsub
</code></pre>
<p>File <code>diags-1.puml</code>:</p>
<pre><code>@startuml use-case-1
' We only include context-related sub-diagrams
!includesub fragments.iuml!dmz
!includesub fragments.iuml!intranet
Rel(browser,api_a,"HTTPS")
Rel_R(api_a,db_a,"pg")
@enduml
</code></pre>
<h3>Filtering Unlinked Containers</h3>
<p>Since mid-2020, Plantuml supports a game-changing <a href="https://forum.plantuml.net/11052/remove-unlinked-components">feature</a> for software architects: the <code>remove @unlinked</code> directive. It <strong>keeps in a C4 diagram only the containers that call or are called, and drops all the others</strong>.</p>
<p>This feature (along with the diagram fragment capabilities) was a requirement to achieve the diagram patterns described below.</p>
<h3>Sprites</h3>
<p>Thousands of <a href="https://github.com/tupadr3/plantuml-icon-font-sprites/blob/master/devicons/index.md">sprites</a> are available to decorate the C4 containers. They are now embedded directly into the latest Plantuml releases. They include Devicons, Font-Awesome, Material, Office, Weather and many other icon libraries. Most software, hardware, network and business-oriented icons are ready to use out of the box!</p>
<p>From my experience, <strong>using sprites inside C4 containers makes the diagrams airier</strong> and thus more pleasant to read. Maybe it helps our brain identify the nature of each container faster?</p>
<p>Note that even if you can use different background colors to differentiate C4 containers based on a specific criterion (for instance, I use a light grey for external APIs), we recommend using sprites instead to represent their nature: it makes cleaner diagrams, and the default blue color is fine in most cases.</p>
<h3>Plantuml IDE Plugins</h3>
<p>Plantuml is a very versatile technology that can be used in many <a href="https://plantuml.com/running">different contexts</a> including:</p>
<ul>
<li>
<p>A simple base64 encoded URL like <code>https://www.plantuml.com/plantuml/uml/SoWkIImgAStDuL9GK8XsAielBqujYbNGjLE8TWpmL73BpuzLi5Bm20a92EPoICrB0Qe40000</code>;</p>
</li>
<li>
<p>Inside a Word processor like LibreOffice or Word;</p>
</li>
<li>
<p>From programming languages like Groovy, Java or Python;</p>
</li>
<li>
<p>In most IDEs like IntelliJ IDEA thanks to <a href="https://plugins.jetbrains.com/plugin/7017-plantuml-integration">this plugin</a>;</p>
</li>
<li>
<p>Or in Eclipse with <a href="https://github.com/hallvard/plantuml">this plugin</a>;</p>
</li>
<li>
<p>But my own favorite is the <a href="https://marketplace.visualstudio.com/items?itemName=jebbs.plantuml">VScode plugin</a>. Among other features, it supports generating multiple diagrams from a single <code>.puml</code> file as well as across multiple <code>.puml</code> files. It can be finely tuned.</p>
</li>
</ul>
<img src="https://florat.net/assets/images/blog-tech/vscode-plantuml.png" alt="VSCode Plantuml plugin" width="600">
<h2>Architecture as Code</h2>
<p>A very nice side effect of the IDE Plantuml integration is that you can not only create diagrams much faster, freed from the arrangement chore, but also write them as you code. Diagrams can be automatically generated and refreshed as you type.</p>
<h3>Mob Designing</h3>
<p>This kind of tooling enables what I would call <em>Mob design</em>. Especially at the beginning of our project, but still today, we brainstorm about the software architecture. Using Plantuml and a large shared screen, it is very convenient to create and compare several architecture scenarios.</p>
<p>"What if the API <code>A</code> is called directly by the client <code>B</code>?" Or "Should it be called asynchronously by the job <code>J</code>?" ...</p>
<p>In the same manner that end-users truly need to visualize screen mockups, developers and architects think better in front of diagrams. This also greatly limits misunderstandings induced by the limitations and numerous ambiguities of natural languages.</p>
<h3>Inventory and Dependencies Diagrams</h3>
<p>As a blueprint we use the <code>!include</code> and/or <code>!includesub</code> directives to separate:</p>
<ul>
<li>
<p><strong>Inventory diagrams</strong> show static elements of the architecture (classified into different network zones and represented by boundaries) but don't display <em>relations</em> between them. They are useful to answer questions like "What does zone <code>xyz</code> contain?" or "Which modules make up system <code>xyz</code>?". They are particularly useful in the application view to clearly display the system modules of complex microservices architectures, or in the infrastructure view to represent the nodes of each network zone and their deployable units. This kind of diagram uses <strong>C4 container diagrams</strong>.</p>
</li>
<li>
<p><strong>Dependencies diagrams</strong> leverage the static diagrams but augment them with calls between the containers. Inventory diagrams can be used alone, but dependencies diagrams have to import the inventory diagram. They should answer questions like "Which module/container is called by X?" or "Which modules/containers does X call?". They are also helpful for impact studies: "What's the impact if I change <code>API X</code>?".</p>
</li>
</ul>
<p>Example of an inventory diagram:</p>
<p>File <code>inventory.puml</code>:</p>
<pre><code>@startuml
header Inventory diagram
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
!include <tupadr3/devicons2/chrome>
!include <tupadr3/devicons2/java>
!include <tupadr3/devicons2/postgresql>
!include <tupadr3/devicons2/nginx_original>
!include <tupadr3/devicons2/react_original>
!include <tupadr3/devicons2/android>
!include <tupadr3/devicons2/groovy>
!include <tupadr3/material/queue>
!include <tupadr3/material/mail>
!include <tupadr3/devicons2/dot_net_wordmark>
!include <tupadr3/devicons2/oracle_original>
!include <office/Concepts/web_services>
skinparam linetype polyline
HIDE_STEREOTYPE()
SHOW_PERSON_PORTRAIT()
System(client, "Client") {
Container(spa, "SPA allmydata-gui", "Container: javascript, React.js", "Graphical interface for requesting information", $sprite="react_original")
Container(mobile, "AllMyData mobile application", "Container: Android", "Graphical interface allowing to request information", $sprite="android")
}
Enterprise_Boundary(organisation, "System organisation B") {
Container_Ext(saccounting, "Accounting system", "REST service", $sprite="web_services")
}
Enterprise_Boundary(si, "Information System") {
Container(static_resources, "allmydata-gui Web Application", "Container: nginx", "Delivers static resources (js, html, images ...)", $sprite="nginx_original")
Container(sm, "allmydata-api", "Container: Tomcat, Spring Boot", "REST service allowing to request information", $sprite="java")
Container(crep, "Companies repository", "Container", "SOAP webservice providing data about companies known by administration A", $sprite="dot_net_wordmark")
ContainerDb(crep_db, "companies-repository-db", "Container: SqlServer", "Stores companies data",$sprite="oracle_original")
Container(batch, "allmydata-batch", "Container: groovy", "Process requests, launched by cron every minute", $sprite="groovy")
ContainerQueue(queue, "requests-queue", "Container: RabbitMQ", "Stores requests", $sprite="queue")
ContainerDb(amd_db, "allmydata-db", "Container: PostgreSQL", "Stores requests history and status",$sprite="postgresql")
Container(sreporting, "service-reporting-pdf", "Container: Tomcat, JasperReport", "Reporting REST service", $sprite="java")
Container(smails, "mail server", "Container: Postfix", "Send emails", $sprite="mail")
}
@enduml
</code></pre>
<img src="https://florat.net/assets/images/blog-tech/28-diag-1.png" alt="Inventory diagram sample" width="600">
<p>Example of dependency diagram (importing its inventory counterpart and adding a person and a bunch of calls):</p>
<p>File <code>dependencies.puml</code>:</p>
<pre><code>@startuml dependencies
header Dependencies diagram
!include inventory.puml
Rel(client, static_resources, "HTTPS")
Rel(spa,sm,"REST call","HTTPS")
Rel(sm,queue,"AMQP")
Rel(sm,amd_db,"psql")
Rel(batch, queue, "AMQP")
Rel_R(batch, saccounting, "HTTPS")
Rel(batch, sreporting,"HTTP")
Rel(batch, smails, "SMTP")
remove @unlinked
@enduml
</code></pre>
<img src="https://florat.net/assets/images/blog-tech/28-diag-2.png" alt="Dependencies diagram sample" width="800">
<h3>Dynamic Diagrams to Describe Call Chains</h3>
<p>Once we have provided the system's big picture using both an inventory and a dependencies view, we describe the detailed architecture of each main feature using a third kind of C4 diagram: <strong><a href="https://c4model.com/#DeploymentDiagram">C4 dynamic diagrams</a></strong>. C4 container and dynamic diagrams are very similar, but the latter comes with automatic call numbering.</p>
<hr>
<p><strong>ℹ️ NOTE</strong></p>
<ul>
<li>
<p>Some may prefer good old UML2 sequence diagrams for complex interactions. In most cases, I find the C4 dynamic diagrams easier to read when dealing with container interactions.</p>
</li>
<li>
<p>When working on complex code design, we rather use UML2 sequence diagrams.</p>
</li>
</ul>
<hr>
<p>C4 dynamic diagrams target developers. They detail calls or data streams between C4 containers involved in the context of a given feature, hence providing a <strong>detailed view of each call chain</strong>.</p>
<p>The <em>feature</em> term should be understood in its agile sense (it fulfills a stakeholder need). It can be something like "Allow an enterprise to access its data online" or "Pay for an order".</p>
<p>This kind of diagram can still contain zones or boundaries (already available in the inventory or dependencies diagrams), thus setting up the call chain in a more global context.</p>
<p>The feature architecture leverages one or more call chains, and a call chain is made of a group of ordered calls or actions (like calling an API, writing a file to disk, etc.) <strong>all performed synchronously</strong>. Any further call is referenced in the next call chain.</p>
<img src="https://florat.net/assets/images/blog-tech/28-diag-8.png" alt="C4 dynamic diagram sample">
<hr>
<p><strong>ℹ️ NOTE</strong></p>
<ul>
<li>
<p>By 'synchronous', we mean a set of activities sharing the same logical "transaction". A technically asynchronous call (as when using <a href="https://en.wikipedia.org/wiki/Reactive_programming">reactive programming</a>) still counts as part of the same call chain. On the contrary, when a call chain produces a message as part of an Event-Driven Architecture, the consumption and processing of this event by another module are NOT counted in the same call chain, even if the production and the consumption of the event are technically almost instantaneous.</p>
</li>
<li>
<p>When considered helpful, we augment the diagrams with some textual context (using AsciiDoc) before or after the diagram, but this text should be concise, not redundant with the diagram itself. Call chain diagrams are, however, often sufficient in themselves.</p>
</li>
</ul>
<hr>
<p>We leverage inventory diagrams fragments and unlinked container filtering explained before to achieve an effective Architecture As Code pattern.</p>
<p>File call chain <code>deliver-1.puml</code> (note the <code>remove @unlinked</code> usage here):</p>
<pre><code>@startuml deliver-1.puml
!include inventory.puml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Dynamic.puml
' For call chains, we advise to put a header (displayed by default at the upper-right
' side of the diagram) to ease its identification.
header deliver-1
Person_Ext(company, "Company", "[person] \nWeb client (PC, tablet, mobile)")
Rel(client, static_resources, "Visit https://allmydata.gouv", "HTTPS (R)")
Rel(client, spa, "Retrieves information via")
Rel(spa,sm,"REST call","HTTPS (W)")
RelIndex(LastIndex()-1,sm,queue,"Produces a request message to the queue","AMQP (W)")
RelIndex(LastIndex()-2,sm,amd_db,"Stores the request data","JDBC (W)")
increment()
' Remove all C4 containers imported from inventory.puml file but not involved
' in this call chain to make the diagram much cleaner
remove @unlinked
@enduml
</code></pre>
<img src="https://florat.net/assets/images/blog-tech/28-diag-9.svg" alt="C4 dynamic diagram sample">
<hr>
<p><strong>ℹ️ NOTE</strong></p>
<p>It is <strong>paramount to standardize call chain naming</strong> (like <code>deliver-1</code>, <code>pay-3</code>, ...) because it becomes a strong vector of communication between developers and business analysts. It is then possible to talk using canonical names like <code>deliver-1 3-1</code>, for instance. This is a massive misunderstanding killer and time saver, and one of the main benefits of this methodology.</p>
<p>I suggest simply using the <code><feature>-<incrementing number></code> naming scheme.</p>
<hr>
<p>File call chain <code>deliver-2.puml</code> (note the 'remove @unlinked' usage here):</p>
<pre><code>@startuml deliver-2.puml
!include inventory.puml
header deliver-2
Rel(sm,amd_db,"JDBC CRUD calls","psql")
Rel(batch, queue, "Consume each request message", "AMQP (R)")
Rel(batch, amd_db, "Read various very interesting data about the requester company", "JDBC (R)")
Rel(batch, saccounting, "Get more interesting data from the Accounting system", "HTTPS (R)")
Rel(batch, sreporting, "Produces a great PDF including great pie charts", "HTTP (W)")
Rel(batch, smails, "Send an e-mail to original requester with the attached PDF", "SMTP (W)")
Rel(batch, amd_db, "Store the request data (date, final status...)", "JDBC (W)")
remove @unlinked
@enduml
</code></pre>
<img src="https://florat.net/assets/images/blog-tech/28-diag-10.svg" alt="C4 deliver-2 diagram">
<hr>
<p><strong>ℹ️ NOTE</strong></p>
<p>Each call should detail used network protocols along with a modifier flag (<code>R</code>: Read, <code>W</code>: Write, <code>E</code>:Execute). These flags are important to figure out the call intention. More than a single flag on the same call is possible.</p>
<hr>
<p>In our context, these call chain diagrams provide enough architectural details to code the application. They are the only design documentation we write before actually coding. Apart from them, the real (and best) documentation is the (clean) code itself.</p>
<h2>Conclusion</h2>
<p>I hope this introduction has aroused your curiosity about coding architectures using Plantuml and C4. A future article will provide our diagramming best practices and some Plantuml useful tips in an architectural context, keep in touch!</p>
<p>I will finish with a personal feeling that I can't formally demonstrate but have observed many times: <strong>the graphical "harmony" of an architectural diagram is directly proportional to its intrinsic quality</strong>. It is therefore possible to form a first opinion of a complex architecture with just a glimpse of the main diagram on the wall...</p>
<p>In the same vein, <strong>dependencies diagrams highlight the strategic modules and reflect the balance of power hidden behind the architecture</strong> (as predicted by <a href="https://en.wikipedia.org/wiki/Conway%27s_law">Conway's Law</a>).</p>
Designing Human-Targeted Random IDs2022-04-10T00:00:00+00:00https://florat.net/designing-human-targeted-random-ids/<h1>Designing Human-Targeted Random IDs</h1>
<p>Article also published <a href="https://dzone.com/articles/designing-human-targeted-random-ids">on DZone</a>.</p>
<hr>
<p><strong>ℹ️ NOTE</strong>
We don't deal here with the technical IDs used as primary keys in relational databases. See my previous article <a href="https://florat.net/how-to-do-uuid-as-primary-keys-the-right-way">here</a> if you seek a great way to generate them.</p>
<hr>
<h2>Context</h2>
<p>During one of my recent projects, I was asked to design a scheme of IDs highly usable by humans. The business requirement was mainly to create pseudo-random values that <strong>can't be inferred or guessed</strong>, to be used as a secret token printed on some official documents for future controls.</p>
<p>Later on, we had a similar requirement with lower security concerns: generating <strong>human-readable file numbers</strong> that can be printed on associated documents, verbalized on phone or typed when doing searches.</p>
<p>Another well-known example (in France at least) is the ID (aka "SNCF number") attached by the French railway company to each train booking, so one can easily open any travel details from a smartphone without being fully authenticated.</p>
<h2>Main Criteria</h2>
<p>After comparing existing solutions and analyzing the business stakeholders' requirements, these criteria emerged:</p>
<ul>
<li>
<p>These IDs have to be <strong>short</strong> to be easily typed, read, or verbalized on the phone by a human (no more than six to ten characters).</p>
</li>
<li>
<p>They have to include mechanisms that <strong>prevent and detect typos</strong>.</p>
</li>
<li>
<p>They <strong>don't have to be unique</strong> (and can't be, because of their small size and thus limited variability). However, the system has to prevent collisions, either by coupling these IDs with other values (like a person's last name) or by retrying when a shuffled value already exists (the solution we use). Keep in mind that closed items may share the same ID (when searching by ID, for instance, make sure to take the status into account).</p>
</li>
<li>
<p>When possible, avoid generating offending terms or acronyms (like F*** in English). We haven't actually searched for a solution so far, but maintaining a dictionary per targeted language seems the best bet (thanks to Rumen Dimov for his feedback).</p>
</li>
</ul>
<h2>How To Make These Values Truly Usable?</h2>
<ul>
<li>
<p><strong>Limit the number of possible characters</strong>: go beyond base-10 (decimal) digits by adding lowercase and uppercase letters, but avoid other characters (punctuation marks, diacritics, ...) that are more difficult to read. Hence, in theory, we can generate numbers made of up to 10 digits + 26 lowercase ASCII letters + 26 uppercase ASCII letters = base-62 numbers.</p>
</li>
<li>
<p>Ease typing and reading as much as possible: the number should be composed of <strong>no more than four or five characters</strong> easily memorized as a whole, like <code>aGty3</code>. If longer, split the ID using hyphens (avoid underscores, which can be difficult to read when used in a hyperlink).</p>
</li>
<li>
<p>Make sure that these values can be <strong>easily pasted</strong> using a single command into clearly separated text fields.</p>
</li>
</ul>
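<p>As a concrete illustration, here is a minimal sketch in Java (the class name and grouping parameters are hypothetical) generating such an ID from a CSPRNG, using the base-56 alphabet discussed in the next section:</p>
<pre><code>import java.security.SecureRandom;

public class HumanIdGenerator {

    // Base-62 minus the six confusing characters '0', '1', '2', 'l', 'O', 'Z' = 56 characters
    static final String ALPHABET =
        "3456789abcdefghijkmnopqrstuvwxyzABCDEFGHIJKLMNPQRSTUVWXY";
    private static final SecureRandom RANDOM = new SecureRandom();

    // generate(3, 4) returns an ID like "aTy5-fTkr-p9z3"
    public static String generate(int groups, int groupSize) {
        StringBuilder sb = new StringBuilder();
        for (int g = 0; g < groups; g++) {
            if (g > 0) {
                sb.append('-');
            }
            for (int i = 0; i < groupSize; i++) {
                sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
            }
        }
        return sb.toString();
    }
}
</code></pre>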
<h2>How To Prevent And Detect Typos?</h2>
<ul>
<li>
<p><strong>Exclude <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3541865/">confusing characters</a></strong>. Keep in mind that similarity also depends on the font used: an 'l' can easily be distinguished from a '1' in a plain old monospace font, but less so in a sans-serif one. We advise excluding the most problematic cases: 'O' and '0' (zero), 'Z' and '2', or 'l' and '1'. By dropping these characters, we now deal with base-56 numbers.</p>
</li>
<li>
<p>Reserve some bits as a <strong><a href="https://en.wikipedia.org/wiki/Cyclic_redundancy_check">CRC</a> or checksum</strong> in order to detect most typos early on the frontend. Such systems have been used by banks for decades on IBAN accounts, for instance (using the MOD97 algorithm). Users will thank you for notifying them early, and this GUI-side surface control avoids issuing useless server-side queries and ugly error logs on the backend.</p>
</li>
</ul>
<hr>
<p><strong>ℹ️ NOTE</strong>
A lightweight CRC solution can't detect all possible typos, but it catches most of them.</p>
<hr>
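<p>To make the checksum idea concrete, here is a deliberately simplistic sketch in Java (a hypothetical helper, reusing the base-56 alphabet from the generation sketch above): a position-weighted sum appended as a final check character. It detects all adjacent transpositions of distinct characters and most single-character typos, but a real system should prefer a proven scheme such as Luhn mod N or MOD97.</p>
<pre><code>public final class CheckChar {

    // Same base-56 alphabet as in the generation sketch above
    private static final String ALPHABET =
        "3456789abcdefghijkmnopqrstuvwxyzABCDEFGHIJKLMNPQRSTUVWXY";

    // Position-weighted checksum; simplistic on purpose
    public static char compute(String id) {
        int sum = 0;
        int position = 1;
        for (char c : id.toCharArray()) {
            int value = ALPHABET.indexOf(c);
            if (value >= 0) { // skip hyphens and other separators
                sum += position * value;
                position++;
            }
        }
        return ALPHABET.charAt(sum % ALPHABET.length());
    }

    // GUI-side surface control: recompute and compare with the last character
    public static boolean isValid(String idWithCheckChar) {
        int last = idWithCheckChar.length() - 1;
        return compute(idWithCheckChar.substring(0, last)) == idWithCheckChar.charAt(last);
    }
}
</code></pre>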
<h2>What About Security?</h2>
<ul>
<li>
<p>If these human-readable IDs are used in serious matters dealing with money, security, or official documents, make sure to use a <strong>cryptographically secure pseudorandom number generator</strong> (CSPRNG) to generate the numbers that you will then convert to your base-56 value. For instance, when using a Linux server, make sure to use <code>/dev/random</code> and not <code>/dev/urandom</code>. This will greatly reduce the risk of collisions (generating the same value twice in a short amount of time).</p>
</li>
<li>
<p>The ID <strong>length should be proportional to the required difficulty of guessing it</strong>.</p>
</li>
</ul>
<h2>Some Examples Please</h2>
<p>Imagine you only want to avoid '0'/'O' and '1'/'l' confusions and you want to generate IDs with a collision risk as low as 1/2.6×10¹⁷; you can generate numbers (using a CSPRNG) like:</p>
<p><code>aTy2-5fTk-rp9z</code></p>
<p>or</p>
<p><code>bUD5-64kP-hlA4</code></p>
<p>For less critical use cases, fewer characters may be enough:</p>
<p><code>aTy2-5fTk</code></p>
<p>or</p>
<p><code>64kP-hlA4</code></p>
<p>For short-lived and low-risk IDs, see what the SNCF does for travel files (only six capital letters):</p>
<p><code>XSDTGE</code></p>
<h2>Conclusion</h2>
<p>Generating readable random IDs for humans can easily be achieved, but a bunch of requirements must be taken into account. Their <strong>scheme has to vary according to the targeted usage</strong>, but keep in mind that <strong>changing an existing scheme is cumbersome</strong> and can require maintaining several ID schemes for a long time. I hope this article will help you think about the not-so-obvious criteria, making it easier to design them right on the first attempt. I would be glad to get feedback if I have forgotten important or obvious points.</p>
How to Do UUID as Primary Keys the Right Way2021-12-28T00:00:00+00:00https://florat.net/how-to-do-uuid-as-primary-keys-the-right-way/<p>(This article has been also <a href="https://dzone.com/articles/uuid-as-primary-keys-how-to-do-it-right">published</a> at DZone)</p>
<h1>How to Do UUID as Primary Keys the Right Way</h1>
<p><img src="https://florat.net/assets/images/blog-tech/uuid-1.jpg" alt="UUID"></p>
<p><strong>TL;DR: UUID V4 or its COMB variant are great to mitigate various security, stability, and architectural issues, but be aware of various pitfalls when using them.</strong></p>
<h2>Why Do We Need Technical IDs in the First Place?</h2>
<p>Any properly designed relational database table owns a <strong>Primary Key (PK)</strong> allowing you to <strong>uniquely and stably identify</strong> each record. Even if the primary key can be composite (built from several columns), it is a widespread good practice to dedicate a special column (often named <code>id</code> or <code>id_<table name></code>) to this end. This special column is used to technically identify records and can be used as a foreign key in relations.</p>
<hr>
<p><strong>ℹ️ NOTE</strong></p>
<p><strong>Do not confuse technical</strong> (also named "surrogate") keys <strong>with business keys</strong>. The most important tables (so-called entities in domain-driven design) may contain an alternate human-readable ID column (like the customer ID "<code>G2F6D</code>"). In this article, we will focus only on technical PKs. They should only be processed and readable by machines, not humans.</p>
<hr>
<h3>How to Choose Good IDs</h3>
<p>Good IDs are:</p>
<ul>
<li><strong>Unique</strong> (no collisions). This is enforced by the UNIQUE constraint automatically added by RDBMS on every PK.</li>
<li><strong>Not reusable</strong>. Reusing a deleted row's PK is technically possible in relational databases but is a very bad idea because it contributes to generating confusion (for example, an older log file can reference an ID reused in the meantime by a new entity, thus leading to false deductions).</li>
<li><strong>Meaningless</strong>. Developers should not parse IDs for the wrong reasons. Imagine an ID starting with the current year. Some developers may ignore that a <code>date_creation</code> column exists and will only rely on the PK's first four digits. If the ID format changes or is buggy (because of bad timezone handling, for instance), some subtle issues may arise. Even if this is largely debated, I would <strong>warn against using natural keys as PKs altogether</strong>. It <strong>may limit your options in the future</strong>. For example, if you use an e-mail address as a PK, you implicitly forbid its modification in future releases: never say "never." Another problem with natural keys is the <strong>difficulty of ensuring unicity</strong> due to functional issues, even when everything has been done to avoid them. I once worked for a French governmental agency and observed both issues in different projects: 1) Legacy code relied on the first digit of the NIR (social identity number) to get the person type, thus ignoring possible type reassignments (though the current type was available as a dedicated column); 2) We recently discovered that this unique ID was not so unique (for example, an ID shared temporarily by several members of an immigrant family, or collisions following city mergers). <strong>The real world is just too complex to make any assumption about unicity</strong>.</li>
</ul>
<h3>Which ID Format to Choose?</h3>
<p>The two formats matching these rules are AFAIK:</p>
<ol>
<li>Auto-incremented integers (starting at <code>1</code> or any larger value: <code>1</code>, <code>100</code>, <code>154555</code>). Most RDBMS like PostgreSQL or Oracle provide the <code>SEQUENCE</code> object, allowing a value to be auto-incremented while respecting the ACID principles. MySQL provides the <code>AUTO_INCREMENT</code> attribute.</li>
<li>Using a text-based random UUID V4 (universally unique identifier), also referred to as GUID (globally unique identifier) by Microsoft. Example: <code>9d17210c-2d5f-11ea-978f-2e728ce88125</code>.</li>
</ol>
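<p>A quick sketch of both options in Java (the <code>customer_id_seq</code> sequence name is illustrative; note that <code>java.util.UUID.randomUUID()</code> is backed by a cryptographically strong random generator):</p>
<pre><code>import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.UUID;

public class IdExamples {

    // Option 1: an auto-incremented ID produced by a PostgreSQL sequence
    static long nextFromSequence(Connection connection) throws SQLException {
        try (Statement st = connection.createStatement();
             ResultSet rs = st.executeQuery("SELECT nextval('customer_id_seq')")) {
            rs.next();
            return rs.getLong(1);
        }
    }

    // Option 2: a random UUID V4 generated by the application itself,
    // without any round trip to the database
    static UUID nextRandomUuid() {
        return UUID.randomUUID();
    }
}
</code></pre>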
<p>When working on existing projects, I often observe that PKs are designed as auto-incremented integers. Though this may be considered an obvious, no-brainer choice, <strong>it may be a bad idea in the long run...</strong></p>
<p>Let's consider both options. Each argument is provided along with an importance weight <em>(from 1=minor to 5=major)</em>.</p>
<h2>Why You Should Use Auto-Incremented Integers</h2>
<ul>
<li>
<p>[importance 3] <strong>Known, understood, and simple solution</strong>. Leverages sequences on modern RDBMS. Comes with very little possibility of performance issues due to bad design.</p>
</li>
<li>
<p>[importance 1] Pretty <strong>easy to read and verbalize</strong> when the number of digits remains reasonable (but, as stated before, technical PK should not be used by humans anyway — use a business key instead).</p>
</li>
</ul>
<h2>Why You Should Avoid Auto-Incremented Integers</h2>
<h3>Risks to Introduce Bugs</h3>
<ul>
<li>[importance 2] In some RDBMS like PostgreSQL, a <code>nextval</code> operation is not truly transactional: in rollback cases, the value is <a href="https://www.postgresql.org/message-id/CAM3SWZQMfR6Zfe3A0Nr4ddko8xZrijAuQQ%3DEcGjGeJSs2piAXA%40mail.gmail.com">still incremented</a>. It is hence possible to <strong>get "holes" (absent values) in PK sequences</strong>. This is not an issue in itself, as unicity is preserved, but it can induce subtle bugs if developers rely on the PK to count the number of items instead of using a proper <code>COUNT</code> query.</li>
<li>[importance 2] Likewise, developers may rely on PK values to sort items chronologically instead of using a dedicated <code>date</code> column. In case of sequence reset and ID reuse, this may lead to <strong>wrong sort orders</strong>.</li>
<li>[importance 1] The key can become huge, and if developers used an <code>int</code> variable to map the PK instead of a long one, you can encounter <strong>silent overflow errors</strong>. For instance, in Java, if you map a PK to an <code>Integer</code> or primitive <code>int</code> and the PK gets larger than <code>2,147,483,647</code>, the variable will silently wrap to the opposite (negative) value.</li>
</ul>
<h3>Security Risks When Using Auto-Incremented Integers as PK</h3>
<p>Using them clearly makes your application an easier target:</p>
<ul>
<li>[importance 3] Auto-incremented integers <strong>leak the number of items processed per unit of time</strong>. A competitor can easily deduce how many sales you made in a month. Or an attacker can get a good idea of how many requests your system is supposed to handle and finely tune a DDOS attack.</li>
<li>[importance 5] Auto-incremented integers are <strong>predictable</strong>. If used in URLs, they become a <strong><a href="https://owasp.org/www-community/attacks/Path_Traversal">traversal directory</a></strong> (also referred to as a "path traversal") <strong>exploit vector</strong>. For example, an HTTP GET URL can easily be forged from a regular URL path (<code>https://.../1234/...</code>) to another one (like <code>https://.../1235/...</code>). If the application implements proper access management, the attacker will get a <code>403</code> code (as expected), but if that is not the case or if some endpoints have been forgotten, he can get sensitive data. The <strong>defense-in-depth principle promotes several layers of controls and never relies on a single one.</strong></li>
<li>[importance 4] Similarly, auto-incremented integers make <strong>large bulk data downloads</strong> possible (in case of bad access controls). It is trivial to write scripts scraping over IDs (like a curl inside a for loop in bash).</li>
</ul>
<h3>Architectural Issues</h3>
<ul>
<li>[importance 4] Auto-incremented integers <strong>make the integration of two systems more difficult</strong>. Imagine that your company buys another one and you have to merge an existing customer database into yours using an ETL. If both systems use auto-incremented integers as PKs, you will have to avoid collisions by resetting sequences to new, not-already-used values. All foreign keys (FK) will have to be recomputed.</li>
<li>[importance 2] I think that from an architectural viewpoint, <strong>a database should only store data, not create it</strong>. With sequences, we leave to the RDBMS the creation logic.</li>
<li>[importance 1] With sequences, we mix inserts and data generation (<code>insert into ... values nextval('id_seq')</code>), and we have to keep the new value returned by the <code>INSERT</code> clause if we want to use it in the following queries. A creation function returning a value does <strong>not appear very logical</strong> to me. It is also possible to perform a <code>SELECT nextval('id_seq')</code> followed by an <code>INSERT</code> clause, but having to read something in order to create something doesn't seem more logical to me either...</li>
</ul>
<h3>Operations Risks</h3>
<ul>
<li>[importance 1] When using integers as keys for every entity, it is much <strong>easier to confuse an item with another</strong> (for instance confusing <code>request_id=10</code> with <code>article_id=10</code>).</li>
<li>[importance 1] When deleting an item, an operator can <strong>confuse a value with another</strong> (<code>delete ... where id=4</code> instead of <code>delete ... where id=40</code> for instance). This problem doesn't affect UUID as it is virtually impossible to type a matching UUID by chance.</li>
</ul>
<h2>The Other Way: Random UUID</h2>
<p>The alternative approach is to use UUID (RFC 4122, ISO/IEC 9834-8:2005) version 4 or variants.</p>
<h3>UUID Pitfalls</h3>
<ul>
<li><strong>Using UUID V1, V2</strong>: only the V4 (random value) version of UUID is acceptable. UUIDs based on timestamps (V1, V2) and the MAC address may lead to collisions at very high generation frequencies (within the same millisecond) but, worse, they leak important data (generation time and machine identification data). That could help attackers or give bad ideas to developers (see above why IDs should be meaningless).</li>
<li>Using the <strong>wrong database type:</strong> Most modern RDBMS come with a <code>UUID</code> type. In PostgreSQL, a UUID uses 128 bits of storage size, not 288 as we may infer naively from a UUID textual format.</li>
<li><strong>Changing your mind</strong>: if you went with integers, stick with them on existing projects.</li>
<li><strong>Not using a cryptographically-secure pseudorandom number generator</strong> (CSPRNG)<strong>:</strong> you <em>will</em> encounter collisions and create security flaws. When using a low-quality or buggy pseudorandom generator, the collision risk is very high; collisions may occur several times a day or even an hour. Under Linux, use <code>/dev/random</code> and not <code>/dev/urandom</code>.</li>
<li><strong>Using a CSPRNG but blocking your application when entropy is exhausted:</strong> If using <code>/dev/random</code> under Linux, a great solution is to use the <a href="https://www.digitalocean.com/community/tutorials/how-to-setup-additional-entropy-for-cloud-servers-using-haveged">haveged</a> daemon to feed the CSPRNG.</li>
</ul>
<h3>UUID Misconceptions</h3>
<ul>
<li>"Using UUIDs requires that you check for collisions." As explained on this <a href="https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions">Wikipedia page</a>, <strong>the risk of collision is so infinitesimal that it can be ignored</strong>. There is a 50% collision probability every 2.71E18 generations (if you generate 10 IDs per second without stopping, you can expect a 50% probability of collision every 8.5 billion years). The sole control I would advise is to correctly trap SQL errors, as a collision would throw a UNIQUE constraint violation error. Any good code would handle this type of technical error and retry anyway (see the sketch after this list). Real-world production databases already throw erratic SQL errors on a regular basis (like <code>ObjectOptimisticLockingFailureException</code> when using Hibernate, for instance), so the work is probably already done, or it should be.</li>
<li>"UUID is more <strong>difficult to read and verbalize.</strong>" As explained before, UUID is by no way meant for humans. Instead, <strong>use additional functional values</strong> for this. When well designed, they would be better than long integers. But UUID is often read by developers as well (when working on test doubles for instance). I observed in several projects that even then, UUID readiness is not an issue and no developers complained about it. We even figured out that transmitting UUID between developers (by instant messaging for instance) is safer than transmitting integers because nobody would type them and <strong>copy/paste prevents typos</strong>.</li>
</ul>
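<p>A minimal sketch of such a defensive retry in Java/JDBC (the <code>customer</code> table is hypothetical; SQLSTATE <code>23505</code> is the standard unique-violation code on PostgreSQL):</p>
<pre><code>import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.UUID;

public class CustomerDao {

    private static final String UNIQUE_VIOLATION = "23505"; // SQLSTATE: unique constraint violation

    // Retries once with a fresh UUID in the (astronomically unlikely) case
    // of a PK collision; any other SQL error is propagated as usual.
    public UUID insertCustomer(Connection connection, String name) throws SQLException {
        SQLException lastError = null;
        for (int attempt = 0; attempt < 2; attempt++) {
            UUID id = UUID.randomUUID();
            try (PreparedStatement ps = connection.prepareStatement(
                    "INSERT INTO customer(id, name) VALUES (?, ?)")) {
                ps.setObject(1, id);
                ps.setString(2, name);
                ps.executeUpdate();
                return id;
            } catch (SQLException e) {
                if (!UNIQUE_VIOLATION.equals(e.getSQLState())) {
                    throw e;
                }
                lastError = e;
            }
        }
        throw lastError;
    }
}
</code></pre>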
<hr>
<p><strong>ℹ️ NOTE</strong></p>
<p>NoSQL databases do not rely on integers as keys but on UUID (see MongoDB <code>_id</code> or CouchDB <code>id</code> attributes on documents for instance). This is due to their distributed nature, but I have never heard developers complain about it.</p>
<hr>
<h3>UUID V4 Advantages</h3>
<ul>
<li>[importance 5] Ensures a total <strong>non-significance</strong>. URLs containing PK are totally unpredictable. This <strong>prevents various exploits</strong> like path traversal or mass data downloads.</li>
<li>[importance 3] Greatly <strong>reduces the complexity of integration between databases</strong>.</li>
<li>[importance 2] <strong>Prevents all potential bugs and operations errors</strong> listed above.</li>
<li>[importance 1] No more sequence required: the business code can generate UUID by itself <strong>without using the database</strong>.</li>
<li>[importance 2] The <strong>code is easier to test</strong> because it is trivial to mock UUID without any RDBMS and their sequences features.</li>
<li>[importance 2] Most <strong>languages, frameworks, and tools support them</strong>.</li>
</ul>
<h3>UUID Real But Negligible Issues</h3>
<ul>
<li>UUID uses <strong>more space on disk and in memory</strong> (buffers). On most databases, a long uses 64 bits whereas a UUID takes 128 bits. On a large database, it only adds 8 MB for every million items.</li>
<li>There can be an impact on <strong>INSERT latency</strong>. Inserting one million rows into a PostgreSQL database takes about 25 seconds using UUID V4 versus about 6 seconds with integers. This is noticeable only for very write-oriented workloads.</li>
<li>SQL queries require <strong>more CPU cycles</strong> to be performed because of the key size (two cycles for 128 bits vs a single one for 64-bit integers). In practice, the overhead is negligible.</li>
<li>In some very rare cases (when containing only digits), UUIDs can be confused by a badly parameterized WAF (web application firewall) with <strong>credit card numbers</strong>. Think about it when using F5 ASM, for instance.</li>
</ul>
<h3>UUID V4 Real Issues and How to Fix Them</h3>
<ul>
<li>[importance 3] The UUID V4 looks fairly simple to implement but <strong>requires a minimal amount of skills and knowledge</strong>. If your team lacks tech leaders/software architects and has no idea of how to get a good source of randomness or of the difference between UUID versions, go for auto-incremented integers — it may save you from painful refactorings.</li>
<li>[importance 4] On most RDBMS, using genuine UUID V4 on large databases is not appreciated by DBAs because it <strong>fragments indexes</strong>, hence slowing them down when refreshing and during queries. If too fragmented, indexes have to be loaded entirely into memory, generating <a href="https://kccoder.com/mysql/uuid-vs-int-insert-performance/">important performance issues</a> if they don't fully fit into RAM and if disks have to be accessed.</li>
<li>[importance 2] Another <strong>performance issue deals with journals, caused by fragmentation</strong>. Defragmentation (<code>REINDEX</code> or <code>VACUUM</code>) can become much slower, and data replication (when enabled) can be impacted if it relies on journals. On PostgreSQL, this phenomenon is called "WAL write amplification" by DBAs. Note, however, that the storage hardware has a large impact on performance when dealing with this issue: SSD and NVMe disks, which handle random data access by design, greatly mitigate it.</li>
</ul>
<p>These last two performance issues can easily be <strong>fixed using a UUID V4 variant: the <a href="https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/">UUID short prefix COMB</a></strong>. COMB means "combined" because it mixes UUID V4 randomness with a hint of time. Its principle is to "sacrifice" two bytes of randomness and use them as a rolling sequence based on the current time (epoch value) with minute-wide precision. Every minute, the prefix is incremented (it will thus run through all values from <code>0000</code> to <code>FFFF</code> in about 45 days). A sample sequence of such UUIDs could be:</p>
<pre><code>2fe8-6aca-f113-4ef4-8b69-1b5de35d0832
2fe8-ec69-7acc-4cff-91c9-f658b331ee67
2fe9-8b94-993f-4176-9991-1f9e778a79a0 <- note the minute-wide increment
2fe9-b041-d0de-4552-b6b5-449a8ee32134
2fe9-da35-ce9d-4d4a-90e5-c2a4c89f18c7
2fe9-...
</code></pre>
<p>This way, UUID PKs induce far less fragmentation and index performances are similar to the ones observed with auto-incremented integers.</p>
<p>Several implementations exist. If you use Java, check this <a href="https://github.com/f4b6a3/uuid-creator#short-prefix-comb-non-standard">library</a> to generate PKs from your application code.</p>
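<p>A short usage sketch, assuming the <code>UuidCreator.getShortPrefixComb()</code> entry point documented in that library's README:</p>
<pre><code>import com.github.f4b6a3.uuid.UuidCreator;

import java.util.UUID;

public class CombIdGenerator {

    // Short Prefix COMB: the two leading bytes derive from the current time
    // (minute-wide precision), the remaining bytes are random as in a UUID V4
    public static UUID newId() {
        return UuidCreator.getShortPrefixComb();
    }

    public static void main(String[] args) {
        // IDs created within the same minute share the same two-byte prefix
        System.out.println(newId());
        System.out.println(newId());
    }
}
</code></pre>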
<p>If you prefer letting the RDBMS create the UUID itself, several implementations exist (like this PostgreSQL <a href="https://github.com/tvondra/sequential-uuids">extension</a>) but this adds a bit of complexity to install and configure the RDBMS.</p>
<p>We observed a few minor drawbacks with this method though:</p>
<ul>
<li>It is a bit <strong>more difficult for developers to distinguish UUIDs</strong>, as they start with the same bytes when created within a short amount of time. They have to check the last bytes.</li>
<li><strong>Loss of entropy slightly increases the chance of collisions</strong>, as only 12 bytes out of 14 are now truly shuffled. However, the two-byte rolling prefix still adds nonnegligible entropy based on the current time. If we estimate that we actually lose only a single byte of entropy, the collision risk is still negligible: you now have a 50% chance of getting a collision every 1.05E16 generated UUIDs. If you continuously generate 10 UUIDs per second, you have a 50% chance of getting a collision every 33.5 million years.</li>
<li>If PKs have to be generated by an ETL (typically during a migration process), replacing the built-in standard UUID V4 generator with a short prefix COMB <strong>may require a few lines of code and/or some integration work</strong>. For instance, for PENTAHO, we had to integrate a Java <a href="https://github.com/f4b6a3/uuid-creator#short-prefix-comb-non-standard">library</a> into the stream.</li>
</ul>
<h2>Other Interesting Resources on This Topic</h2>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Universally_unique_identifier">https://en.wikipedia.org/wiki/Universally_unique_identifier</a></li>
<li><a href="https://medium.com/@Mareks_082/auto-increment-keys-vs-uuid-a74d81f7476a">https://medium.com/@Mareks_082/auto-increment-keys-vs-uuid-a74d81f7476a</a></li>
<li><a href="https://www.clever-cloud.com/blog/engineering/2015/05/20/why-auto-increment-is-a-terrible-idea/">https://www.clever-cloud.com/blog/engineering/2015/05/20/why-auto-increment-is-a-terrible-idea/</a></li>
<li><a href="https://www.informit.com/articles/printerfriendly/25862">https://www.informit.com/articles/printerfriendly/25862</a></li>
<li><a href="https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/">https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/</a></li>
</ul>
A first glimpse of production constraints for developers2021-05-24T00:00:00+00:00https://florat.net/a-first-glimpse-of-production-constraints-for-developers/<p>(This article has been also <a href="https://dzone.com/articles/first-glimpse-production-constraints-for-developers">published</a> at DZone)</p>
<p><img src="https://florat.net/assets/images/blog-tech/glimpse-production.jpg" alt="Sahara and sandbox"></p>
<p>In most organizations, developers are <strong>not allowed to access the production environment</strong> for stability, security, or regulatory reasons. Restricting access to production is quite a good practice (enforced by many frameworks like COBIT or ITIL), but a major drawback is the <strong>mental distance created between developers and the real world</strong>. Likewise, monitoring is usually managed only by operators, and very little feedback is provided to developers except when they have to fix application bugs (ASAP, of course). As a matter of fact, most developers have <strong>very little idea of what a real production environment looks like</strong> and, more importantly, of the non-functional requirements allowing them to <strong>write production-proof code</strong>.</p>
<p>Involving developers in resolving production issues is a good thing for two main reasons:</p>
<ul>
<li>It is <strong>highly motivating</strong> to get tangible evidence of a real running system on a large infrastructure (data centers, clusters, SAN...) and to get insights about performance or business facts about their applications (number of transactions per second, number of concurrent users, and so on). It is also very common for developers to feel overwhelmed, as they are rarely called directly when an outage occurs.</li>
<li>It may <strong>substantially improve the quality of the delivered code</strong> by helping to properly design operational aspects like logs, monitoring, performance, and integration.</li>
</ul>
<h2>So, What Do Developers Misunderstand Most Often About Production?</h2>
<h3>Concurrency Is Omnipresent</h3>
<p>Production is <strong>highly concurrent</strong> while development is mostly single-threaded. Concurrency can happen among threads of a process (of an application server, for instance) but also among different processes running locally or on other machines (e.g., among <em>n</em> instances of an application server running across different nodes). This concurrency can generate various issues like starvation (slow-downs when waiting concurrently for a shared resource), deadlocks, or scope issues (data overriding).</p>
<p><strong>What can I do?</strong></p>
<ul>
<li>Perform <strong>minimal stress tests</strong> on the DEV environment or even on your own machine using injectors like JMeter or Gatling. When using frameworks like Spring, make sure to understand and correctly apply scoping best practices (for instance, don't use a Singleton with a state; see the sketch after this list).</li>
<li><strong>Simulate concurrency</strong> using breakpoints or sleeps in your code and check the context of each stalled thread.</li>
</ul>
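<p>As a sketch of the stateful-singleton pitfall mentioned in the list above (a hypothetical class): <code>java.text.SimpleDateFormat</code> is not thread-safe, so sharing one instance across requests silently corrupts results under concurrent load.</p>
<pre><code>import java.text.SimpleDateFormat;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Date;

// A typical concurrency bug: the formatter below is mutable state
// shared by all threads using this (singleton) service.
public class ReportService {

    private final SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd"); // NOT thread-safe

    public String formatShared(Date date) {
        return format.format(date); // fails erratically under concurrent calls
    }

    // Fix: use a stateless, thread-safe alternative
    public String formatSafe(Date date) {
        return date.toInstant()
                   .atZone(ZoneId.systemDefault())
                   .format(DateTimeFormatter.ISO_LOCAL_DATE);
    }
}
</code></pre>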
<h3>Volume Is Huge: You Must Add Limits</h3>
<p>In production, <strong>everything is XXXL</strong> (number of log lines written, number of RPC calls, number of database queries...). This has major implications on performance but also on operability. For instance, writing an <code>Entering function x</code>/<code>Leaving function x</code> type of log can help in development but will flood the Ops team with GiBs of logs in production. Likewise, when dealing with monitoring, make sure to make alerts usable. If your application generates tens of alerts every day, nobody will notice them anymore after a few days.</p>
<p>Keep in mind this metaphor: <strong>If your DEV environment is a sandbox, production is the Sahara</strong></p>
<p>In production, real users or external systems will massively stress your application. If (for instance) you don't set a maximum size for attachment files, you will soon get network and storage issues (as well as CPU and memory issues as collateral damage). Many limits can be set at the infrastructure level (like circuit breakers in API Gateways), but most of them have to be coded into the application itself.</p>
<p><strong>What can I do?</strong></p>
<ul>
<li>
<p><strong>Make sure nothing is 'open bar'</strong>: always paginate results from databases (using <code>OFFSET</code> and <code>LIMIT</code> for instance, or <a href="https://use-the-index-luke.com/sql/partial-results/fetch-next-page">using the seek method</a>, as sketched after this list), restrict input data sizes, set timeouts on any remote call, ...</p>
</li>
<li>
<p><strong>Think carefully about logs</strong>. Perform operability acceptance tests with real operators and Site Reliability Engineers (SRE).</p>
</li>
</ul>
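<p>A minimal sketch of the seek method in Java/JDBC (the <code>request</code> table is hypothetical): the last key of the previous page acts as a cursor, so every page costs the same whatever its position, unlike ever-growing <code>OFFSET</code> scans.</p>
<pre><code>import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class RequestDao {

    // Seek pagination: filter on the last seen key instead of using OFFSET
    public List<String> nextPage(Connection connection, long lastSeenId, int pageSize)
            throws SQLException {
        String sql = "SELECT id, label FROM request WHERE id > ? ORDER BY id LIMIT ?";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setLong(1, lastSeenId);
            ps.setInt(2, pageSize);
            try (ResultSet rs = ps.executeQuery()) {
                List<String> labels = new ArrayList<>();
                while (rs.next()) {
                    labels.add(rs.getString("label"));
                }
                return labels;
            }
        }
    }
}
</code></pre>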
<h3>Production Is Distributed and Redundant</h3>
<p>While in DEV most components (like an application server and a database) run inside the same node, they are usually <strong>distributed in production (i.e., some network link exists between them)</strong>. The network is very slow in comparison with local memory (at scale, if a local CPU instruction takes one second, a LAN network call takes a full year).</p>
<p>In DEV, the instantiation factor is 1: every component is instantiated only once. In any production environment having to deal with serious high availability, performance or fail-over requirements, every component is redundant. There are not only servers but <em>clusters</em>.</p>
<p><strong>What can I do?</strong></p>
<ul>
<li>Don't hardcode URLs or make <strong>assumptions about the co-location</strong> of components (I have already seen code where the <code>localhost</code> hostname was hardcoded)</li>
<li>If possible, <strong>reduce the dev/prod gap</strong> by using, from your own workstation, a locally distributed system like a local Kubernetes cluster (see K3S for instance).</li>
<li>Even if this kind of issue should be detected in the integration testing environment, try to <strong>keep in mind that your code will eventually run concurrently</strong> on several threads and even nodes. This has implications on tuning the number of datasource connections, among other considerations.</li>
<li>Always favor <strong>stateless architectures</strong>.</li>
</ul>
<h3>Anything Can Happen in Production</h3>
<p>One of the most common sentences I heard from developers dealing with a production issue is "This is impossible, this can't happen". But it does actually. <strong>Due to the very nature of the production</strong> (high concurrency, unexpected behaviors of users, attacks, hardware failures...), <strong>very strange things can and will happen</strong>.</p>
<p>Even after serious postmortem studies, the <strong>root cause of a significant proportion of production issues will never be diagnosed or solved</strong> (from my own experience, in about 10% of the cases). Some latent defects occur only on a combination of exceptional events. Some bugs can happen once in 10 years or even, by chance (or misfortune?), never occur during the entire application lifetime. Small story: I was recently faced with a bug in a Node.js job that occurred about once every 10K runs (when a randomly generated password contained an unescaped double-dollar character sequence).</p>
<p>Check out any production log and you will probably see erratic errors here and there (this is rather scary, trust me).</p>
<p><strong>Preventing expected issues is a good thing, but truly good code should also control and correctly handle the unexpected</strong></p>
<p>Hardware or network failures are very common: network micro-cuts can occur (see the <a href="https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing">8 Fallacies of Distributed Computing</a>), servers can crash and filesystems can fill up.</p>
<p><strong>Don't trust data coming from other modules</strong>, even yours. For example, an integration error can make a module call a deprecated version of your API. You may also get corrupted files, with wrong encodings or wrong dates for instance. <strong>Don't even trust your own database</strong> (add as many constraints as possible, like <code>NOT NULL</code> or <code>CHECK</code>): corrupted data can appear due to bugs in previous module versions, migrations, administration script issues, stale transactions, integration errors on encoding or timezones... Let any application run for several years and perform some data sanity checks against your own database: you may be surprised.</p>
<p><strong>Users and external batch systems should be treated as <a href="https://en.wikipedia.org/wiki/Infinite_monkey_theorem">monkeys</a></strong> (with all due respect).</p>
<p>Don't rely on human processes but <strong>assume they can do <em>anything</em></strong>. For instance, here are two common <a href="https://www.urbandictionary.com/define.php?term=PEBCAK">PEBCAK</a> problems I observed recently on front-end parts:</p>
<ul>
<li>
<p>Double submit (some users double-clicking instead of single clicking). Some REST RPC calls are hence done twice and concurrency oddities occur in the backend;</p>
</li>
<li>
<p>Private navigation: for some reason, users switch to this mode and strange things happen (like local data being lost or browser extensions being disabled).</p>
</li>
</ul>
<p>Most of the time, users will never admit or figure out this kind of error. They can also use the wrong browser, use a personal machine instead of a professional one, open the webapp twice in several tabs, and do many other things you would never imagine.</p>
<p><strong>What can I do?</strong></p>
<ul>
<li>Make your <strong>code as robust as possible</strong>, write anti-corruption layers and <a href="https://florat.net/proper-strings-normalization-for-comparison-purpose/">normalize strings</a>. When parsing data, check time formats and encodings (if using <a href="https://en.wikipedia.org/wiki/Hexagonal_architecture_(software)">hexagonal architecture</a>, perform these controls as early as possible in the 'in' adapters).</li>
<li>Add <strong>as many constraint checks in your database</strong> as possible. Don't just rely on the domain layer code.</li>
<li>When possible, instead of writing your own controls, <strong>rely on a shared contract</strong> (like a JSON Schema or an XSD).</li>
<li>Think about <strong>retries, robust error handling, double-submission protection, replays</strong> from save points in batch jobs, ...</li>
<li>When writing your tests, think about as many <strong>border-line or apparently impossible cases</strong> as possible.</li>
<li>Use <strong>chaos-engineering</strong> tools (like Simian Army) that generate errors randomly to test your code resiliency.</li>
<li>Think about what to do with <strong>rejected data</strong>.</li>
<li>To <strong>deal with human errors</strong>, identify problematic users, book a meeting, and observe them using your application before asking any direct question, to avoid leading them.</li>
<li>Build a <strong>realistic testing dataset</strong> and maintain it. Add new data as soon as you become aware of a special case you didn't consider before. Manage these datasets like your code (versioning, cleanup, refactoring, documentation...).</li>
<li><strong>Don't ignore weak signals</strong>. When something strange happens in development, it will probably happen in production as well, and it will be far worse there.</li>
<li>When fixing an issue, make sure to <strong>identify all the places where it can occur</strong> and don't only fix it in the place where you localized it.</li>
<li><strong>Add clever logs</strong> in your code (see the sketch after this list). A clever log comes with:
<ul>
<li>A canonical identifier in the message (an error code like <code>ERR123</code> or an event ID like <code>NEW_CLIENT</code>). This greatly eases monitoring by enabling regexp matching;</li>
<li>All required debugging context (like a person UUID, the instant of the log...);</li>
<li>The right verbosity level;</li>
<li>Stack traces when dealing with errors, so developers can easily localize the problem in their code.</li>
</ul>
</li>
</ul>
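<p>As an illustration, here is a minimal sketch of such a clever log using the SLF4J API; the error code, event ID and context fields are hypothetical:</p>
<pre><code>import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ClientService {
    private static final Logger LOGGER = LoggerFactory.getLogger(ClientService.class);

    void register(String personUuid) {
        // canonical event ID plus debugging context, easy to match with a regexp
        LOGGER.info("NEW_CLIENT registration accepted personUuid={}", personUuid);
        try {
            // ... business code ...
        } catch (RuntimeException e) {
            // canonical error code; passing the exception last also logs the stack trace
            LOGGER.error("ERR123 registration failed personUuid={}", personUuid, e);
            throw e;
        }
    }
}
</code></pre>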
<h3>Issues Never Walk Alone</h3>
<p>In production, things never ever get better on their own: <strong>hope is not a strategy</strong>. Due to Murphy's Law, anything with the ability to fail will fail.</p>
<p>Worse: <strong>issues often occur simultaneously</strong>. An initial incident can induce another one, even if they look unrelated at first glance (for instance, an Out Of Memory error can put pressure on the JVM Garbage Collector, which in turn increases CPU usage, which induces queued-work latency and finally generates timeouts on the client side).</p>
<p>Sometimes, it is even worse: truly <strong>unrelated issues may occur simultaneously</strong> by misfortune, making the diagnosis much more difficult by leading the post-mortem down the wrong path.</p>
<p><strong>What can I do?</strong></p>
<ul>
<li><strong>Don't leave issues in logs unresolved, whether in production or in DEV</strong>. Most issues can be detected in development or acceptance environments. Too often, we observe problems and ignore them, thinking they are transient, or due to some integration issue or intermittent network glitch. Such a problem should instead be taken as a chance to reveal a real issue and should not be ignored.</li>
<li>When you observe something strange, stop immediately and take a few minutes to analyze the issue or to add new test cases. <strong>Consider that you may have found an abeyant defect that would take days to diagnose and resolve later in production</strong>.</li>
</ul>
<h3>In Production, Everything Is Complicated and Time-Consuming</h3>
<p>For some good but also <a href="https://en.wikipedia.org/wiki/Cargo_cult_programming">not so good</a> reasons, <strong>every change must be controlled and traced</strong> in a regulated IS. Performing even a single SQL statement requires it to be tested in several testing or pre-production environments and finally applied by a DBA.</p>
<p>Any simple Unix command has to be <strong>documented in a procedure</strong> and executed by the Ops team, which is the only team with access to the servers.
Most of these <strong>operations must be planned, documented in depth, motivated, and traced</strong> into one or several ticketing systems. Changing a simple file or a single row in a database can hardly take less than half a man-day when counting all the involved persons.</p>
<p><strong>The costs increase exponentially as we get closer to production</strong>. See [Capers Jones, 1996] or [Marco M. Morana, 2006]: a bug can cost as little as $25 to fix in DEV and as much as $16K in running production.</p>
<p>Even if modern software engineering promotes CD (Continuous Deployment) and the use of IaC (Infrastructure as Code) tools like Kubernetes, Terraform, or Ansible, <strong>deploying to production is still a significant event in most organizations</strong> and most DevOps concepts are still theoretical there. Deploying a release often can't be done every day but rather about once a week or even once a month. Any release usually has to be validated by the product owner's acceptance tests (a lot of manual and repetitive operations). Any blocking issue requires a hotfix, which comes with a lot of administrative and build work.</p>
<p><strong>What can I do?</strong></p>
<ul>
<li>Perform <strong>as much unit, integration and system testing as possible</strong> before reaching the production environment.</li>
<li>Add <strong>hot-reloading configuration</strong> capacities to your modules, like changing the log verbosity using a simple REST call against an administrative endpoint (see the sketch after this list).</li>
<li>Make sure that all <strong>processes involving operations</strong> (ticketing system, people to contact, ways to alert...) are <strong>documented and quickly accessible</strong>. If they are not, document them to save a lot of time the next time.</li>
</ul>
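<p>A minimal sketch of such a hot-reloading endpoint, using only the JDK's built-in HTTP server and logging; the port, path and logger name are hypothetical:</p>
<pre><code>import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.util.logging.Level;
import java.util.logging.Logger;

public class AdminEndpoint {
    public static void main(String[] args) throws Exception {
        Logger appLogger = Logger.getLogger("com.example.app"); // hypothetical logger name
        HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
        // e.g. curl -X PUT 'http://host:8081/admin/log-level?level=FINE'
        server.createContext("/admin/log-level", exchange -> {
            String query = exchange.getRequestURI().getQuery(); // "level=FINE" (no validation in this sketch)
            appLogger.setLevel(Level.parse(query.substring("level=".length())));
            exchange.sendResponseHeaders(204, -1); // verbosity changed without any restart
            exchange.close();
        });
        server.start();
    }
}
</code></pre>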
<h3>Production Is Very Stressful</h3>
<p>When an incident occurs in production, your stress level may depend on the kind of industry you're working for, but even if you work for a medium-sized e-commerce company and not a nuclear facility or a hospital, I can guarantee that <strong>any problem generates a lot of pressure</strong> coming from customers, management and other teams depending on you. Ops teams are used to it and most are impressively calm when dealing with this kind of event; it's part of their job after all. When the problem comes from your code, you may have to work with them and <strong>take on a part of the pressure yourself</strong>.</p>
<p><strong>What can I do?</strong></p>
<ul>
<li>Make sure to be prepared <strong>before the incident</strong> by writing or learning procedures (read for instance <a href="https://sre.google/sre-book/managing-incidents/">chapter 14</a> of Google's great SRE Book).</li>
<li><strong>Be confident in your logs and monitoring metrics</strong> to help you find the root cause (for instance, prepare insightful dashboards and centralized log queries in advance).</li>
<li>For any complex issue, <strong>begin the investigation by creating a post-mortem document</strong> centralizing every note, stack trace, log or graph illustrating your hypotheses.</li>
</ul>
<h3>In Production, You Don't Have a Single Version to Manage</h3>
<p>In practical scenarios, it's not usually feasible to compel all your internal or external clients to simultaneously upgrade to your latest API or data model. You must handle intricate paths that require supporting several versions concurrently. For example, in the extensive French Tax Information System, the most crucial central API (such as the Person API) provides three versions of each endpoint. Approximately every year, a new version is introduced, the second is deprecated and becomes the third, and the third is decommissioned. All these three versions must coexist with a shared data model.</p>
<p><strong>What can I do?</strong>
Always include a version in your API URLs (for instance, <code>/v1/foo/bar</code>). Incorporate model versions in your data model; for example, in NoSQL, add a <code>modelVersion</code> attribute that will enable your code or ETL tools to determine the version of each piece of data individually. Plan for managing data of varying versions, for instance by writing conditional code based on the data model version, as sketched below.</p>
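<p>A minimal sketch of such version-aware reading code; the <code>Client</code> fields and the version layouts are hypothetical:</p>
<pre><code>import java.util.Map;

record Client(String uuid, String displayName) {}

class ClientMapper {
    /** Maps a raw NoSQL document to the domain object according to its modelVersion. */
    Client fromDocument(Map<String, Object> doc) {
        int modelVersion = (int) doc.getOrDefault("modelVersion", 1);
        return switch (modelVersion) {
            // v1 stored a single 'name' field (hypothetical layout)
            case 1 -> new Client((String) doc.get("uuid"), (String) doc.get("name"));
            // v2 split it into firstName/lastName (hypothetical layout)
            case 2 -> new Client((String) doc.get("uuid"),
                    doc.get("firstName") + " " + doc.get("lastName"));
            default -> throw new IllegalStateException("Unsupported model version " + modelVersion);
        };
    }
}
</code></pre>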
<h3>In Production, You Usually Don't Start from Scratch</h3>
<p>In development, when your database structure (DDL) evolves, you simply drop and recreate it. In production, in most cases, <strong>data is already there and you have to perform migrations or adaptations</strong> (using ETL or other tools).
Likewise, if some clients already use your API, you can't simply change its signature out of the blue: you have to think about backward compatibility. If you have to break it, you can deprecate some code, but then you have to plan the end of service.</p>
<p><strong>What can I do?</strong></p>
<ul>
<li>In development, don't just drop and recreate the DDL but 'code' the changes using incremental change tools like Liquibase. The same tools should be used in production.</li>
<li>Check that your libraries or APIs are still backward compatible using integration tests (see the sketch after this list).</li>
<li>Use <a href="https://semver.org/">Semantic Versioning</a> conventions to alert for breaking changes.</li>
</ul>
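<p>A minimal sketch of such a backward-compatibility test with JUnit 5, assuming a hypothetical <code>PriceFormatter</code> whose old one-argument signature must keep behaving as before:</p>
<pre><code>import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class PriceFormatter {
    String format(long cents, String currency) { // new, extended signature
        return String.format("%d.%02d %s", cents / 100, cents % 100, currency);
    }

    /** @deprecated kept for v1 clients; must behave exactly as before. */
    @Deprecated
    String format(long cents) {
        return format(cents, "EUR");
    }
}

class BackwardCompatibilityTest {
    @Test
    void deprecatedSignatureStillBehavesAsInV1() {
        assertEquals("10.00 EUR", new PriceFormatter().format(1000));
    }
}
</code></pre>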
<h3>Security Is More Prominent in Production</h3>
<p>In any seriously protected production environment, <strong>many security systems are set up</strong>. They are often absent from the other environments due to their added complexity and cost. For instance, you can find additional layer 3 and 4 firewalls, WAFs (Web Application Firewalls operating at layer 7 against HTTP(S) calls), API Gateways, IAM systems (SAML, OIDC...), HTTP(S) proxies or reverse proxies. Internet calls are usually forbidden from servers, which can only use replicated and cached data (like local package repositories). Hence, many security infrastructure differences can mask issues that will only be discovered in pre-production or even in production.</p>
<p><strong>What can I do?</strong></p>
<ul>
<li><strong>Don't use the same values for different credential parameters</strong>. Identical values can hide integration issues that will only show up in production, where parameters are more likely to differ and where a different password is used for each resource.</li>
<li>Make sure to <strong>understand the security infrastructure</strong> limitations before coding related user stories.</li>
<li>Test security infrastructure <strong>using containers</strong>.</li>
</ul>
<h3>Conclusion</h3>
<p>It's a good thing for developers to <strong>be curious and get information about production by themselves</strong>, by reading blogs and books or simply by asking colleagues. As a developer, do you know how many cores a mid-range server has (per socket)? How much RAM per blade? Did you ever ask yourself where the data centers running your code are located? How much power your modules consume in kWh every day? How data is stored in a SAN? Are you familiar with fail-over systems like load balancers, RAID, standby databases, virtual infrastructure management, SAN replication...? You don't have to be an expert but <strong>it's important and gratifying to know the basics</strong>.</p>
<p>I hope I have given developers a first glimpse of production constraints. <strong>Production is a world where everything is multiplied</strong>: the gravity of issues, the costs, the time to fix systems. Always keep in mind that <strong>your code will eventually run in production and working code is far from enough</strong>: your code must be production-proof to make the organization's IS run smoothly. Then, <strong>everything will be fine and everybody will be home early</strong> instead of pulling their hair out until late at night...</p>
Release of the first version of our Project architecture document template2021-04-16T00:00:00+00:00https://florat.net/release-of-the-first-version-of-our-project-architecture-document-template/<p>Four years after the first release of a template in French, we release a revisited English version.</p>
<p>This architecture template is applicable to most management IT projects, regardless of the general architecture chosen (monolithic, SOA, micro-service, n-tier, …). It has already been used on several important projects, including in large organizations. It is maintained on a regular basis.</p>
<p>Discover it at <a href="https://github.com/bflorat/architecture-document-template">GitHub</a>.</p>
Proper strings normalization for comparison purpose2020-12-22T00:00:00+00:00https://florat.net/proper-strings-normalization-for-comparison-purpose/<p>(This article has also been <a href="https://dzone.com/articles/proper-strings-normalization-for-comparison-purpos">published</a> at DZone)</p>
<p><img src="https://florat.net/assets/images/blog-tech/normalization.png" alt="Illuminated initials, sixteenth-century"></p>
<h3>TL;DR</h3>
<p>In Java, do:</p>
<pre><code>import java.text.Normalizer;

String normalizedString = Normalizer.normalize(originalString, Normalizer.Form.NFKD)
.replaceAll("[^\\p{ASCII}]", "").toLowerCase().replaceAll("\\s{2,}", " ").trim();
</code></pre>
<p>Nowadays, most strings are Unicode-encoded and we are able to work with many different native characters with diacritical signs/accents (like <code>ö</code>, <code>é</code>, <code>À</code>) or ligatures (like <code>æ</code> or <code>ʥ</code>). Characters can be stored in UTF-8 (for instance) and the associated glyphs can be displayed properly if the font supports them. This is good news for respecting cultural specificities.</p>
<p>However, we often observe recurring difficulties when comparing strings issued from different information systems and/or initially typed by humans.</p>
<p>The human brain is a <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1593217/">machine to fill gaps</a>. Hence it has absolutely no problem reading or typing <code>'e'</code> instead of <code>'ê'</code>.</p>
<p>But what if the word <code>'tête'</code> (<code>'head'</code> in French) is correctly stored in a UTF-8 encoded database but you have to compare it with end-user-typed text missing the accents?</p>
<p>We also often have to deal with legacy systems, or modern ones filled with legacy data, that don't support the Unicode standard.</p>
<p>Another simple illustration of this problem is the use of ligatures. Imagine a product database storing various items with an ID and a description. Some items contain <a href="https://en.wikipedia.org/wiki/Orthographic_ligature">ligatures</a> (a combination of several letters joined together to create a single character, like <code>’Œuf’</code> - egg in French). Like most French people, I have no idea how to produce such a character, even using a French keyboard. I would spontaneously search the item descriptions using <code>oeuf</code>. Obviously, our code has to take care of ligatures if we want to return a useful result containing <code>’Œuf’</code>.</p>
<p>How to fix that mess?</p>
<h2>Rule #1: Don't even compare human text if you can</h2>
<p>When you can, never compare strings from heterogeneous systems. It is surprisingly tricky to do properly (even if it is possible to handle most cases, as we will see below). Instead, compare sequences, <a href="https://en.wikipedia.org/wiki/Universally_unique_identifier">UUIDs</a> or any other ASCII-based strings without spaces or 'special' characters. Strings coming from different information systems are likely to store data differently (lower/upper case, with/without diacritics, etc.). On the contrary, good IDs are free from encoding issues, being plain ASCII strings.</p>
<p>Example:</p>
<p>System 1: <code>{"id":"8b286f72-b366-47a4-9537-59d39411979a","desc":"Œuf brouillé"}</code></p>
<p>System 2: <code>{"id":"8b286f72-b366-47a4-9537-59d39411979a","desc":"OEUF BROUILLE"}</code></p>
<p>If you compare IDs, everything is simple and you can go home early. If you compare descriptions, you'll have to normalize them as a prerequisite or you'll be in big trouble.</p>
<p><strong>Character normalization is the action of computing a canonical form of a string. The basic idea to avoid spurious mismatches when comparing strings coming from several information systems is to normalize both strings and to compare the results of their normalization.</strong></p>
<p>In the previous example, we would compare <code>normalize("Œuf brouillé")</code> with <code>normalize("OEUF BROUILLE")</code>. Using a proper normalization function, we would then compare <code>'oeuf brouille'</code> with <code>'oeuf brouille'</code>, but if the normalization function is buggy or partial, the strings would mismatch. For example, if the <code>normalize()</code> function doesn't handle ligatures properly, you would get a spurious mismatch by comparing <code>'œuf brouille'</code> with <code>'oeuf brouille'</code>.</p>
<h2>Rule #2: Normalize in memory</h2>
<p>It is better to compare strings at the last possible moment, in memory, rather than normalizing them at storage time, for at least two reasons:</p>
<ol>
<li>
<p>If you only store a normalized version of your string, you lose information. You may need proper diacritics later, for display purposes or other reasons. As an IT professional, one of your tasks is to never lose information humans provided to you.</p>
</li>
<li>
<p>What if some items were stored before the normalization routine was set up? What if the normalization function changed over time?</p>
</li>
</ol>
<p>To avoid these common pitfalls, simply compare in memory <code>normalize(<data system 1>)</code> with <code>normalize(<data system 2>)</code>. The CPU overhead should be negligible unless you compare thousands of items per second...</p>
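<p>Putting the rules of this article together, a minimal in-memory comparison helper (based on the TL;DR snippet above) could look like this:</p>
<pre><code>import java.text.Normalizer;

final class NormalizedComparison {
    /** Canonical form: NFKD decomposition, diacritics dropped, lower case, collapsed spaces. */
    static String normalize(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD)
                .replaceAll("[^\\p{ASCII}]", "")
                .toLowerCase()
                .replaceAll("\\s{2,}", " ")
                .trim();
    }

    /** Compare at the last possible moment, in memory, never at storage time. */
    static boolean sameText(String a, String b) {
        return normalize(a).equals(normalize(b));
    }
}
</code></pre>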
<h2>Rule #3: Always trim externally and internally</h2>
<p>Another common trap when dealing with strings typed by humans is the presence of spaces at the beginning, at the end or in the middle of a sequence of characters.</p>
<p>As an example, look at these strings: <code>' Wiliam'</code> (note the space at the beginning), <code>'Henry '</code> (note the space at the end), <code>'Gates  III'</code> (see the double space in the middle of this family name, did you notice it at first?).</p>
<p>Appropriate solution:</p>
<ol>
<li>Trim the text to remove spaces at the beginning and at the end of the text.</li>
<li>Remove extra spaces in the middle of the string.</li>
</ol>
<p>In Java, one way to achieve this is:</p>
<pre><code>s = s.replaceAll("\\s{2,}", " ").trim();
</code></pre>
<h2>Rule #4: Harmonize letters casing</h2>
<p>This is the most well-known and straightforward normalization method: simply convert every letter to lower or upper case. AFAIK, there is no reason to prefer one over the other. Most developers (me included) use lower case.</p>
<p>In Java, just use <code>toLowerCase()</code>:</p>
<pre><code>s = s.toLowerCase();
</code></pre>
<h2>Rule #5: Transform characters with diacritical signs to ASCII</h2>
<p>When typed, diacritical signs are often omitted in favor of their ASCII version. For example, one can type the German word <code>'schon'</code> instead of <code>'schön'</code>.</p>
<p>Unicode proposes four <a href="http://www.unicode.org/reports/tr15/#Canon_Compat_Equivalence">Normalization forms</a> that may be used for that purpose (NFC, NFD, NFKD and NFKC). Check-out <a href="https://www.unicode.org/reports/tr15/images/UAX15-NormFig6.jpg">this enlightening illustration</a>.</p>
<p>Detailing all these forms would go beyond the scope of this article, but basically, keep in mind that some Unicode characters can be encoded either as a single combined character or in a decomposed form. For instance, <code>'é'</code> can be encoded as the <code>\u00e9</code> code point or as the decomposed form <code>'\u0065'</code> (the <code>'e'</code> letter) followed by <code>'\u0301'</code> (the combining acute accent diacritic).</p>
<p>We will apply the NFD ("Canonical Decomposition") normalization form to the initial text to make sure that every character with an accent is converted to its decomposed form. Then, all we have to do is drop the diacritics and only keep the 'base' simple characters.</p>
<p>In Java, both operations can be done this way:</p>
<pre><code>s = Normalizer.normalize(s, Normalizer.Form.NFD)
.replaceAll("[^\\p{ASCII}]", "");
</code></pre>
<p>Note: even if this code covers the current issue, prefer the <code>NFKD</code> transformation to deal with ligatures as well (see below).</p>
<h2>Rule #6: Decompose ligatures to a set of ASCII characters</h2>
<p>The other thing to understand is that Unicode maintains a compatibility mapping between about 5000 'composite' characters (like ligatures or precomposed roman numerals) and lists of regular characters. Characters supporting this feature are documented (check the '<a href="https://www.compart.com/en/unicode/U+0133">decomposition</a>' attribute in the Unicode character documentation).</p>
<p>For instance, the roman numeral Ⅻ (U+216B) can be decomposed with NFKD normalization into an <code>'X'</code> and two <code>'I'</code>. Likewise, the <code>ij</code> (U+0133) character (like in <code>'fijn'</code> - 'nice' in Dutch) can be decomposed into an '<code>i</code>' and a '<code>j</code>'.</p>
<p>For these kinds of 'Siamese twins' characters, we have to apply the NFKD ("Compatibility Decomposition") normalization form, which both decomposes the characters (see 'Rule #5' above) and maps ligatures to several 'base' characters. You can then drop the remaining diacritics.</p>
<p>In Java, use:</p>
<pre><code>s = Normalizer.normalize(s, Normalizer.Form.NFKD)
.replaceAll("[^\\p{ASCII}]", "");
</code></pre>
<p>Now the bad news: for obscure reasons, Unicode doesn't provide a decomposition equivalence for some widely used ligatures, like the French '<code>œ</code>' and '<code>æ</code>' or the German eszett '<code>ß</code>'. If you need to handle them, you will have to write your own replacements <strong>before</strong> applying the NFKD normalization:</p>
<pre><code> s = s.replaceAll("œ", "oe");
s = s.replaceAll("æ", "ae");
s = Normalizer.normalize(s, Normalizer.Form.NFKD)
.replaceAll("[^\\p{ASCII}]", "");
</code></pre>
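<p>Note that the uppercase forms of these ligatures need the same treatment, otherwise the non-ASCII filter would silently drop them. A quick check, with the expected result as a comment:</p>
<pre><code>String s = "Œuf brouillé";
// also map the uppercase ligatures, otherwise 'Œ' would be stripped entirely
s = s.replaceAll("œ", "oe").replaceAll("Œ", "OE")
     .replaceAll("æ", "ae").replaceAll("Æ", "AE");
s = Normalizer.normalize(s, Normalizer.Form.NFKD)
     .replaceAll("[^\\p{ASCII}]", "")
     .toLowerCase();
// s is now "oeuf brouille"
</code></pre>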
<h2>Rule #7: Beware punctuation</h2>
<p>This is a more minor issue, but depending on your context you may want to normalize some special punctuation characters as well.</p>
<p>For example, in a literary context like a text-revision software, it may be a good idea to map the em/long dash (<code>'—'</code>) character to the regular ASCII hyphen (<code>'-'</code>).</p>
<p>AFAIK, Unicode doesn't provide a mapping for that, so just do it yourself the good old way:</p>
<pre><code>s = s.replaceAll("—", "-");
</code></pre>
<h2>Final word</h2>
<p>String normalization is very helpful to compare strings issued from different systems or to perform appropriate comparisons. Even fully English-localized projects can benefit from it, for instance to take care of casing or trailing spaces, or when dealing with foreign words with accents.</p>
<p>This article exposes some of the most important points to take into consideration, but it is far from exhaustive. For instance, we omitted Asian character manipulation and the cultural normalization of semantically equivalent items (like <code>'St'</code>, the abbreviation of <code>'Saint'</code>), but I hope it is a good start for most projects.</p>
<h2>References</h2>
<p>http://www.unicode.org/reports/tr15/</p>
<p>https://docs.oracle.com/javase/tutorial/i18n/text/normalizerapi.html</p>
<p>https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization</p>
<p>https://minaret.info/test/normalize.msp</p>
Why did I rewrite my blog using Eleventy ?2020-11-05T00:00:00+00:00https://florat.net/why-did-i-rewrite-my-blog-using-eleventy/<h2>Reasons to change</h2>
<p>This personal home page and blog were previously self-hosted using a great Open Source Wiki engine: Dokuwiki. It worked great for many years but a few months ago, I felt that it was time to change lanes and embrace the <a href="https://jamstack.org/">JAM Stack</a> (JavaScript / APIs / Markup).</p>
<h3>Issues with traditional wikis</h3>
<ul>
<li>Security: a lot of spam in comments, possible PHP vulnerabilities</li>
<li>Regular upgrades to be performed against the engine</li>
<li>Many plugins required to make something useful. Old ones, conflicting ones...</li>
<li>Not so easy to customize the rendered pages</li>
<li>Slower than a static website</li>
<li>Much larger electricity consumption to serve pages</li>
<li>Requires PHP modules to be installed and tuned along with the HTTP server</li>
<li>Most wiki engines require a database (even if it is not the case of Dokuwiki)</li>
<li>Not so easy reversibility. One way is to use Pandoc to translate wiki syntax to markdown.</li>
</ul>
<h3>Opportunities with the JAM Stack</h3>
<ul>
<li>Ability to write articles using Markdown, a much more widespread markup language than any of the numerous Wiki syntaxes around</li>
<li>No vulnerabilities possible (except from the Web server itself) as the produced website is only static HTML</li>
<li>Using Git (advanced version control) and associated ecosystem (Merge Requests...)</li>
<li>Possibility to use CI/CD tools to deploy new pages</li>
<li>Can be deployed on CDN (even if I continue to self-host it)</li>
<li>Possibility to use great IDE to write articles (like VSCode and all its extensions)</li>
<li>Faster preview of rendered pages: I can now see the result in my browser in less than a second</li>
<li>Containers-friendly (using a nginx docker image typically)</li>
<li>It's the new trend ! (OK, it's a kind of <a href="http://radar.oreilly.com/2014/10/resume-driven-development.html">RDD</a> but it may be useful in current professional context)</li>
</ul>
<h3>The not-so-good using the JAM Stack</h3>
<ul>
<li>You have to rely on <a href="https://myclientwants.com/#ugc">external services</a> to perform some basic features like adding comments (already disabled in my case, too many spam messages) or full-text searches</li>
</ul>
<h2>Eleventy</h2>
<p>Well, I finally decided to switch to the JAM Stack. But the field is <a href="https://jamstack.org/generators/">very crowded</a>.
I already use <a href="https://jamstack.org/generators/antora/">Antora</a> at work to generate great technical documentation using Asciidoc, but it was not suitable for a blog. I also used <a href="https://jamstack.org/generators/jekyll/">Jekyll</a> for a long time with GitHub Pages (see the <a href="http://jajuk.info/">Jajuk website</a>) but I found it complicated, aging and too restrictive.</p>
<p>After a quick look at the most popular platform (<a href="https://jamstack.org/generators/hugo/">Hugo</a>), I gave up. Basically, I felt that I had to learn a whole world before being able to make a website, and I didn't have that time.</p>
<p>Then, I heard about a new, simple platform: <a href="https://www.11ty.dev/">Eleventy</a>. I loved the Unix-like idea behind it: a very low-level tool leveraging existing templating engines like Liquid or Nunjucks and allowing HTML and markdown content to be mixed. It also leverages a convention-over-configuration principle enabling results in no time.</p>
<p>Last but not least: it is very fast (nearly as fast as Hugo). It is a JavaScript tool, great for most frontend developers, who can use npm, sass... Look at <a href="https://jamstack.org/generators/hugo/">this page</a> if you want to see sample code using Eleventy.</p>
<p>I finally rewrote my website in raw CSS, HTML, Markdown and Liquid templates thanks to Eleventy. It only took me a single day to grasp the basic Eleventy concepts and port the existing website. I finally have full control over my pages.</p>
<p>Note that another <a href="https://www.youtube.com/watch?v=h6ZxRudaYIQ">common strategy</a> is to use an existing theme (like a Bootstrap-based theme) and to make the HTML generic using templates. I gave up on this method because I wanted something simple, very light, and something I fully control and understand...</p>
How to write good ADRs (architecture decision records)?2020-05-06T00:00:00+00:00https://florat.net/comment-faire-de-bons-adr/<h2>Architecture decision log</h2>
<p>An architecture decision log is used to record the important
architecture decisions (the ADRs, <em>Architecture Decision Records</em>).</p>
<p>The goal is to make the choices known and understandable <em>a
posteriori</em> and to share the decisions. The architecture document,
for its part, does not restate these choices but only shows the
final decision.</p>
<p>There is only one ADR log per project.</p>
<h3>Format of an ADR</h3>
<p>Each ADR consists of a single file in asciidoc format with this
name: <code>[XYZ sequence starting at 001]-[decision].adoc</code>.</p>
<p>Format of the decision part: lowercase, without spaces, with hyphens as
separators. Example: <code>007-API-devant-bases-existantes-perennes.adoc</code>.</p>
<p>Each ADR ideally contains the following content (adaptable as
needed):</p>
<h4>1) History</h4>
<ul>
<li>Give the status and the history of state changes</li>
<li>The possible statuses are: <code>TODO</code> (to be written), <code>WIP</code> (Work In
Progress), <code>PROPOSE</code> (proposed), <code>REJETE</code> (rejected), <code>VALIDE</code> (validated), <code>DEPRECIE</code> (deprecated), <code>REMPLACE</code> (superseded).</li>
<li>If the status is <code>VALIDE</code>, detail the date and the decision-makers
who validated it.</li>
<li>If the status is <code>REMPLACE</code>, give the reference of the ADR to
consider instead.</li>
<li>Never delete an ADR (set it to the <code>DEPRECIE</code> status) and do not
reuse the ID of another ADR of the same module.</li>
<li>Mention the ADR superseding it, if any. Example:
<code>Superseded by ADR 002-...</code></li>
</ul>
<h4>2) Context</h4>
<p>Presents the possible options, the issues, and the forces at play
(technical, organizational, regulatory, financial, human
...). Give the strengths, weaknesses, opportunities and threats of each
solution (see the <a href="https://fr.wikipedia.org/wiki/SWOT_(m%C3%A9thode_d%27analyse)">SWOT
method</a>).</p>
<p>Note:</p>
<ul>
<li>If a point is a deal-breaker, say so.</li>
<li>Number the solutions so they can be referenced unambiguously</li>
<li>For the simplest cases, two advantages/drawbacks
paragraphs per solution may be enough.</li>
<li>In some cases, the ADR may contain a single solution, the
goal then being to document the reasons behind this architecture.</li>
</ul>
<h4>3) Decision</h4>
<p>Give the selected decision (be assertive and recall the number of the
selected solution). Example:
<code>We will sign the PDFs on the fly (solution 1)</code>.</p>
<h4>4) Consequences</h4>
<p>Give the possible consequences of the decision in terms of
implementation. Do not restate the strengths and weaknesses of the solutions but rather
the practical consequences of the decision. Give the actions that
mitigate the risks induced by the solution.</p>
<p>Examples:</p>
<p>* <code>Specific logs will have to be planned for this processing</code></p>
<p>*
<code>The unavailability risk will be covered by reinforced on-call duty</code></p>
<h3>Format of the log</h3>
<p>Ideally, an ADR log provides a visual rendering of all the ADRs
with their respective status and history, so as to give
a global view of the state of each decision. Statuses and
histories must never be duplicated, since duplication implies a
double maintenance that has very little chance of being done correctly.
In most cases, it is better to keep this information
only in each ADR, even if it means opening them one by one. An
alternative is to sort the ADRs into subdirectories by status,
but this makes browsing the ADRs more difficult.</p>
<p>If you use Asciidoc (which I highly recommend), a trick
exists: tag inclusion. The idea is to keep the status and
the history in each ADR but to include them in a table that
forms the log. Example:</p>
<p>In <code>001-dedoublonnage-requetes.adoc</code>:</p>
<pre><code>## Status
// tag::statut[]
`VALIDE`
// end::statut[]
## History
// tag::historique[]
Validated on Nov 26, 2019 with xyz
// end::historique[]
</code></pre>
<p>and in the log (<code>README.adoc</code>):</p>
<pre><code>.Table List and statuses of the RECE ADRs
[cols="2,1a,4a"]
|===
|ADR |Status |History
|link:001-dedoublonnage-requetes.adoc[001-dedoublonnage-requetes]
|include::001-dedoublonnage-requetes.adoc[tags=statut]
|include::001-dedoublonnage-requetes.adoc[tags=historique]
|link:002-appels-synchrones.adoc[002-appels-synchrones]
|include::002-appels-synchrones.adoc[tags=statut]
|include::002-appels-synchrones.adoc[tags=historique]
...
|===
</code></pre>
<h3>Complete example of an ADR</h3>
<pre><code> ## History
 Status: `VALIDE`
 * Validated by xyz on January 28
 * Proposed by z on 2020-01-02
 ## Context
 <General presentation of the problem>
 # Solution 1: <solution description>
 ## Strengths
 - Limits network usage
 ## Weaknesses
 - Less robust
 ## Opportunities
 ## Threats
 - [deal-breaker] Requires the signature to be performed synchronously or on the fly
 # Solution 2: <solution description>
 ## Strengths
 ## Weaknesses
 ## Opportunities
 ## Threats
 ## Decision
 Solution 2 is selected
 ## Consequences
 - Check the JVM configuration to use a random number generator
</code></pre>
<h3>Usage tips</h3>
<ul>
<li>Don't hesitate to add images/diagrams... Think of Mermaid and
Plantuml.</li>
<li>Don't timestamp the changes of the ADR itself; this is the role
of the version control tool (Git). Use explicit commit
messages.</li>
<li>A good ADR must be:
<ul>
<li>short;</li>
<li>clear;</li>
<li>relevant (explains the context, the possible choices and the
selected decision well);</li>
<li>accessible to everyone (Wiki, GitHub..., no office
documents);</li>
<li>traced (changelog, Git commits, ...);</li>
<li>transparent: if decision elements are missing, mention
them.</li>
</ul>
</li>
</ul>
<h2>Other resources</h2>
<p>Links: <a href="https://github.com/joelparkerhenderson/architecture_decision_record">list of common ADR templates</a></p>
V3 of the architecture document template2019-09-01T00:00:00+00:00https://florat.net/v3-modele-de-dossier-d'architecture/<p>See <a href="https://github.com/bflorat/modele-da">https://github.com/bflorat/modele-da</a></p>
<p>The template has been extended, simplified and fixed. Above all, it
takes the path of living documentation, having been rewritten in asciidoc
(it will thus now be possible to propose merge requests, for example).
The diagrams are still in Plantuml but most of them have been redone
as C4 diagrams.</p>
<p>Feedback and PRs appreciated</p>
Summary of Cal Newport's "Deep Work" book2018-05-31T00:00:00+00:00https://florat.net/summary-of-cal-newport's-%22deep-work%22-book/<p>I just finished <a href="https://www.amazon.com/Deep-Work-Focused-Success-Distracted/dp/1455586692">"Deep
work"</a>,
an interesting book. I only regret that it doesn't contain any reference
to the <a href="http://pomodorotechnique.com/">pomodoro technique</a>.</p>
<p>Here are a few of my raw notes:</p>
<pre><code>Deep work : “professional activities performed in a state of distraction-free concentration that push cognitive capabilities to their limit”. For high skills, difficult to replicate.
Shallow work : “non-cognitively demanding, logistic-style tasks, often performed while distracted.” Low value, easily replicable
Deep work hypothesis : the ability to perform a deep work is rare and valuable. Those who are capable will thrive.
The core abilities :
- quickly master hard things
- produce elite level with speed
Both depend on deep work
Myelin : by always triggering the same paths, better signal -> more focus = more intelligence
High quality work = time x intensity of focus
Metric black hole : we don't actually measure value of tasks we perform
Principle of least resistance : given that we don't actually measure the value of our work, we do first what is easier : shallow work.
Busyness as a proxy for productivity : in knowledge work, difficult to estimate our own value : a lot of shallow work creates a false feeling of produced value
Cult of the Internet : everything from the Internet (like facebook) is considered a priori as good in IT : huge error.
Neuroscience : what you are is the sum of what you focus on. Happier when we focus on flow activities. We need goals, challenges, feedback.
We all have a limited amount of will-power so we need to save it for deep work.
Profiles of deep workers:
- bimodal : monastic-like activities for few days, shallow work during the rest of the time
- rhythmic philosophy : moment reserved every day, use a chain method like a cross on the calendar : we want to avoid any hole in the chain.
- journalist philosophy : switches between shallow work and deep work all the day long (hard)
Ideas to help deep work:
- grand gesture : leave habits, work in an hotel for ie
- help serendipity by meeting people from others disciplines
- stop working in the evening to let the unconscious mind solve problems for you (less work = more CPU to solve problems in your mind background)
- also rest because we all have a limited amount of available attention
- perform a shutdown ritual every end of day (like saying 'work performed') -> brain conditioned to stop running thoughts. Otherwise, Zeigarnik effect (we remember interrupted tasks better because we want to solve them)
- search boredom to help the brain to rewire
- schedule the day by blocks, change blocks during the day if required
Deep work meditation to solve complex problems:
- Store variables of current state of the problem
- ask question to force the brain to go to the next problem and no looping
- fight distracted thoughts
Memorization technique (see the book for more details) : imagine large objects in 5 rooms of our house, map the objects with a set of celebrities and imagine scenes. Each person maps a value (like a number of a card value)
Avoid any-benefits tools like facebook, concentrate on craftsman approach : only consider tools that help significantly to reach the lead goals
To determinate if a tool that help :
- list the key activities you need to realize to reach the lead goals
- for each activities, ask yourself if the tool helps or not
4DX (Four disciplines of eXecution) :
- focus on widely import goals (measurable few goals)
- focus on lead goals, not long term goals
- use scoreboards
- perform periodic summaries
Law of the vital fews (Pareto principle) : 80% of a given effect is done by 20% of the possible causes
During leisure, avoid using Internet, do high-level activities like reading literature
Evaluate shallow work performed by week and confront it to your boss and ask him to validate.
To determine if a work is shallow : how many months would it take to teach an hypothetical post graduate to make it ?
Say "no" by default, provide vague explanation to avoid questions.
Process centric e-mails to close the loop and free the mind : state clearly the next steps on every subject (every action)
Avoid replying to e-mails on subject without interest, coming with too much work to reply etc..
</code></pre>
Benefits of Hardware-based Full Disk Encryption and sedutil2018-05-31T00:00:00+00:00https://florat.net/benefits-of-hardware-based-full-disk-encryption-and-sedutil/<p>We need to protect our personal or professional data, especially when located on
laptops that can easily be stolen. Even if it is not yet fully
widespread, many companies or personal users encrypt their disks to
prevent such issues.</p>
<p>There are three major technologies to encrypt the data (most of the time,
the same symmetric cipher is used: AES 128 or 256 bits):</p>
<ul>
<li>File-level encryption tools (7zip, GnuPG, openSSL...) where we
encrypt one or more files (but not a full file system)</li>
<li>Software FDE = Full Disk Encryption (dm-crypt, encfs, TrueCrypt
under Linux; BitLocker, SafeGuard under MS Windows, among many
others) where a full file system is encrypted. Most of these
tools map a real encrypted file system to an in-memory clear
filesystem. For instance, you open an encrypted /dev/sda2 filesystem
with dm-crypt/Luks this way:</li>
</ul>
<pre><code> sudo cryptsetup luksOpen /dev/sda2 aClearFileSystemName
<enter password>
mount /dev/mapper/aClearFileSystemName /mnt/myMountPoint
</code></pre>
<ul>
<li>Hardware-based Full Disk Encryption (also named SED =
Self-Encrypting Disk) where hard disks encrypt themselves in their
own built-in disk controller. We'll focus here on this technology.</li>
</ul>
<p>To make it work, you need:</p>
<ul>
<li>a SED-capable hard disk or SSD (I for one own a Samsung 840 PRO and
a 850 EVO that support it; most professional disks do).</li>
<li>a compatible BIOS that supports SED. You can then set a disk-level
user password in the BIOS (and optionally an administrator password
to unlock the user password). When the computer boots, the BIOS asks
interactively for a disk password [1]. Note that many BIOSes
(especially on desktops or on non-professional laptops) don't
support this feature because the manufacturer has not enabled it
(maybe to avoid customer complaints about password loss?).</li>
</ul>
<p>Once the correct BIOS disk password is entered, the disk becomes totally
'open' (we say 'unlocked'), exactly as if it had never been
encrypted. No software is involved afterward. It is important to
understand that a SED always encrypts the data. There is no way to
disable this behavior (however, it doesn't cause any significant effect
on the IO performance because the IO volume is unchanged and
because the disk controller comes with a built-in AES chipset). The real
encryption key (MEK = Media Encryption Key) is located inside the disk
itself (but cannot be accessed). The user password (named KEK = Key
Encryption Key) is used to encrypt / decrypt the MEK. Keeping the disk
password unset is like keeping a safe open: the data is still encrypted
but decrypted when accessing the disk, exactly as if no security system
ever existed. When you set the user password, you close the safe door
using your key. Note that there is no (known) way to recover a disk if
you lose your password: you not only lose your data but you also
lose your disk: it becomes a piece of junk from which no data can be
read or written.</p>
<p>I used dm-crypt (the default FDE software under Linux) on my own laptop
until I bought a SED-enabled Samsung SSD, but I never managed to use
its SED feature on my own computer because my AMI BIOS doesn't support
it. The only option then was to use software file system
encryption. This works but comes with several complications or drawbacks:</p>
<ul>
<li>you need a /boot partition in clear to bootstrap the process. An
attacker can easily alter this partition and add keyloggers, for
instance;</li>
<li>you have to change some kernel options and make sure to set the
right module loading order at startup or resume (and keep them when
updating the kernel);</li>
<li>the TRIM SSD feature [2] is now supported by dm-crypt but it comes
with <a href="http://asalor.blogspot.nl/2011/08/trim-dm-crypt-problems.html">security
concerns</a>
;</li>
<li>you need dm-crypt commands on liveCD distros when performing system
backups.</li>
</ul>
<p>The only benefit of using software FDE I can think of is the possibility
to review the cipher source code (when using an open source solution like
dm-crypt, of course). This is not the case with hardware encryption, even if
no severe issue has been reported so far AFAIK.</p>
<p>SED hardware-based disks are much simpler to use in comparison:</p>
<ul>
<li>you only have to set a BIOS password and you're done!</li>
<li>you save a <a href="http://www.anandtech.com/show/2901/5">significant amount of CPU
usage</a> ;</li>
<li>it is possible to definitively destroy a drive by changing its
password once and for all, when decommissioning a laptop for instance
(but this is also a drawback when the password is lost
unintentionally).</li>
</ul>
<p>But:</p>
<ul>
<li>once unlocked, the disk remains in this state while the computer is
powered (this includes while suspended to RAM). The login window doesn't
change anything: an attacker can read the drive by plugging it
directly into the SATA port (DMA attack) and, even worse, <strong>a warm
reboot (a restart) keeps the drive open!</strong> It means that one
can access the unlocked disk simply by inserting a Live CD/USB and
rebooting the computer. The Live CD/USB is booted and all the drive
data is <a href="https://www1.cs.fau.de/sed">available when mounted</a>!
This is why, when using SED, <strong>you should always hibernate</strong>
(suspend to disk) instead of suspending to RAM: when hibernating,
the drive actually loses power and is locked again. Of course,
you'll get the same effect when turning off your computer.</li>
<li>you need a SED-capable BIOS. Note that you can also use the hdparm
command to unlock a SED drive, but it requires booting a Live CD/USB,
launching something like the command below and then restarting your
computer. This is not really practical;</li>
</ul>
<pre><code> sudo hdparm --user-master u --security-set-pass 'pass' /dev/sdb
</code></pre>
<ul>
<li>if you lose the disk password, the disk is simply dead (but this may
be a benefit, as stated before);</li>
<li>you may depend on a specific BIOS manufacturer because it trims or
hashes the disk password (KEK). Another BIOS may use another
algorithm. It means that moving a drive from one computer to another
may leave you unable to unlock the drive, even with the same
password.</li>
<li>because the operating system and its settings are not yet loaded,
only the QWERTY keyboard layout is available; you have to keep this
in mind when choosing and typing the password;</li>
<li>you have to trust the hardware security chipsets.</li>
</ul>
<p>The <a href="https://en.wikipedia.org/wiki/Opal_Storage_Specification">OPAL
specification</a>
published by the Trusted Computing Group (AMD, IBM, Intel, HP...) fixes
some of these issues :</p>
<ul>
<li>you can always save the disk when losing the disk password (of
course, the data is still lost, fortunately for security) thanks to the PSID Revert
function (the PSID is a number printed on the disk proving that you
can physically access the drive);</li>
<li>the KEK hashing and trimming is now standard: the same drive can
be moved from one computer to another;</li>
<li>you can use SED even without BIOS support because OPAL comes with a
mechanism called 'shadow MBR'. Basically, you flash a mini-OS (the
PBA = Pre-Boot Authorization), up to 128MB, to a dedicated area of the
disk. This OS is provided to the BIOS when booting. A password
window is then displayed. If the password is correct, the real MBR
of the drive (the Master Boot Record = boot code) is then decrypted
and executed. No more need for BIOS SED support and, even better, a
new open source OPAL implementation (sedutil) is available and its
source code can be reviewed much more easily than a binary BIOS
firmware.</li>
</ul>
<p>The new <a href="https://github.com/Drive-Trust-Alliance/sedutil">sedutil
project</a> comes with :</p>
<ul>
<li>some PBA images ready to flash to the drive</li>
<li>the sedutil-cli command to administer the OPAL disk (setting up a
drive in OPAL configuration, changing the password, PSID revert...).
Note that these commands require setting the libata.allow_tpm=1
kernel flag if run from an installed Linux. You can also, like me,
use sedutil-cli from a rescue image booted from USB. See the <a href="https://github.com/Drive-Trust-Alliance/sedutil/wiki/Command-Syntax">list
of
commands</a>.
See also how to <a href="https://github.com/Drive-Trust-Alliance/sedutil/wiki/Encrypting-your-drive">set up a
drive</a>.</li>
</ul>
<p>This worked perfectly for me and I now use my Samsung 850 EVO drive in
SED OPAL mode. Note that sedutil doesn't support suspend to RAM (when
resuming, the drive behaves as if it were dead; you'll get IO errors all over
the place). Always use hibernation instead (as I already stated, it's
the only safe way to use SED drives anyway).</p>
<p>[1] Note that it has nothing to do with the main BIOS user password
that "protects" your machine (in that case, your disk data is still in clear and
can be read simply by moving the disk to another computer or by removing the
BIOS battery)</p>
<p>[2] TRIM is used on SSDs to free unused blocks ASAP and increase the
disk lifespan.</p>
One month with Ansible2018-02-03T00:00:00+00:00https://florat.net/one-month-with-ansible/<p><a href="https://github.com/ansible/ansible">Ansible</a> is an Open Source IT
automation tool written in python and sponsored by RedHat. Best known
alternatives are Puppet, Chef and Salt.</p>
<p>I used Ansible for the first time (2.4.3, the latest release in early 2018) in
an attempt to produce some quite sophisticated Docker Swarm
docker-compose files and other yaml configuration files that include a
significant amount of logic (port number increments, conditional
suffixes, variable number of sections according to lists of items, etc.)</p>
<p>I achieved my goals in about five or six days of effective work,
including the reading of most of the official manual. Being able to achieve
such a real task in six days is acceptable when you have to learn the tool
first, but I think I would have done it in a single day in bash (which I
already know). However, Ansible is much more powerful. My first contacts
and real work with Ansible were really enjoyable and I was very
surprised to make it work so easily. I also tried to apply all the
documented best practices, with success. Sadly, I spent the last three
days struggling with the last 5% of the remaining work, dealing with
limitations/bugs that I found hard to understand and quite irritating.</p>
<h3>What I liked</h3>
<ul>
<li>
<p>The concept of desired state is very powerful: Ansible playbooks
(lists of tasks to be performed against some servers) are idempotent:
only the final states have to be described (like "a /tmp/foo
directory with 600 rights"), not the actions required to reach them
(like in bash: mkdir, chown, chmod...). It's powerful partially
because you don't have to test the existence of the final state (in a
bash script in exit-on-error mode, you would have to check the existence of
each directory, for instance).</p>
</li>
<li>
<p>Ansible is agentless: nothing to install on the targeted servers. All
you need is an ssh key exchange to allow the headless ssh
connections. Ansible generates python scripts from the playbook,
copies them using scp or sftp and runs them remotely using ssh as well.</p>
</li>
<li>
<p>The role concept is a kind of packaged operational process (like "add
a mysql user" or "create and configure an Apache server"). It
enables a lot of reuse and is really great. A marketplace of shared
roles is available on <a href="https://galaxy.ansible.com/">Galaxy</a>.</p>
</li>
<li>
<p>The manual and reference documentation is good and extensive.</p>
</li>
</ul>
<h3>What I found irritating</h3>
<p>UPDATE November 2019: all of the issues described here have been
resolved in the meantime by the Ansible team, KUTGW!</p>
<ul>
<li>
<p>I don't like yaml for complex structures. I find it harder to read
than json, and syntax errors are very frequent and cause a great
waste of time. The data structures are described by (space)
indentation, which I find brittle. Worse: different indentation forms can
both be valid but mean different things (like a map of maps versus one
more key/value pair for the current map). Validators exist but AFAIK,
formatters don't. However, yaml comes with fine features like
comments or multi-documents.</p>
</li>
<li>
<p>Playbook execution is rather slow because of a new ssh connection
for each task, plus one for sending the generated python scripts to the
remote host. Note however that even if tasks are always executed
sequentially, each task is run in parallel against all the targeted
servers.</p>
</li>
<li>
<p>You need to create a playbook that just wraps a role to run it; you
cannot launch a role directly from the command line</p>
</li>
<li>
<p>There are <a href="http://docs.ansible.com/ansible/latest/playbooks_loops.html">16 kinds of loops in
Ansible</a>,
like with_fileglob or with_filetree. Is this really necessary?</p>
</li>
<li>
<p>I wasn't able to increment a variable inside a loop in a jinja2
template: <a href="https://github.com/pallets/jinja/issues/641">https://github.com/pallets/jinja/issues/641</a>. This is
a feature, not a bug. Incrementing things (like ports) is
nevertheless a very basic requirement IMO. Hopefully, there is a
workaround (using a list, append and pop).</p>
</li>
<li>
<p>It isn't possible to match a directory with with_fileglob:
<a href="https://github.com/ansible/ansible/issues/17136">https://github.com/ansible/ansible/issues/17136</a>. You have to use
with_filetree, which comes with other constraints.</p>
</li>
<li>
<p>It is difficult to debug the templating, especially when using
template fragments (with import). On any template module error, you
only get the playbook line and the full template content (very
difficult to read BTW).</p>
</li>
<li>
<p>I find the syntax sometimes twisted, like when we have to use
double quotes around variables and sometimes not. Also, why should
we add white space around the variable names?
I find this ugly and annoying. Apparently, we can drop the spaces in
playbooks but not in the jinja2 templates...</p>
</li>
<li>
<p>Ansible is not compatible with python 3.0 to 3.5. Sometimes (like
with the copy module), I didn't get any error message despite the
fact that the python package on the target server was unsupported.</p>
</li>
<li>
<p>It is not possible to copy recursively with src_remote
(<a href="https://github.com/ansible/ansible/issues/14131">https://github.com/ansible/ansible/issues/14131</a>). I had to use a
hack (run the template on the Ansible host using connection: local) and
then copy using src instead of src_remote.</p>
</li>
</ul>
<h3>Final thoughts</h3>
<p>As a conclusion, Ansible is a good product but can become cumbersome
when trying to make it run too much logic. It is mainly a declarative
system, not an imperative one. Next time, we'll have a look at Salt; it may be
a more suitable solution, or maybe not?</p>
Dashboard under XFCE real howto2016-10-30T00:00:00+00:00https://florat.net/dashboard-under-xfce-real-howto/<p>If, like me, you like both XFCE and the Gnome-Shell dashboard/window picker,
here's how I configured my desktop for the closest Gnome-like
experience:</p>
<p>1) Install xfdashboard (the dashboard itself). I used version 0. Note:
this release comes with a hot-corner plugin, so there is no more need for
xdotool or brightside.</p>
<p>2) Add or enable these commands to be run at X startup (in XFCE
Settings / Sessions and startup / application autostart):
<code>xfdashboard -d</code> (daemon mode, for a faster display)</p>
<p>3) Configure XFdashboard using <code>xfdashboard-settings</code> :</p>
<ul>
<li>In 'plugins', select the 'hotcorners' plugin</li>
<li>make sure to restart xfdashboard to enable this new plugin :
<code>xfdashboard -q</code>, then <code>xfdashboard -d &</code></li>
</ul>
<p>4) Add the preferred applications into the vertical side bar (no GUI,
xfce4-settings-editor cannot edit arrays), here's a sample command :</p>
<pre><code>xfconf-query -c xfdashboard -p /favourites -n -t string -s "exo-file-manager.desktop" -t string -s "exo-terminal-emulator.desktop" -t string -s "jetbrains-idea-ce.desktop" -t string -s "owncloud.desktop" -t string -s "simple-scan.desktop" -t string -s "gnome-calculator.desktop" -t string -s "firefox.desktop" -t string -s "thunderbird.desktop" -t string -s "zim.desktop" -t string -s "libreoffice-writer.desktop"
</code></pre>
<p>5) If you are in multi-monitor mode and you want to see all windows on
the primary display and not spread over several monitors, see <a href="https://github.com/gmc-holle/xfdashboard/issues/136">my
workaround</a>: in
<code>/usr/share/themes/xfdashboard/xfdashboard-1.0/xfdashboard.css</code> (or in
the other themes' <code>xfdashboard.css</code> files), change
<code>filter-monitor-windows: true;</code> to <code>filter-monitor-windows: false;</code></p>
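<p>To apply this change from the command line, here is a one-liner sketch (assuming the default theme path above):</p>
<pre><code># Flip the filter-monitor-windows flag in the default xfdashboard theme
# (adjust the path for the other themes).
sudo sed -i 's/filter-monitor-windows: true;/filter-monitor-windows: false;/' \
  /usr/share/themes/xfdashboard/xfdashboard-1.0/xfdashboard.css
</code></pre>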
The IT crowd, entropy killers2016-07-16T00:00:00+00:00https://florat.net/the-it-crowd-entropy-killers/<p><img src="https://florat.net/assets/images/blog-tech/it_crowd.png" alt="http://www.channel4.com/programmes/the-it-crowd"></p>
<p>I once asked myself: "how should we, computer scientists, define our job
in the most general sense of the term?".</p>
<p>Our fields are very diverse but, in my opinion, <strong>the greatest common
divisor is "entropy hunter"</strong>.</p>
<p>Everything we do is geared toward the same goal: decreasing the level of
complexity of a system by modeling it and transforming a bunch of
semi-subjective rules into a Turing machine program that can't execute
the indecisive.</p>
<p>Everything we do, including documentation, workshops with the
stakeholders and project management, and not only the programming
activities, should be about chasing doubt. Every word, every single line
of code should kill ambiguity.</p>
<p>Take design activities: most human thoughts are fuzzy. This is the
reason why waterfall (traditional) project management processes, where
all designs are done in one go, can't work: humans need to see
something to project themselves using it and go further in their
understanding.</p>
<p>Business designs are subjective in many ways, for instance:</p>
<ul>
<li>by omitting cases (or, less often, describing nonexistent cases)</li>
<li>by word ambiguity. Here's a small anecdote: last week, I worked
on a specification document written in French containing the word
"soit": "the file contains two kinds of data, soit data1 and
data2". This sentence could be understood in two opposite ways
because the French word "soit" means "either/or" but also
"i.e.". Hence, this sentence could mean at the same time "the
file contains data1 AND data2 kinds" or "the file contains data1
OR data2 kinds". I encounter this kind of uncertainty several times
a week.</li>
<li>by lacking examples. Examples are often much more demanding
and objectionable. They require a better understanding of the
system. Moreover, designing by example (like in BDD) tends to be
more complete because when you start to provide nominal examples,
you are tempted to provide the corner case ones. (read <a href="https://www.manning.com/books/bdd-in-action">BDD in
Action</a> by John
Ferguson Smart for more).</li>
</ul>
<p>A program, on the opposite, is deterministic. It is a more formal (and
modeled, thus reduced) version of a complex reality. The more cases and
rules a reality needs to be described entirely, the more complex the
program is, but it is still much simpler than the reality it describes.</p>
<p>The quality of all we do should IMO be measured in the light of the
amount of complexity we put into our programs. The less complexity we
use to model a system, the better the program is.</p>
Programming is craftsmanship and requires skills2016-06-12T00:00:00+00:00https://florat.net/programming-is-craftsmanship-and-requires-skills/<p>Many managers think that programming is easy; it's just a bunch of
<code>for</code>, <code>if</code>, <code>else</code> and <code>switch</code> clauses after all, isn't it?</p>
<p><strong>But coding is difficult because it is mainly about TAKING DECISIONS
ALL THE TIME</strong>.</p>
<p>Driving is easy because you don't have to take decisions about the way
to turn the steering wheel; walking is easy, you don't even have to
think about it; drilling a 10 mm hole into a wall is easy because the
goal is clear and because you don't have many options to achieve it...</p>
<p>Software is difficult and is craftsmanship because there are always many
ways to achieve the same task. Take the simplest example I can think
of: an addition function: we want to add <code>a</code> and <code>b</code> to get
<code>c=a+b</code>.</p>
<ul>
<li>Should I code this the object-oriented way ( <code>a.add(b)</code> ) or the
procedural way ( <code>add(a,b)</code> )?</li>
<li>How should I name this? <code>add()</code>? <code>sum()</code>? How should I name the
arguments?</li>
<li>How should I document the function? Are there project
conventions about it?</li>
<li>Should I return the sum or store it into the object itself?</li>
<li>Should I code this test first (TDD)? Write a UT afterwards, or write
no test at all?</li>
<li>Does my code scale well? Does it use a lot of memory?</li>
<li>Which visibility for this function? Private, public, package?</li>
<li>Should I handle exceptions here (a is null, for instance) or from the
caller?</li>
<li>Should the arguments be immutable?</li>
<li>Is it thread-safe?</li>
<li>Should this function be injected within a utility class?</li>
<li>If I'm coding in object-oriented style, is it
<a href="http://en.wikipedia.org/wiki/SOLID_%28object-oriented_design%29">SOLID</a>
compliant? What about inheritance? ...</li>
<li>... tens of other questions any good coder should ask himself</li>
</ul>
<p>If all of these decisions could be taken by a machine, coders would not
be required at all because we would just generate code (and we sometimes
do it using MDD technologies, mainly for code skeletons with low added
value).</p>
<p>We -coders- would then all be searching for a new job. But, AFAIK, this
is not the case, we are still needed, still relevant. All companies
still need costly software craftsmen !</p>
<p>Q.E.D. ;-)</p>
<p>I can't agree more with the <a href="http://manifesto.softwarecraftsmanship.org/">manifesto for software
craftsmanship</a>.</p>
Deployment scripts should always be refreshed from VCS prior execution2016-06-12T00:00:00+00:00https://florat.net/deployment-scripts-should-always-be-refreshed-from-vcs-prior-execution/<p>After a few months of writing continuous deployment scripts for a pretty
complex architecture (two JBoss instances, a Mule ESB instance, one
database to reset, a BPM server, each being restarted in the right order
and running from different servers), I figured out a good practice in
this field: scripts have to be auto-updated.</p>
<p>When dealing with highly distributed architectures, you need to install
this kind of deployment script (mostly Bash) on every involved node, and
it soon becomes very cumbersome and error-prone to maintain them on
every server.</p>
<p>We now commit them into a VCS (Subversion in our case), which is the
master location of the scripts. Then, we try:</p>
<ol>
<li>To check them out before running them, when possible. For instance, we used
a Jenkins job to launch our deployment process (written as a bash
script). The job is parameterized to check out the SVN repository for
the script before running it from the Jenkins workspace. This is
very convenient.</li>
<li>When this is not possible (for instance when the script should be
executed on another server than the CI server), we check out the
scripts from the Jenkins server and push them (using scp for
instance) to the targeted server before executing them (using ssh).</li>
<li>Sometimes, when the call must be asynchronous on another server, we
simply trigger a script by creating an empty file remotely. A very
simple croned bootstrap script (not refreshed itself) detects the
file change, updates the script (svn co) and runs it (see the sketch
after this list).</li>
</ol>
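<p>Here is a minimal sketch of such a croned bootstrap script; the trigger path, repository URL and script name are hypothetical:</p>
<pre><code>#!/bin/bash
# Hypothetical croned bootstrap: looks for a trigger file, refreshes
# the deployment script from SVN, then executes it.
TRIGGER=/var/tmp/deploy.trigger
WORKDIR=/opt/deploy

if [ -f "$TRIGGER" ]; then
  rm -f "$TRIGGER"   # consume the trigger so we run only once
  svn checkout --force http://svn.example.com/repo/deploy "$WORKDIR"
  bash "$WORKDIR/deploy.sh" >> /var/log/deploy.log 2>&1
fi
</code></pre>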
Eclipse DemoCamp 2015 Nantes recap2015-05-31T00:00:00+00:00https://florat.net/retours-eclipse-democamp-2015-nantes/<p>I had the pleasure of attending the Eclipse DemoCamp Nantes last
Thursday at the Hub Creatic (it is hard to find since it is not
signposted yet; it's the bright yellow building next to the Polytech
Nantes engineering school. It was the first time I went there and I must
say I was impressed, too bad it is not downtown).</p>
<p>We got an extremely eclectic yet fascinating panorama of the Eclipse
world in 2015, from the Internet of Things (IoT) to the software
factories of large companies, including computing for children. This
shows, if proof were needed, the traction of the Eclipse world as an IDE
of course, but above all as a platform.</p>
<h4>Gaël Blondelle from the Eclipse Foundation</h4>
<p>explained it very well: the strength of Eclipse is above all its
federating capacity: the Luna release was built by 400 developers coming
from 40 different companies.</p>
<p>The release train concept (simultaneous, annual delivery of all the
projects in June) ensures stability and quality integration between the
hundreds of plugins.</p>
<p>Another emerging notion is the Working Groups, which gather work by
theme, such as:</p>
<ul>
<li>
<p><a href="https://www.locationtech.org/">LocationTech</a> orienté SIG . Un des
projets les plus innovants de ce groupe est <a href="https://www.locationtech.org/proposals/mobile-map-tools">Mobile
Map</a>
générant des cartes directement calculées sur le smartphone.</p>
</li>
<li>
<p>IOT, federating the projects around the Internet of Things. Two
interesting projects: <a href="https://eclipse.org/smarthome/">Eclipse Smart
Home</a> for home automation and <a href="http://eclipse.org/eclipsescada/index.html">Eclipse
SCADA</a>, providing
SCADA (Supervisory Control and Data Acquisition) libraries and tools
used to monitor many kinds of hardware.</p>
</li>
<li>
<p><a href="https://science.eclipse.or/">Eclipse Science</a> pour des projets de
visualisation ou de traitements scientifiques.</p>
</li>
<li>
<p><a href="https://science.eclipse.or/">PolarSys</a> regroupe des projets pilotés
par Thales, le CEA, Airbus, Ericson... pour les projets de
modélisation autour de l'embarqué (Papyrus SysML, Capella...).</p>
</li>
</ul>
<h4>Laurent Broudoux and Yann Guillerm, architects at MMA</h4>
<p>then walked us through their deployment history and their multi-version
Eclipse strategy. Their IT department gathers 800 people, including 150
Eclipse users working on projects as varied as legacy (Cobol, Flex,
historical Java) and more innovative ones (mobile, Grails-based web
applications...).</p>
<p>In short, building a new version of the workbench (a single one until
2012) took up to 50 man-days, starting from a base Eclipse and
integrating/testing all the required plugins. The new strategy follows
two axes:</p>
<ol>
<li>Build a modular workbench in three layers: 1) a base
(seed): a pre-packaged Eclipse distribution; 2) community
plugins (Confluence, Mylyn, Subclipse...); 3) in-house
plugins, mainly around the Confluence groupware tools.</li>
<li>Differentiate the workbenches according to needs (6 variants, 4
families):
<ul>
<li>a legacy factory based on Galileo</li>
<li>a "High Tech" factory for "usage" projects (CMS, mobile) based
on the Grails GGTS distribution (soon integrating the
Android ADT technologies);</li>
<li>a "core business" factory based on Juno (I did not note
which Eclipse distribution is used as the seed) for JEE
applications (provides the technical bricks for SQL and NoSQL
persistence, UML modeling, the MDD tools (Acceleo,
ATD...), M2E for Maven integration...);</li>
<li>an architecture modeling factory to manage the application
portfolio, impact studies, project scenario planning...
This enterprise modeling is based on a metamodel derived from
TOGAF. This workbench is based on the SmartEA seed (from OBEO).</li>
</ul>
</li>
</ol>
<p>The factories' technology stack notably relies on Mylyn (task
management), Confluence (enterprise wiki), Maven, Chef for Configuration
Management, and SVN as the VCS.</p>
<h4>Stéphane Bégaudeau from OBEO</h4>
<p>then presented the development and integration tools of the NodeJS
ecosystem. Scaffolding by archetypes is done with Yeoman. The package
manager for JS libraries is npm. The angular.js, ember.js and
backbone.js libraries/frameworks are also available. Bower is a package
manager for JS libraries. Builds are done either with Grunt
(configuration-over-code model) or (preferred) with the more recent and
simpler Gulp (code over configuration). Min.js provides code
minification. For testing, there are Jasmine (BDD), Mocha and QUnit.
PhantomJS and CasperJS allow headless testing. Istanbul provides code
coverage analysis. JSHint performs style checks. Karma tests page
ubiquity (responsive design). Finally, Stéphane presented Eclipse Orion,
the Eclipse Web IDE based on NodeJS. Among other things, this IDE
provides code completion, syntax highlighting, comes with very good Git
support and can be extended with plugins.</p>
<h4>Hugo Brunelière from Atlanmod</h4>
<p>introduced us to the ARTIST research program, which offers model
engineering tools and methodologies to migrate a traditional application
into a cloud-friendly one. The €10M program is mainly developed by INRIA
and ATOS (Spain). The program offers:</p>
<ul>
<li>Methodology, through a handbook and a certification model.</li>
<li>Business and technical feasibility analysis tools, reverse
engineering, and optimization tools.</li>
</ul>
<p>Modeling is mostly done in stereotyped UML under Enterprise Architect.
XText-based textual DSLs are also used, as well as SIRIUS-based
graphical DSLs. M2T analysis is done with Modisco, and M2M model
transformation in ATL. Reporting is based on BIRT. The methodology is
tooled by EPF (Eclipse Process Framework). A cloud-friendly maturity
model has been developed: the
<a href="http://www.artist-project.eu/content/maturity-assessment-tool-mat">MAT</a>
(Maturity Assessment Tool) model.</p>
<h4>Stévan Le Meur from Codenvy</h4>
<p>gave us a demonstration of <a href="https://projects.eclipse.org/proposals/flare">Eclipse
CHE</a>, a SaaS platform
for developers based on Orion and Docker. A development workstation can
be provisioned very easily and then "deployed" as pure Web (this is the
"Codenvy Factory" concept). It is possible to select and then run Docker
containers hosting Tomcat, JBoss or other servers, locally or remotely.
A preview of a new GitHub integration feature (clone then pull request
in a few clicks without installing anything) finished blowing us
away.</p>
<h4>Maxime Porhel from OBEO</h4>
<p>presented a <a href="https://github.com/mbats/arduino">graphical programming
environment</a> for Arduino boards,
aimed at children. This very simple graphical DSL was of course
developed with SIRIUS (the Open Source version of OBEO Designer). A very
fun demonstration proved the concept on an Arduino AVR board included in
a <a href="http://www.dfrobot.com/index.php">DFRobots</a> kit. My
kids are going to be delighted :-)</p>
<p>Finally,</p>
<h4>Fred Rivard, founder of IS2T</h4>
<p>explained the economic and technological stakes of embedded Java. 100
billion microcontrollers costing $1 to $15 are currently deployed
worldwide. 25% of them run in "mainstream" environments: iOS, Android
and Linux. The rest is extremely fragmented across hundreds of
technologies that are still programmed in assembly. The entry ticket for
a project amounts to at least €1M, and the product must ship in less
than six months to be profitable against the competition. Big Data will
only be able to develop harmoniously if the "Little Data" (devices, the
IoT) feeding it becomes cheaper. IS2T aims to develop embedded JVMs that
are extremely fast and memory-light (worthwhile in terms of memory
starting from 100K of flash memory compared to classic code). All these
technologies are gathered around the MicroEJ platform. IS2T also
develops a "store" of embedded applications for this kind of hardware.
Fred entertainingly showed us many usage examples, such as this
connected watch that turns on in 48ms while it takes 500ms to raise your
arm to read the time: the watch can thus be switched off most of the
time, and its battery life is increased tenfold.</p>
<h4>A few asides during the sessions are worth noting</h4>
<p>about <a href="https://wiki.eclipse.org/Eclipse_Oomph_Installer">Oomph</a>, a new
installer for Eclipse plugins that also allows centralizing the
developers' settings.</p>
Agile Tour 2014 Nantes recap2014-10-15T00:00:00+00:00https://florat.net/retour-sur-l'agile-tour-2014-nantes/<p>I was lucky enough to attend the <a href="http://www.agilenantes.org/wp-content/uploads/2014/11/AGT2014_livret_sessions.pdf">Agile Tour
2014</a> day,
Nantes edition, at the École des Mines. Well organized, rich in
encounters and feedback, as every year...</p>
<p><img src="https://florat.net/assets/images/blog-tech/img_4308.jpg" alt=""></p>
<h4>The World Cafés</h4>
<p>An interesting innovation this year: the "World Cafés" held between the
conferences, during which a topic is discussed by an ephemeral group of
which a single member (the scribe) stays to consolidate the ideas, which
are then presented. A concept that fosters exchanges between the
participants. On this occasion, I notably talked with the manager of a
large mutual insurance company, who explained that she had trouble
finding agile application-maintenance services, while on our side we
still had trouble finding clients ready to go (truly) agile by putting
at the front of the project a PO (Product Owner) with decision-making
power, functional expertise and time to invest in their project.</p>
<h4>How to involve your clients in their projects?</h4>
<p>I simply loved this
<a href="http://www.slideshare.net/atnantes/agile-tour-nantes-2014-comment-impliquer-vos-clients-dans-leurs-projets?ref=http://www.slideshare.net/slideshow/embed_code/42646053">conference</a>,
both very concrete and deep. Benoit Charles-Lavauzelle (CEO of Theodo)
and Julien Laure (agile coach, scrum master) presented the history of
their company and how they (now) deliver successful Scrum projects. The
company, which used to develop fixed-price projects (B2B sites in
PHP/Symfony), was close to bankruptcy in 2011. Client dissatisfaction
was high because of the tunnel effect: once finished, the applications
did not match the need the client thought they had expressed. The
company then turned to Scrum, which it applied <em>by the book</em>. The
failure was huge, and the cause may seem obvious <em>a posteriori</em>: there
was no PO on the client side, hence no involvement. Without a PO, a
project sails blind. In 2013 the company decided to only do Scrum
projects with strong client involvement. Despite strong reluctance from
clients who did not want to be billed for time spent instead of a fixed
price, the company saw its revenue grow from €1.2M to €5M this year.
Clients came for the PHP/Symfony technical expertise and stayed for the
quality and the respect of deadlines (95% of the clients recommend the
company).</p>
<h5>How did Theodo manage to involve the client?</h5>
<ul>
<li>First, reassure the client: invite them to the sprint plannings,
estimate with them (planning poker) so they realize the technical
difficulties. Use short sprints (one week here).</li>
<li>Be transparent: Theodo precisely tracks each deviation from the
standard (see the
<a href="http://www.slideshare.net/atnantes/agile-tour-nantes-2014-comment-impliquer-vos-clients-dans-leurs-projets?ref=http://www.slideshare.net/slideshow/embed_code/42646053">slides</a>,
p. 28).</li>
<li>Burndown charts visible to the client, live, through Web tools.</li>
</ul>
<h5>What makes a good PO?</h5>
<ul>
<li>Choose the PO who (really) carries the project and has
decision-making power (beware of casting errors).</li>
<li>Permanent feedback with the PO is required: a weekly evaluation
system covering velocity and support.</li>
</ul>
<h5>How to get the PO to validate?</h5>
<ul>
<li>An electronic board with the tasks to validate:
<a href="https://trello.com/">Trello</a> (very simple for the client to
use).</li>
<li>A daily digest e-mail with all the pending questions and the
important URLs, with the n+1 in copy. Sent after the daily.</li>
<li>An agile self-assessment sheet (see the slides, p. 44) evaluates the
"technical" quality of the sprint and helps arbitrate between the
short and the long term.</li>
</ul>
<h5>Assessment</h5>
<ul>
<li>The PO works one to two days a week with the team, and that is
not too much!</li>
<li>A new problem emerges with large accounts: the distance to the PO
and the generalization of proxy-POs representing the PO on the
vendor side. A proxy-PO is better than nothing (but barely
better).</li>
</ul>
<h4>Collective intelligence serving innovation and industrialization</h4>
<p>Clément Duport (Alyotech) shared his vision of innovation. He explained
that in this domain, the Gordian knot of current IT policies lies in the
ambivalence between creativity, risk and freedom on the innovation side,
and harmonization, control and order on the industrialization side. This
leads to real schizophrenia (OK, at Capgemini we have the
Lab'Innovation, which partly solves this dilemma by offering this
innovation space to our clients). In fact, he explained that both are
needed to move forward: the right balance must be found between order
(to survive) and disorder (to advance). "To create is to remember what
has not yet happened" (Siri Hustvedt). Innovation can emerge from an
industrial approach, by recombining ideas.</p>
<h4>Designing as a team, without an architect</h4>
<p>Ly-Jia Goldstein shared her experience as a developer in a team
following the precepts of <a href="http://manifesto.softwarecraftsmanship.org/">software
craftsmanship</a> and XP.
She explained that a good XP development process relying on BDD, while
empowering the team members as much as possible (by making technical
decisions collegially), can do without a (software) architect. This has
many advantages, such as a better <a href="http://en.wikipedia.org/wiki/Bus_factor">bus factor</a>,
greater project reactivity and smoother refactoring. Good points were
raised. Nevertheless, from my point of view the conference only
addressed the software architect role. It seems to me that a general
architecture framework (urbanization, technical architecture, solution
catalog, industrialized framework, CI platform) is unavoidable in large
information systems, even if it is true that the teams, largely made up
of engineers, would benefit from being more proactive on the software
side and avoid situations such as this one:</p>
<p><img src="https://florat.net/assets/images/blog-tech/are-you-too-busy-to-improve2.png" alt=""></p>
My cloud, my way2014-08-25T00:00:00+00:00https://florat.net/my-cloud-my-way/<p>I just finished setting up my personal cloud storage. It has been a long
and difficult task, and I'd like to share with people having similar
requirements a bunch of useful information and pointers that would have
saved me a lot of time.</p>
<p><strong>Summary diagram</strong></p>
<p><img src="https://florat.net/assets/images/blog-tech/my_cloud.png" alt=""></p>
<p><em>Orange: HTTPS stream; Green: synchronization stream; Blue: Webdav
stream; Red: security system</em></p>
<p><strong>My requirements</strong></p>
<ul>
<li>Safe : strongly encrypted storage for data and backups, encrypted communications, easy to backup and restore. Client-side encryption is optional.</li>
<li>Ecological : reduced footprint, especially regarding energy consumption.</li>
<li>Cheap : free or very low price for large amount of storage space (200 GB to 1 TB).</li>
<li>Open : should run under the three main operating systems (Linux, Windows, OSX) ; HTTP proxy compliant; Available from anywhere using a simple web browser.</li>
<li>Fast : I mean less than 10 minutes to detect changes from my 110 GB / 90K files. Low CPU consumption on the client side and on the server side appreciated.</li>
</ul>
<p><strong>Kinds of files in the cloud storage</strong></p>
<p>The emerging file usage patterns I have identified for myself so far are :</p>
<ul>
<li>Exchange" : temporary storage to easily share files between computers. Synchronous writing. I use this typically when leaving the office to upload a document I want to work on at home from another computer and I want to make sure that the file is immediately uploaded into the cloud without having to wait for the next synchronization. Note that would be largely useless if I kept my computers online but I suspend them to save energy.</li>
<li>"Pure cloud" : primary source is the cloud. Can be read/written from any node but the preferred node in case of conflict is the cloud itself. I use it for few TODO notes that should be available from anywhere. The synchronization can be asynchronous.</li>
<li>"Archive" : same than "Pure cloud" but for archiving purpose only, few writes, few reads, files to kept. I use this to save some backups.</li>
<li>"Unidirectional copy" : asynchronous copy of a directory into another node for read-only when off-line. I use this to get a copy of some directories located on the cloud only but sometimes required when offline (for instance I want on my office laptop a read-only snapshot of my personal notes uploaded from my personal laptop).</li>
<li>"Unidirectional sync" : a directory is primary on a node (this node is preferred in case of conflict) and is asynchronously synchronized into the cloud and then possibly other nodes. The directory can be written only on the primary node. This is the main pattern I use for most of my data.</li>
<li>"Bidirectional sync" : Shared directory between several nodes. Any node can read or write. I don't use this mode because my experience showed that it comes at the cost of numerous conflicts : if you have to edit files from an offline computers (on the train for instance), you quickly get conflicts. It is often too late to properly reconsiliate them when you figured out the problem. I prefer to use the "Pure cloud" pattern for files that can be written by several nodes. In the "Pure cloud" pattern, however, you can only access these files read-only when offline because they will be overridden by the cloud version at the next synchronization.</li>
</ul>
<h2>The different streams of the infrastructure</h2>
<h3>HTTPS using a browser</h3>
<ul>
<li>Typical use case : I'm traveling and I want to watch/show a picture / an administrative asset etc.</li>
<li>Usage frequency : low</li>
<li>From where ? anywhere on the planet</li>
<li>Requirements : a browser and a login/password</li>
<li>Modalities : read-only, the files are browsed using the default Apache tree explorer.</li>
<li>My experience : the navigation is so fast (even on my CubieBoard and my pretty low upload bandwidth) that I find this useful to find a document even from home.</li>
</ul>
<h3>Remote filesystem mount point</h3>
<ul>
<li>Typical use case :
<ul>
<li>Copying some files to back up, when I want to be sure to upload a file into the cloud without waiting for the next scheduled sync (when leaving the office for instance)</li>
<li>Performing filesystem operations against the mount point (count files, check size recursively, remove directories...)</li>
<li>Editing a note file located on the cloud.</li>
</ul>
</li>
<li>Usage frequency : mounted at startup, pretty low effective usage (once or twice a day)</li>
<li>From where ? office, home</li>
<li>Requirements : a mounting software (I use davfs2; see the sketch after this list)</li>
<li>Modalities : works well even through an HTTP proxy. It works using a cache by design, so the local and the remote files may differ for a period of time; never use this for a synchronization (using rsync or unison for instance) because it doesn't preserve time, see below "Note about Webdav".</li>
<li>My experience : OK if you only use it for the occasional use cases described previously. Comes with a significant latency that increases the time of the 'df' command for instance. I plan to mount it only on demand and to stop mounting it automatically at startup.</li>
</ul>
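<p>For reference, a minimal sketch of such a mount with davfs2 (the URL and mount point are hypothetical):</p>
<pre><code># Mount the remote Webdav share over HTTPS; davfs2 prompts for the
# credentials unless they are listed in /etc/davfs2/secrets.
sudo mount -t davfs https://mydomain.com/dav /mnt/cloud

# Unmount when done, which also flushes the davfs2 local cache.
sudo umount /mnt/cloud
</code></pre>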
<h3>Local access to synchronized files</h3>
<ul>
<li>Typical use cases : doing real work (like development) at home or at the office that can't tolerate high latencies when saving files.</li>
<li>From where ? home, office.</li>
<li>Usage frequency : always on in background.</li>
<li>Modalities : sync every 1h30; the full sync of the entire collection takes from one to two minutes. Only the cloud contains all the data : on my office computer, I only store professional project files and I only synchronize them with the cloud; same for my home computer with the personal stuff.</li>
<li>My experience : works well, but the merge/conflict priorities must be clear and forged into the sync commands. Never use bidirectional sync (see "Patterns : Kinds of files in the cloud storage"), which can turn bad due to conflicts.</li>
</ul>
<h2>The solutions I tried during the last year</h2>
<ul>
<li><em>SparkleShare</em> : based on Git. As the website now states, it is good when only small storage is required (very good for that purpose), but Git is not designed for large binary storage, so SparkleShare rapidly becomes too slow to remain usable.</li>
<li><em>Wuala</em> : very good and clever, many features, client-side encryption but : 1) not open source, so we have to trust them that the client-side encryption code contains no backdoor (difficult to believe nowadays ;-) ) 2) expensive.</li>
<li><em>Owncloud</em> : pretty good, I now consider release 5 a serious solution; it meets all my criteria BUT is soooooo slow (on my CubieBoard, 1 GHz ARM, SATA3 adapter)... Even when using a finely tuned MySql database (asynchronous IO among other things) instead of the packaged SQLite, it becomes very slow after a few tens of thousands of files, mainly because of the high number of SQL queries it has to perform (not only when using the Web GUI but also when using the Webdav interface). The synchronization client 1.4 (for Seven and Ubuntu) is very slow (it takes more than one hour to detect changes, or fails in timeout most of the time) and takes a significant amount of CPU (10 or 20%) even on powerful computers (i7, 4 cores). After an extensive use of Owncloud during several months I had to try another thing, too bad... I may give it another try in several years.</li>
<li><em>Hand-crafted</em> solution : I finally decided to solve the problem the Unix way, i.e. with many small and powerful specialized tools chained one to the other, and it finally works even better than initially expected. See details below.</li>
</ul>
<h2>Not tested but not that far from my requirement</h2>
<ul>
<li>Client-side encryption with EncFS + Dropbox/Hubic/Google Drive or other free storage services. The main problems are 1) the cost of the storage, free plans provide only a few GB 2) the web GUIs are unusable because all directory and file names are encrypted. You'll find a lot of tutorials and blogs about this solution on the Web.</li>
<li>Seafile : not tested because it is not compatible with HTTP proxies; looks promising on paper.</li>
</ul>
<h2>Features I don't care about (but you may do)</h2>
<ul>
<li>Directory/file sharing and groupware features like concurrent editing : most modern tools like Owncloud support this.</li>
<li>Version control (Owncloud is bundled with a plugin for that purpose). I still use an SCM (Git) for some directories (like source code or text notes) on the original source directory (and sometimes on the replicated locations), but I ignore the .git directories (which contain the local repository) so that the source and the destination have their own local repositories that don't collide (a git local repository is not intended to be shared among several computers).</li>
</ul>
<h2>Note about Webdav</h2>
<ul>
<li>Webdav is an ancient technology re-emerging thanks to the cloud storage trend; most cloud providers come with Webdav connectivity.</li>
<li>The good :
<ul>
<li>It is based upon HTTP, so it is HTTP-proxy compliant out of the box.</li>
<li>A distant Webdav service can be mounted under Linux (using davfs2) or under the other OSes.</li>
</ul>
</li>
<li>The bad : my conclusion is that this technology is not really reliable enough to build a cloud meeting my requirements :
<ul>
<li>Times and rights are not preserved upon copy.</li>
<li>Mainly due to the previous restriction, synchronization (using rsync or unison for instance) is not reliable and is even dangerous.</li>
<li>I sometimes observed (using davfs2) that some files existing on the server side are not visible from the client (even with a regular name).</li>
<li>Webdav requires a cache on the client and comes with write latencies, often of several seconds or tens of seconds.</li>
<li>Installation is often cumbersome, especially under Windows XP/Vista/Seven, which come with various bugs, so we need to change the Windows registry (I never managed to make it work under Seven).</li>
<li>Webdav has a bad reputation when it comes to security, but "Secure" Webdav, i.e. Webdav + Basic/Digest authentication under HTTPS, looks enough (I'm not a security expert though).</li>
</ul>
</li>
</ul>
<h2>Note about the hardware, a CubieBoard 1</h2>
<ul>
<li>
<p>Excellent lightweight device : a bit more expensive than a Raspberry
but more powerful (1Ghz ARM CPU), more memory (512MB) and a SATA3
adapter to avoid using a slower USB connector.</p>
</li>
<li>
<p>My hdparm stats :</p>
<p>Timing cached reads: 796 MB in 2.00 seconds = 398.06 MB/sec
Timing buffered disk reads: 326 MB in 3.00 seconds = 108.52 MB/sec</p>
</li>
<li>
<p>Note that a CubieBoard 2 has recently been made available; the main
evolution is a dual-core ARM CPU. It looks good, but my CubieBoard 1
still looks sufficient for me alone.</p>
</li>
<li>
<p>The measured power consumption including the transformer goes from
3W (100% idle) to 6W (100% CPU + extensive IO usage)</p>
</li>
<li>
<p>The (excellent) tutorial I followed to install Debian is <a href="http://linux-sunxi.org/Cubieboard/Installing_on_NAND">here</a></p>
</li>
<li>
<p>The bad :</p>
<ul>
<li>I had a lot of IO failures due to the 2.5'' hard disk lacking power. I finally found a solution : in addition to the regular 5V/0.5A power jack cable, I had to plug another USB cable into the female mini-USB port : using this double power supply, the SATA connector works like a charm.</li>
<li>The CPU is enough for a single person's remote access (Apache, on-the-fly encryption, unison...) but not enough to compress tens of GB of data when doing backups. I have to back up using a tar method; even gzip is far too slow and would take days (~1MB/sec). It's still OK because I have a very large volume of free disk.</li>
<li>I regularly back up the system (about 1GB) using a microSD card stored in a safe place far from the server.</li>
</ul>
</li>
</ul>
<h2>Note about EncFS</h2>
<p>EncFS is a filesystem encryption program. It maps a "real" filesystem
with encrypted files to a userspace 'in memory' filesystem. It is very
simple to use, stores the files encrypted file by file, and even the
directory and file names are encrypted. The encryption is very strong
when using the paranoia mode ("Cipher: AES Key Size: 256 bits PBKDF2 with 3
second runtime, 160 bit salt" according to the man page).</p>
<ul>
<li>If an attacker or a burglar physically steals the server, he has to unplug it and thus shut it down. Without the password, the data remains safely encrypted on the hard disk and is lost to the attacker.</li>
<li>Note that EncFS doesn't actually use your password to encrypt the files; it actually uses a self-generated internal password, itself encrypted using your password. This is cool because this way, you can change the filesystem password (EncFS provides an admin command for that; see the sketch after this list) and no file actually has to be encrypted again.</li>
<li>Another cool thing with EncFS is the fact that even root can't access the filesystem; only the user that mounted the filesystem into its userspace (www-data when used in an Apache context) is able to.</li>
<li>A last cool thing is the fact that all the files are already encrypted for backup : one doesn't have to encrypt the files during the backup process (fortunately, given the size of the data and my server CPU, it would simply be impossible in my case). The backup files can be stored on a regular filesystem as the data is already encrypted. Moreover, the per-file EncFS encryption mechanism allows incremental backup (mandatory as well in my case).</li>
<li>I also use EncFS to store my local files on the laptop, so the data is never available in clear anywhere in the process (encrypted on my laptop, encrypted during the transfer using a strong SSL encryption and finally encrypted on the server side).</li>
<li>The CPU overhead is minor. The EncFS process reaches some 60-80% CPU usage on the (fanless) server CPU during a short period of time when accessing files, but I still get a lot of IO wait, so the disk access is actually a greater speed limiter.</li>
<li>The only (minor) drawback is the fact that one has to provide a password to mount the filesystem (done only once, when booting the server).</li>
</ul>
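<p>A minimal sketch of the corresponding EncFS commands (the directory paths are hypothetical):</p>
<pre><code># Create (on first run) or mount the encrypted tree; --paranoia selects
# the strongest preset at creation time, and a password is prompted for.
encfs --paranoia /data/.encrypted /data/clear

# Change the filesystem password: only the internal key is re-encrypted,
# the files themselves are untouched.
encfsctl passwd /data/.encrypted

# Unmount the clear view (the on-disk data stays encrypted).
fusermount -u /data/clear
</code></pre>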
<h2>About Unison</h2>
<p>Unison is an excellent tool to synchronize two locations. It is simpler
and more powerful than rsync for that specific purpose. I initially
tried to synchronize the local files on my laptop with the Webdav mount
point, but it was a disaster, for the reasons I explained before.</p>
<ul>
<li>Unison can also work over SSH but requires a unison executable on the server side as well. This way, I assume, Unison detects changes on the server and sends only a final digest over SSH; it is impressively fast.</li>
<li>I use cron or bash scripts with sleep loops for the synchronization scheduling.</li>
<li>I configure unison to ignore paths in order to synchronize partial parts of some directories located on the cloud into different nodes (see the sketch after this list). For instance, let's say that I work at home on project 'p1' and at work on project 'p2'; I want to get :
<ul>
<li>On the cloud, all the projects : /mydata/myprojects/p1, /mydata/myprojects/p2</li>
<li>On my personal laptop : /home/me/p1 (only 'p1' files, no 'p2' file)</li>
<li>On my office laptop : /home/me/p2 (only 'p2' files, no 'p1' file)</li>
</ul>
</li>
</ul>
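<p>A minimal sketch of the corresponding unison invocation for the personal laptop; the local root <code>/home/me/projects</code> and the server name <code>mycloud</code> are hypothetical:</p>
<pre><code># Sync the local projects root against the cloud master over SSH,
# ignoring the 'p2' subtree so that only 'p1' lands on this laptop;
# -batch avoids interactive prompts (suitable for cron).
unison /home/me/projects ssh://mycloud//mydata/myprojects \
  -ignore 'Path p2' -batch
</code></pre>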
<h2>The technical stack in use</h2>
<ul>
<li>
<p>Apache with SSL and Webdav modules</p>
<ul>
<li>The same Apache virtual host serves both Webdav and plain HTTPS; the plain HTTPS browsing is obviously read-only, while the Webdav one can be written to or mounted (see the sketch after this list).</li>
<li>I use an RSA 4096-bit certificate to make the communication safer.</li>
<li>The HTTPS virtual host is protected using a Digest Authentication password.</li>
<li>I use port 80 (for an HTTP tunnel) and port 443 (for Webdav and plain HTTPS) because HTTP proxies usually only allow them. Using an HTTP tunnel allows me to synchronize my directories even behind an HTTP proxy when required.</li>
</ul>
</li>
<li>
<p>Unison for file synchronization.</p>
</li>
<li>
<p>I use several well-known security systems, including an iptables firewall restricting every port but 80 and 443. Fail2ban is configured to ban attackers that failed to log into the SSH or Apache services.</p>
</li>
<li>
<p>http-tunnel is a very simple HTTP tunneling tool that works very well. It is available as a standard Debian package as well. I had a problem using it with unison behind an HTTP proxy though, due to packet length. The solution for me has been to set the -c option to a high value :</p>
<pre><code>htc **-c 100M** -F 1058 mydomain.com:80 .
</code></pre>
</li>
<li>
<p>The cloud and laptop local data is stored encrypted using EncFS.</p>
</li>
<li>
<p>The server files are backed up using the excellent tool 'backup-manager'. EncFS makes the backup security-free, as I explained in the EncFS section. Naturally, the backup files regularly have to be saved onto an external disk, physically protected and located far away from the server in case of disaster or theft.</p>
</li>
</ul>
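<p>On a Debian-based server, wiring up the Apache pieces above might look like the following sketch; the realm "cloud", the user "me" and the password file path are hypothetical:</p>
<pre><code># Enable the Apache modules used by the virtual host (SSL, Webdav,
# Digest authentication).
sudo a2enmod ssl dav dav_fs auth_digest

# Create the Digest Authentication password file, then reload Apache.
sudo htdigest -c /etc/apache2/digest-passwords cloud me
sudo service apache2 reload
</code></pre>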
<h2>Final thoughts</h2>
<p>I finally met all my requirements :</p>
<ul>
<li>Very cheap (disk price : 0.08€/GB as of today + 5.50€/year of electricity for an average consumption of 4W + 60€ for the CubieBoard =~ 22€/year for 1TB of storage over a 5-year amortization period)</li>
<li>Pretty safe solution. By security I mean mainly confidentiality, authentication and backup. All the data is stored at home, away from large Internet companies.</li>
<li>Large storage space (1 TB).</li>
<li>Very fast : synchronization usually lasts less than 2 min and has no significant effect on the client or server CPU. It performs several orders of magnitude better than every solution I tried before.</li>
</ul>
How I manage my passwords2014-08-16T00:00:00+00:00https://florat.net/how-i-manage-my-passwords/<p>With so many website and system credentials to remember,
settling on an acceptable password policy is challenging. After
years of trial and error, I'm approaching something I eventually find
convenient and safe enough.</p>
<hr>
<p><strong>UPDATE 2020</strong></p>
<p>For type-1 passwords according to my categorization, I now use <a href="https://www.ssi.gouv.fr/uploads/IMG/pdf/NP_MDP_NoteTech.pdf">this method</a> from ANSSI (French Security Agency) : create a memorable passphrase and use only the first letters. For instance : <code>When I go to work, I always stop at Bob's</code> becomes : <code>wig2w,IasaB's</code> .</p>
<hr>
<h3>What I learned</h3>
<ul>
<li>
<p>Don't use the same password for all credentials : if one is
cracked, attackers will go straight to the similar resources (like
Facebook after Twitter) and gain access to them. Using small
variants is not enough, especially if the pattern is obvious (like
<code>mypassword_facebook</code> and <code>mypassword_twitter</code>).</p>
</li>
<li>
<p>Change your passwords often (I still have some work to do
here).</p>
</li>
<li>
<p>Use passphrases or long passwords, because the most important thing for
a credential is its length, not its estimated complexity. Check out
this <a href="https://www.grc.com/haystack.htm">excellent website</a>; it
explains it way better than I would.</p>
</li>
<li>
<p>Humans are extremely predictable; never trust yourself when choosing
a password, only trust randomness and maths.</p>
</li>
<li>
<p>Never use a password generated on the Web: you never know if the website is safe or if the communication between you and this website is (even under HTTPS, the communication can be intercepted and stored for further analysis, by malicious governments for instance).</p>
</li>
<li>
<p>Don't trust password "strength" evaluators that are based upon
the kind of characters, their case, the special characters presence
and so on but doesn't deal with emerging patterns that would
dramatically reduce the entropy and makes the password trivial to
guess. For example, <code>aBcDeFgHiJ1234567</code> is evaluated as very strong
but would be broken down in minutes by any attacker.</p>
</li>
<li>
<p>Only rely on randomness from the real world (like using dice or
coins), not on pseudo-random number generators (like <code>/dev/urandom</code>
under Gnu/Linux). However, I feel free to use random number
generators when available ( <code>/dev/random</code> under Gnu/Linux). OK, I
know it is less safe than using physical stuff, but I feel it's an
acceptable trade-off between security and convenience.</p>
</li>
<li>
<p>Don't let your browser remember the most important passwords, and
perform regular cleanups of every password you already stored into
it. However, I for one make exceptions for low to moderate
importance passwords GIVEN THAT 1) I NEVER leave my computer
unlocked, even for a few minutes 2) all my personal data is stored
on FDE or LUKS/dm-crypt encrypted volumes.</p>
</li>
<li>
<p>There are two types of passwords :</p>
<ul>
<li>
<p>[Type 1] The passwords you need to remember, either because you often need them (like logins on your systems) or because you must remember them when you don't have your computer with you, when traveling for instance (Paypal, online bank, webmail passwords, etc.). You should create a strong yet memorable passphrase for each of them. The best method to achieve this is probably the <a href="http://world.std.com/~reinhold/diceware.html">Diceware</a> method. If you aren't already familiar with it, I can't advise you enough to read it and its <a href="http://world.std.com/~reinhold/dicewarefaq.html">FAQ</a>.</p>
</li>
<li>
<p>[Type 2] The passwords you don't need to remember because you don't use them often. In this case, free your mind and store them using a wallet program like <a href="http://keepass.info/index.html">keepass</a> or an encrypted raw text file. Don't use a proprietary program that could contain backdoors, but only Free/Open Source software.</p>
</li>
</ul>
</li>
<li>
<p>Not all passwords have to be equally safe. The safer a password is, the more difficult it is to remember and the longer it takes to type, hence altering the user experience. When dealing with 'stored' passwords, you should always use very long and complex passwords because there is no inconvenience in doing so in this case. You can use a (local) generator of very long random strings with many numbers, different letter cases and special characters, because you don't have to remember them anyway but only to copy/paste them from the wallet (BTW, most wallets come with a convenient feature that pushes the password temporarily into the clipboard, and can generate new passwords as well). The length and complexity of the passwords to remember, for their part, can be calibrated according to different levels. For example : 4 diceware words for a low/medium security level, and 6 words plus case/special character variations for the most sensitive credentials.</p>
</li>
<li>
<p>Use a personal <a href="http://en.wikipedia.org/wiki/Salt_%28cryptography%29">salt</a> (a salt is a string we add to a password to make sure that an attacker cannot use pre-computed rainbow tables and break your password in seconds). Most websites don't actually store your password but only an MD5/SHA-1 hash of your password along with a salt set on a per-user basis. This is the current state of the art, but it is not always applied, and you can't expect all the websites you use to enforce this basic rule. Using your own salt is an additional precaution for the case where a website stores password hashes without a salt. Of course, it is useless if the website stores the password in clear text.</p>
</li>
</ul>
<h3>The errors I made</h3>
<ul>
<li>
<p>I used online password generators. <a href="http://www.deadboltpasswordgenerator.com/">Some</a> are cool because
they map easy-to-remember passphrases to strong passwords. So,
what's the problem? 1) You have to come back to their website
every time you need the password; 2) as before, you can't
trust the website or the communication anyway; 3) What if the online
service shuts down? Answer: you lose all your passwords (you
don't even know the algorithm they use to map a passphrase to a
strong password, so you can't rewrite it by yourself to get back
your passwords from the passphrases you still remember).</p>
</li>
<li>
<p>I tried various methods to remember my passwords. Some are based upon a base password to which we apply a transformation (like a->@,
i->! and so on) and which we specialize according to the website
(like <code>MyP@wd-f@cebooK</code> and <code>MyP@wd-Tw!tteR</code>). What's wrong with
that? 1) The special character substitution is often hard-coded
into the attacker's dictionary and has nearly zero advantage in
comparison with the initial character; 2) imagine that in my case an
attacker cracks my Facebook password: do you think it will be
difficult for him to find the Twitter one once he knows my pattern?</p>
</li>
</ul>
<h3>The final solution I set up</h3>
<p><em>Disclaimer : while most of the tools or methods exposed here are
proven, my own adaptations may prove wrong; I don't claim to be
a security expert.</em></p>
<h4>For type 1 passwords</h4>
<p>I use the raw Diceware method or a small free software password
generator running locally on my desktop, without any external dependency
and made of only a few hundred lines of code (that I checked). I also
hacked the program to use /dev/random instead of /dev/urandom. The
program uses the diceware 8k dictionary. For a medium security level, I
use a three-word Diceware scheme + a salt. For high security passwords,
I use a five-word Diceware scheme (that I'll call the 'base') + a
salt + a random number/special character pattern. To increase the
passphrase entropy, I use the following personal method*. The basic
idea is to use the passphrase base itself to add entropy without adding
things to remember, like the positions of special characters :</p>
<ul>
<li>The salt is made of the concatenation of each first letter of the Diceware words and a '+' character.</li>
<li>The five Diceware words are expressed in lower case without separators (never use spaces between words because of the noise made by the space bar; you would give a significant hint to a spy).</li>
<li>A special character + three numbers (like '587') that I'll have to remember in addition to the base passphrase. The location of the pattern is given by this basic algorithm : the word number is given by the alphabetical order of the base password, then the location of the pattern inside the matching word is given by the alphabetical order of the word's own letters (I don't detail the boundary limit cases here).</li>
<li>Example of resulting password for the Diceware pass phrase 'dec scan labile deify shafer' : it becomes 'dslrs+decscanlabiledeif'587yshafer' ('d' of 'dec' = 4, so the pattern is included in the 4th word, deify, and 'd' in 'deify' gives the 4th position in the 'deify' word).</li>
</ul>
<p>(*) The Kerckhoffs security principle states that knowing the security
tools or methods in use doesn't provide any significant advantage to
the attacker; I hope this is still the case here.</p>
<h4>For type 2 passwords</h4>
<p>I don't much like wallet programs because I find them too 'formal'
and too cumbersome for adding new entries. I finally use a small
free software HTML/Javascript page that I run locally. My passwords are AES-256
encrypted in a file I open using any text editor. Then I paste the
encrypted text into this web page, type the master password, and the
clear text with the passwords is then displayed in a text area, ready for
copy/paste or CTRL-F searches. I read the Javascript code to check for
backdoors and hacked it slightly, adding a timer to clear the password
and the clear text area after a short delay, so the password information
is hidden automatically even if I forget to close the browser tab.</p>
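<p>For readers who prefer the command line, here is a minimal sketch of an equivalent workflow using plain openssl (this is an assumption of mine, not the page I use; the file names and search term are hypothetical, and the -pbkdf2 flag requires OpenSSL 1.1.1+):</p>
<pre><code># Encrypt the clear-text password list with AES-256; the master
# password is prompted for. -pbkdf2 strengthens the key derivation.
openssl enc -aes-256-cbc -pbkdf2 -salt -in passwords.txt -out passwords.enc

# Decrypt to stdout for a quick search, without writing the clear
# text to disk ("github" is just a hypothetical search term).
openssl enc -d -aes-256-cbc -pbkdf2 -in passwords.enc | grep -i github
</code></pre>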
Mystical bugs2014-08-16T00:00:00+00:00https://florat.net/les-bugs-mystiques/<p>The vast majority of the bugs we suffer from every day find a rational
explanation rather easily. Another category, fortunately extremely
rare, is that of the so-called "mystical" bugs. Take the logs of any
complex and heavily loaded system, such as an application server or a
transaction monitor: I predict that, over a long enough period, you will
always find strange, non-reproducible error messages there...</p>
<h1>Definition</h1>
<p>I call a "mystical bug" a non-reproducible bug, that is, a bug
occurring randomly and for unknown reasons.</p>
<h1>Etymology</h1>
<p>This term, coming from computing slang and identical in several
languages ("mystical bug" in English), perfectly conveys the esoteric
side of these oddities.</p>
<h1>The paradox</h1>
<p>Yet what could be more antinomic than, on one side, computing and
programming, born from mathematics (a program being a mathematical
formula), and on the other the opaque world of the Uncertain, of Chance,
of Fate? It is nevertheless difficult in computing to generate
randomness at will: the algorithms of pseudo-random generators are
complex and use a lot of data from the computer's environment, such as
the time or mouse movements, for an often mediocre result (identical
sequences frequently appear). On the opposite, mystical bugs seem to
appear randomly because that is their nature: they are non-reproducible
and can occur at any time, often from initial situations that look
identical a priori.</p>
<h1>Potentiality of mystical bugs</h1>
<p>A computer scientist once said that some bugs may statistically occur
once a century, that is, over a period at least 5 to 10 times longer
than the lifespan of the program itself. I think this is accurate. Some
mystical bugs may never show up and remain lurking deep inside obscure
tests or loops whose conditions are so improbable that they will never
actually occur, although they potentially exist.</p>
<h1>The causes of mystical bugs</h1>
<p>A mystical bug can occur, among other reasons:</p>
<ul>
<li>
<p>Because of the data being processed: for instance, a primitive
executed with extremely unusual arguments.</p>
</li>
<li>
<p>Because of a physical problem, such as several bits flipping
simultaneously in RAM, a read error on a physical medium, a power
micro-outage, a hardware bug in the CPU or in other electronic
components...</p>
</li>
<li>
<p>Because of the code itself: a bug in the compiler or in a virtual
machine, a bug in the language, inappropriate use of special
functions... I have seen comments in Pro*C sources like "Do not
remove this comment or the program crashes" that were not lying,
and special or hexadecimal-encoded characters producing unexpected
effects at compile time or at runtime...</p>
</li>
<li>
<p>Because of memory management: code overwriting data memory segments
or the reverse. This kind of bug is often the basis of the
"exploits" crackers use to break into secured systems.</p>
</li>
<li>
<p>Because of passive problems (producing no bugs on their own) in
several modules or APIs which, used together, combine to give rise
to a new, active bug.</p>
</li>
<li>
<p>Because of multithreading: in my opinion, the main source of
mystical bugs in contemporary languages such as Java. Despite the
locking primitives these languages provide, it is often hard to
completely rule out unexpected situations and concurrent access to
shared in-memory resources.</p>
</li>
<li>
<p>Because of transaction management: the concurrency-control (ACID)
tools must be used correctly to avoid unexpected events when
accessing a resource such as a database, a MOM, an external system,
etc. This kind of component can behave differently depending on the
vendor (a concurrent access in one database may raise an error
while another waits for the lock, for instance).</p>
</li>
<li>
<p>Because of inter-transaction synchronization problems:
imagine that user A fetches a customer record in its own
transaction, reads it for several minutes, then decides to modify
field X. In the meantime, user B has modified field Y in an update
transaction that completed. If the update transaction sends all the
data at once (bulk mode), user B's update is overwritten by user
A's; yet each transaction ran correctly and no concurrent access
was ever detected.</p>
</li>
<li>
<p>Because of "deadlocks": a deadlock is the permanent blocking of
two tasks occurring in the very particular case of concurrent
access to two distinct resources in a precise order (thread 1 takes
resource A then B while thread 2 takes resource B then A at the
same time; see the sketch after this list).</p>
</li>
<li>
<p>Because of many other reasons, such as rare side effects, etc.</p>
</li>
</ul>
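<p>To illustrate the deadlock case, here is a minimal Java sketch (my
own example) of the exact interleaving described above. Run it a few
times: it will usually hang forever, but not always, which is precisely
what makes such bugs mystical.</p>
<pre><code>public class DeadlockDemo {

    private static final Object resourceA = new Object();
    private static final Object resourceB = new Object();

    public static void main(String[] args) {
        Thread t1 = new Thread(new Runnable() {
            public void run() {
                synchronized (resourceA) {      // thread 1 takes A...
                    pause();                    // widen the race window
                    synchronized (resourceB) {  // ...then waits for B
                        System.out.println("t1 done");
                    }
                }
            }
        });
        Thread t2 = new Thread(new Runnable() {
            public void run() {
                synchronized (resourceB) {      // thread 2 takes B...
                    pause();
                    synchronized (resourceA) {  // ...then waits for A
                        System.out.println("t2 done");
                    }
                }
            }
        });
        t1.start();
        t2.start();
    }

    private static void pause() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
</code></pre>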
<h1>Conclusion: the program's secret garden</h1>
<p>The mystical bug gives information systems a new dimension and seems
to let a kind of chaos emerge, a consciousness beyond the frame created
by humans. Mystical bugs hide in the program's secret garden, out of
reach of the developers' thought and understanding. They are
irritating, not only because of the bug itself but above all because of
the feeling that the program is hiding something, that it holds the
mysterious power to pull one of these bugs out of its hat at will.</p>
Undocumented Oracle PreparedStatement optimization2014-08-15T00:00:00+00:00https://florat.net/undocumented-oracle-preparedstatement-optimization/<p>We just got a 20% response-time gain on a 600+ line query under Oracle.
Our DBA noticed that queries were faster when launched from SQLDeveloper
than from our JEE application using the Oracle 11g JDBC driver. We
looked at the queries as they actually arrived at the Oracle engine and
they were of the form:
<code>SELECT... WHERE col1 LIKE :myvar1 OR col2 LIKE :myvar2 AND col3 IN (:item1,:myvar2,...)</code>
and not
<code>SELECT... WHERE col1 LIKE :1 OR col2 LIKE :2 AND col3 IN (:3,:4,...)</code>
as is usual when using PreparedStatement the regular way.</p>
<p>Indeed, every PreparedStatement documentation I'm aware of, beginning
with <a href="http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html">the one from
Sun</a>,
states that we have to use <code>?</code> to represent bind variables in
queries. These <code>?</code> are replaced by <code>:1</code>, <code>:2</code>, <code>:3</code>... by
the JDBC driver. So the database has no way to know, in our case, that :2
and :4 carry the same value. This information is lost.</p>
<p>We discovered that we can use PreparedStatement by providing queries with
named bind variables instead of <code>?</code>. Of course, we still have to set
the right value using the <code>setXXX(int position, value)</code> setters for every
occurrence of a bind variable in the query. Queries then arrive at Oracle
just as they do from SQLDeveloper, with named bind variables.</p>
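<p>A minimal Java sketch of the trick as described above (the table and
variable names are invented for the example; per this post, it likely
only works with the Oracle JDBC driver):</p>
<pre><code>import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class NamedBindDemo {
    // ':name' appears twice in the query text, so Oracle can see that
    // both occurrences carry the same value (information lost with '?').
    static ResultSet findClients(Connection connection) throws SQLException {
        String sql = "SELECT col1, col2 FROM t_client"
                + " WHERE col1 LIKE :name OR col2 LIKE :name";
        PreparedStatement ps = connection.prepareStatement(sql);
        // Values are still set by position, once per occurrence:
        ps.setString(1, "DUPONT%"); // first occurrence of :name
        ps.setString(2, "DUPONT%"); // second occurrence, same value
        return ps.executeQuery();
    }
}
</code></pre>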
<h5>OK, but what's the deal with all this?</h5>
<p>I'm not sure, but I think this optimization may allow the Oracle
optimizer to be cleverer, especially on queries with redundant parts.
It is especially good for queries with duplicated sub-SELECTs whose IN
conditions all contain the same list of items. Maybe Oracle creates
on-the-fly WITH clauses or similar optimizations in this case?</p>
<p>Note that this optimization may only work with Oracle and is probably
only useful for very large or redundant queries, so I don't recommend it
in most cases. AFAIK, neither Hibernate nor Spring-JDBC implements this
optimization.</p>
How to get bind variables values from Oracle2014-05-11T00:00:00+00:00https://florat.net/how-to-get-bind-variables-values-from-oracle/<p>If you have already used JDBC prepared statements, you know what bind
variables are: the '?' in the query, as in:
<code>SELECT col1,col2 FROM t_table WHERE col1 IN (?,?,?) AND col2 = ?</code>. For
the record, all compiled queries with the same number of '?' are
cached by Oracle and are hence (most of the time) faster to execute. But
how do you debug the passed values? This is often valuable, like
yesterday when one of our services tried to insert a value too large
for a column (a 4-digit integer into a <code>NUMBER(5,2)</code>).</p>
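<p>For the record, a minimal Java sketch of such a prepared statement
(the table and values are hypothetical):</p>
<pre><code>import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PositionalBindDemo {
    static ResultSet query(Connection conn) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
                "SELECT col1, col2 FROM t_table WHERE col1 IN (?,?,?) AND col2 = ?");
        // Each '?' is bound by position; the values never appear in the
        // SQL text, which is exactly why they are hard to see when debugging.
        ps.setString(1, "A");
        ps.setString(2, "B");
        ps.setString(3, "C");
        ps.setBigDecimal(4, new BigDecimal("123.45")); // fits a NUMBER(5,2)
        return ps.executeQuery();
    }
}
</code></pre>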
<p>There are several ways to achieve this; one is using a 'wrapper' JDBC
driver (like log4jdbc) that audits and logs the values, but it's a bit
intrusive.</p>
<p>A very simple, non-intrusive way for a one-off need is to query the
<code>v$sql</code> view, the Oracle internal log. A sample query is given below
(<a href="https://stackoverflow.com/questions/14217461/how-to-find-parameters-in-oracle-query-received-from-vsql/14217618#14217618?newreg=d1017924cf7748119a11379f0b9e65ff">source: Stack
Overflow</a>):</p>
<pre><code>select s.sql_id,
bc.position,
bc.value_string,
s.last_load_time,
bc.last_captured
from v$sql s
left join v$sql_bind_capture bc
on bc.sql_id = s.sql_id
and bc.child_number = s.child_number
where s.sql_text like 'delete from tableA where fk%' -- or any other method to identify the SQL statement
order by s.sql_id, bc.position;
</code></pre>
<p>It works like a charm!</p>
Move to Github done smoothly2014-02-01T00:00:00+00:00https://florat.net/move-to-github-done-smoothly/<p>The Jajuk issue tracker and the Git repository have now moved to GitHub
(see the previous article for context).</p>
<h3>Repository move</h3>
<p>Obviously, and by nature, the Git repository move has been very simple. I
just had to drop my previous origin (pointing to the Gitorious project
URL), add the new GitHub origin and push all my branches. The push
of the master branch took around 30 minutes and the other branches
(develop, hotfix) almost no time at all. Note that the <code>-u</code> option used
in the push command recreates the upstream tracking references.</p>
<pre><code>git remote remove origin
git remote add origin git@github.com:jajukteam/jajuk.git
git push -u origin master
</code></pre>
<p>The only problem occurred when dropping our Gitorious repository (error
500 -> timeout?)</p>
<h3>Issue tracker move</h3>
<p>I tried several Trac-to-GitHub migration tools, most of which didn't
work, and finally settled on
<a href="https://github.com/trustmaster/trac2github">trac2github</a>. It is written
in PHP, reads the Trac database (MySQL, PostgreSQL and SQLite are
supported) and calls the GitHub REST API v3 to create the tickets. It
creates the milestones, labels, tickets and comments with good defaults.
It had some bugs when working with a PostgreSQL database and I had to
patch it (two of my pull requests have been merged). I also pushed a
patch to obfuscate emails in comments.</p>
<p>I also ran into another problem (not linked to the migration tool):
we used the DeleteTicket Trac plugin to drop spam tickets, but GitHub
issue ids have to be contiguous. Source and destination issue ids are
therefore now shifted, which is a problem when code comments reference
a ticket number, but AFAIK there is no solution to this.</p>
<p>Have a look at the brand new issue tracker:
<a href="https://github.com/jajuk-team/jajuk/issues">https://github.com/jajuk-team/jajuk/issues</a></p>
BitBucket vs Github issue tracker choice for Jajuk2014-01-20T00:00:00+00:00https://florat.net/bitbucket-vs-github-issue-tracker-choice-for-jajuk/<p>We are currently moving our Jajuk Trac <a href="http://integration.jajuk.info/">issue
tracker</a> to a better place, mainly for
spam reasons. A developer suggested BitBucket; others (me included)
GitHub, which I already use. I cloned our secondary project QDWizard into a
private BitBucket repository to form an opinion. I have to say BitBucket
is really good too.</p>
<p>In my view, both systems deliver the most important features:</p>
<ul>
<li>Simple to import from Trac.</li>
<li>Export facilities to make change possible in the future.</li>
<li>Clean and simple GUI.</li>
<li>Clean roadmap/version support.</li>
<li>Assignment facilities.</li>
</ul>
<p>But:</p>
<ul>
<li>GitHub has many more users (around 4M compared to 1M for BitBucket).
More developers already have accounts and are used to it.</li>
<li>The GitHub GUI is a bit faster.</li>
<li>GitHub is more "open source" minded; BitBucket feels more
enterprise-oriented (private repositories).</li>
<li>BitBucket is free only up to 5 developers.</li>
</ul>
<p>Specifically about issue management: the issue manager in BitBucket is
not actually Jira but a lightweight tracker. It doesn't come
(thankfully) with full workflow support. Like most trackers, each
ticket has a type (a "kind": bug, enhancement, proposal, task), a
priority (trivial, ..., blocker) and a status ("workflow": "on
hold", "resolved", "duplicate", "invalid", "wontfix" and
"closed"). Note that these states can be neither changed nor augmented
(many users asked for a "tested" state but it has never been added).
It's like Trac without the ability to define new types and new
statuses. Some Jajuk Trac types are not supported: "known issue",
"Limitation", "patch", "support request", "to_be_reproduced"
(and we map our "discussion" to BitBucket's "Proposal"). Some statuses
are missing too: "worksforme", "not_enough_information". I suppose
a migration would have forced us to map several statuses and several
types to the same BitBucket kind/workflow.</p>
<p>For its part, GitHub comes with (in my view) a very elegant
solution: there are no ticket priorities, types or states, only
"labels" such as "important", "bug", "wont fix",
<whatever>... OK, it may be more lax, but on the other hand:</p>
<ul>
<li>it allows adding any label to qualify a ticket along any aspect you may think of;</li>
<li>it doesn't force you to use potentially useless fields like priority.</li>
</ul>
<p>I suppose the migration scripts will be able to simply create any new
labels needed to reflect our existing types and statuses (yet to be
proven). We still have to run the migration script; I'll probably test
this this weekend.</p>
Keynux Epure S4 laptop review2012-10-28T00:00:00+00:00https://florat.net/keynux-epure-s4-laptop-review/<p><img src="https://florat.net/assets/images/blog-tech/keynux_1.jpg" alt="keynux_1.jpg"></p>
<h1>Main thoughts</h1>
<p>I bought a Keynux Epure S4 three months ago and it is time to review
it. At the risk of spoiling, I can already tell you that this laptop
rocks and is a good deal. Why did I buy a Keynux in the first place?
Simply because it was (AFAIK, in December 2011) the only French laptop
assembler meeting my three main criteria:</p>
<ul>
<li>Running as well as possible under Linux.</li>
<li>No <a href="http://non.aux.racketiciels.info/">Microsoft tax</a>.</li>
<li>Custom and fine-grained hardware choice.</li>
</ul>
<p>I use this laptop mainly for development and for running virtual
machines (along with "regular" browsing / office use, of course). I
(almost) never play games or have other high-GPU usages. My main
strategy was to select the least expensive Keynux laptop and then move
upmarket the components most important to me (like hard drive, CPU and
memory). It cost me around 1400 € (VAT and shipping included).</p>
<h1>Specifications</h1>
<ul>
<li>Model
<a href="http://keynux.com/default_zone/fr/html/Prod_Notebook_EpureS4_Details.php">website</a>
and
<a href="http://keynux.com/default_zone/fr/html/Prod_Notebook_EpureS4_Spec.php">specifications</a></li>
<li>My custom Epure is basically (see the complete specifications below
for more details):
<ul>
<li>a Clevo W251HSQ laptop chassis.</li>
<li>an i7 dual-core CPU with hyperthreading, so the OS sees 2x2 = 4
CPUs (note that most other i7s are quad-cores: the OS sees 8 CPUs,
but a dual-core is fine for development).</li>
<li>a built-in Intel HD Graphics 3000 GPU.</li>
<li>a 500 GB Seagate XT hybrid (SSD/HD) drive.</li>
<li>8 GB SO-DIMM DDR3 / 1333 MHz RAM (2 x 4 GB).</li>
</ul>
</li>
</ul>
<p>I hesitated between a pure SSD and a hybrid hard disk and finally
bought the hybrid to get more storage at the best price, and because my
usage implies a lot of writes while VMs are running. I'm very happy
with this choice and the boot takes about 20 seconds.</p>
<h1>Linux configuration</h1>
<p><img src="https://florat.net/assets/images/blog-tech/keynux_2.png" alt="keynux_2.png"></p>
<p>I use a <a href="http://xubuntu.org/">xubuntu</a> 11.10 desktop. Xfce is a lightweight
desktop manager that boots faster and saves memory, power and
CPU in use (a screenshot of my laptop is on your right).</p>
<h2>Kernel boot options</h2>
<p>(Grub configuration under /etc/default/grub under Ubuntu) :
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi_osi= i915.modeset=1
add_efi_memmap i915.i915_enable_rc6=7 i915.i915_enable_fbc=1
i915.lvds_downclock=1"</p>
<ul>
<li>acpi_osi= makes the brightness Fn keys work (don't ask me why)...</li>
<li>i915.modeset=1 add_efi_memmap i915.i915_enable_rc6=7
i915.i915_enable_fbc=1 i915.lvds_downclock=1 enables the GPU eco
mode (thanks Jean-Baptiste): it saves me around 50% of power
consumption and 10 degrees (and hence makes the fan much less noisy
at the same time). Power measured on battery (using powertop)
dropped from 37 watts to 22 watts.</li>
<li>Don't use the pcie_aspm=force option to save more power (see
<a href="https://lkml.org/lkml/2011/11/10/467">here</a>): some components
(probably the Ethernet card) don't support ASPM and I got random
freezes when plugging in the Ethernet cable, for instance.</li>
</ul>
<h2>Xubuntu configuration</h2>
<ul>
<li>The sound was always muted at startup. To fix it, store the
current volume state using alsactl:</li>
</ul>
<pre><code>sudo alsactl store
</code></pre>
<ul>
<li>Out of the box, my LG LED projector didn't display any image,
neither in VGA nor in HDMI mode. After an i915 Xorg driver upgrade,
HDMI works. You can install the drivers using <a href="https://launchpad.net/~xorg-edgers/+archive/ppa">this PPA
repository</a>.</li>
</ul>
<h2>Issues</h2>
<ul>
<li>The VGA display doesn't work with my LG LED projector (see the
previous item) but it does with my HP external screen, so it must be
specific to the projector (it worked with my previous Lenovo, however).</li>
</ul>
<h1>The good</h1>
<ul>
<li>Very smart and clean chassis (I would suggest an M505 black mouse
to go with it for a perfect look).</li>
<li>Gorgeous 1600x900 screen. Very good color display.</li>
<li>Price: without the Microsoft tax you save about 150 €, and this
laptop should be about 200 € less expensive than a comparable
Lenovo.</li>
<li>Impressively fast for development usage.</li>
<li>Standard charger plug (I even managed to recycle an old
charger).</li>
<li>The keyboard typing feel is very pleasant.</li>
<li>Light packaging.</li>
<li>Reactive and professional support.</li>
</ul>
<p><img src="https://florat.net/assets/images/blog-tech/keynux_3.jpg" alt="keynux_3.jpg"></p>
<h1>The bad</h1>
<ul>
<li>No embedded light (to light the keyboard up in the dark).</li>
<li>No physical Wi-Fi ON/OFF or volume buttons.</li>
<li>Only three USB 2 ports.</li>
<li>The Ethernet plug is inverted (the pin points toward the ground)
and has no activity LED.</li>
</ul>
<h1>Small troubles</h1>
<ul>
<li>The screen opens only about 100 degrees from the keyboard. A wider
opening can be useful when using the laptop on some ergonomic
stands.</li>
<li>The power plug is not very well positioned (on the left; I would
prefer the right) and feels fragile.</li>
<li>By default, the French 220V plug bends at an angle, which makes
unplugging very difficult. I had to change to a straight-plug
cable.</li>
<li>The LED on the charger is annoying when used in a dark room.</li>
<li>The BIOS cannot be configured (besides time and a few other
things). On the other hand, it makes the laptop safer.</li>
<li>The "End" and "Home" keys are mixed in with the numeric keys, which
makes their use confusing; I would prefer dedicated keys.</li>
<li>No "pseudo-wheel" on the touchpad.</li>
</ul>
Conférence RMLL - Un retour des tranchées de l'Open Source (Jajuk)2009-06-10T00:00:00+00:00https://florat.net/conference-rmll-un-retour-des-tranchees-de-l'open-source-(jajuk)/<p>The talk video is available <a href="https://public.florat.net/rmll2009-florat-developpement-open-source-jajuk.ogv">here</a>.</p>
Conférence RMLL - Meilleurs projets en SSII avec l'Open Source2009-06-09T00:00:00+00:00https://florat.net/conference-rmll-meilleurs-projets-en-ssii-avec-l'open-source/<p>The video is available <a href="https://public.florat.net/rmll2009-le-gac-florat-open-source-meilleurs-projets.ogv">here</a>.</p>
Conférence Solutions Linux - L’approche orientée modèles DSM2008-01-29T00:00:00+00:00https://florat.net/conference-solutions-linux-l'approche-orientee-modeles-dsm/<p>The slides are available <a href="https://public.florat.net/SolutionLinux2008-DSM-BFlorat-1.2.pdf">here</a>.</p>
Linux on a VIA ME6000 and external hard disk real howto2005-01-26T00:00:00+00:00https://florat.net/linux-on-a-via-me6000-and-external-hard-disk-real-howto/<p>My goal was pretty simple: install a Linux distribution on a VIA ME6000 Mini-ITX PC without an internal disk, so that the case fan could be removed (the ME6000 board has no fan), in order to get a 0 dB Linux box. Actually, it took me nearly two months to achieve despite the little help from various howtos (the Knoppix-on-VIA howto for example) and forums. Some of the problems came from the CPU and others from the fact that I booted from an external USB disk.</p>
<h2>Distribution choice</h2>
<p>I've chosen Mandrake 10.1, which works perfectly on my box even though I would have preferred Suse. I tried:</p>
<ul>
<li>
<p>Suse 9.2: simply put, it <em>can't</em> work (since Suse 8.2, apparently) because it uses the i686 cmov instruction, which is not supported by the VIA Samuel 2 CPU. It freezes during install.</p>
</li>
<li>
<p>Debian Sarge: boots, but the installer (text mode) is unreadable; it must have something to do with the video card, I guess... I gave up.</p>
</li>
<li>
<p>Knoppix 3.7: works perfectly as a live CD. Awesome. Nevertheless, when I installed it on my disk (sda3), it booted the kernel (I never figured out how that was possible, read the next chapter) but when mounting devices I got a kernel panic due to a "devfs type not found" problem, with kernel 2.4 or 2.6. This problem apparently appeared with Knoppix 3.5 and we got no support from the Knoppix forum. A friend of mine told me that installing the devfs package solves this, but I had no time to try again; tell me if it works.</p>
</li>
<li>
<p>DSL (Damn Small Linux): I managed to install it on a USB pendrive with a lot of pain (read carefully the partitioning howto among the DSL howtos). The USB pendrive partition must be of FAT16 type, have the bootable flag, have 32 heads per track, and the number of cylinders must be less than or equal to 1024. However, I gave up on using it: it looks like a nice distribution but is too light for my daily needs.</p>
</li>
<li>
<p>Mandrake 10.1: works and provides all the functionality you can expect from this kind of distribution. I kept it.</p>
</li>
</ul>
<h2>How to boot from an external USB hard disk real howto</h2>
<p>First of all, current BIOSes make it very hard to boot from a USB hard drive. It is nearly impossible to make it work, especially if you don't want to re-partition your disk. I tried for about a month, reading tons of forum threads, howtos, distribution docs... I gave up booting directly from the USB hard disk, but I managed to create a boot CD instead. It is actually very simple under Mandrake (once you know the right commands):</p>
<ul>
<li>
<p>Install your Mandrake on the disk (/dev/sda1 or any other partition; I used /dev/sda3).</p>
</li>
<li>
<p>Insert your MDK disk 1 and press F1.</p>
</li>
<li>
<p>Enter rescue mode (type "rescue").</p>
</li>
<li>
<p>Select "Go to console".</p>
</li>
<li>
<p>Chroot to your disk: <code>mkdir /sda1; mount /dev/sda1 /sda1; chroot /sda1</code></p>
</li>
<li>
<p>Launch <code>mkrescue --iso</code> to create a proper boot image matching the current kernel, root partition, etc.</p>
</li>
<li>
<p>Burn this image (the rescue.iso file) with cdrecord under Linux (<code>cdrecord --scanbus; cdrecord -dev=&lt;your device, like 1,0,0&gt; -speed=1 rescue.iso</code>) or any burning utility under Windows.</p>
</li>
<li>
<p>Boot from the CD (change BIOS settings if needed) and choose the default option (Linux); it should boot your disk.</p>
</li>
</ul>
Asus L8400K overview2003-09-26T00:00:00+00:00https://florat.net/asus-l8400k-overview/
<h1>Description</h1>
<p><i>General</i></p>
<table border="1" cellpadding="4" cellspacing="0">
<thead>
<tr>
<th>CPU</th>
<th>Mem</th>
<th>HD</th>
<th>Screen</th>
<th>Video</th>
<th>Drives</th>
<th>Price</th>
<th>Battery life / weight</th>
<th>Sound</th>
<th>Network</th>
</tr>
</thead>
<tbody>
<tr>
<td>PIII 850</td>
<td>128 MB</td>
<td>20 GB</td>
<td>14.1" TFT</td>
<td>S3 Savage MX/MV</td>
<td>DVD 8X, floppy</td>
<td>About $2000</td>
<td>2-4 h / 2.9 kg</td>
<td>ESS Allegro 1988-1</td>
<td>Ethernet: Realtek 8139; Modem: ESS winmodem</td>
</tr>
</tbody>
</table>
<p><i>Connectors</i></p>
<ul>
<li>2 PCMCIA ports</li>
<li>1 PS/2 (you can use a double PS/2 plug to use keyboard and mouse at the same time)</li>
<li>1 infrared port</li>
<li>1 TV out (S-video)</li>
<li>Audio: 1 out, 1 in, 1 jack</li>
<li>2 USB</li>
<li>1 RJ45 for Ethernet and modem</li>
<li>1 serial port (small one)</li>
<li>1 parallel port</li>
<li>1 Kensington hole</li>
<li>1 VGA out</li>
<li>2 built-in loudspeakers</li>
<li>1 microphone</li>
</ul>
<p>Note: I didn't get any portbar connector in spite of the advert description.</p>
<p><a href="https://florat.net/asus-l8400k-overview/dmesg">dmesg</a></p>
<p><a href="https://florat.net/asus-l8400k-overview/screenshot1.jpg">screenshot 1</a></p>
<h1>General feelings</h1>
<p>An excellent product for use under Linux at home and at the office.
Mine was sold with Microsoft Windows Millennium. After resizing the
partition (with Partition Magic), I installed Suse 7.2 and everything
was OK.</p>
<p>Update: I had to change the motherboard (300 €, through Asus France
support) after 2 years of good service. It didn't boot any more.</p>
<p>Summary for Suse 7.2 and Mandrake 8 (kernel 2.4.4 / XFree 4.0.3 / KDE 2.1.1):</p>
<table border="1" cellpadding="4" cellspacing="0">
<tbody>
<tr><td>Video</td><td><b>yes</b></td></tr>
<tr><td>Sound</td><td><b>yes</b></td></tr>
<tr><td>DVD</td><td>CD-ROM reading: <b>yes</b><br>
Data DVD reading: <b>yes</b><br>
Video DVD reading: <b>yes</b> (see ogle:
<a href="http://www.dtek.chalmers.se/groups/dvd/downloads.html">http://www.dtek.chalmers.se/groups/dvd/downloads.html</a>)</td></tr>
<tr><td>Mouse</td><td><b>yes</b> (see XF86Config)</td></tr>
<tr><td>IR</td><td><b>?</b> (should work)</td></tr>
<tr><td>APM</td><td><b>yes</b>, I use the KDE module to check the battery level</td></tr>
<tr><td>Ethernet</td><td><b>yes</b></td></tr>
<tr><td>Modem</td><td><b>no</b> (winmodem)<br>
Check <a href="http://www.linmodems.org/">http://www.linmodems.org</a> but don't expect it to work any time soon.</td></tr>
</tbody>
</table>
<p>To sum up, everything was perfect except the modem. I will use an
old one on the serial port or buy a cheap PCMCIA one. Every hardware
part was detected without any additional configuration (except the
mouse under Suse) and I got a running, usable system in less than half
an hour. I advise you to avoid Mandrake 8.0 with this laptop because I
had some BIOS clock problems with that distribution.</p>
<h1>Useful information</h1>
<ul>
<li>Star Office 5.2 can freeze your notebook. If you use it, put this
line in your profile:</li>
</ul>
<pre><code>export SAL_DO_NOT_USE_INVERT50=true
</code></pre>
<ul>
<li>Change the BIOS settings: OS = others. Note that suspend-to-RAM
works perfectly.</li>
<li>To avoid big and ugly fonts in the text console, put vga=791 in
your /etc/lilo.conf and run 'lilo' as root:</li>
</ul>
<pre><code>image=/boot/vmlinuz
  label=linux
  vga=791
  root=/dev/hda6
  append=" quiet"
  read-only
</code></pre>
<ul>
<li>To add some RAM: there is one slot below the keyboard. I tried to
add a 256 MB module to reach 384 MB but in that case the system
detected only 256 MB, so add only a 128 MB module.</li>
</ul>
<p>Update: an L8400K owner reports that he uses Kingston RAM and that
it works perfectly (now 384 MB).</p>
<h1>X11 config</h1>
<p>Using Suse 7.2, I got a bad X11 configuration with sax: the touchpad
didn't work at install time (random jumps). The touchpad must be
declared as a PS/2 mouse to solve the problem. Now I use my laptop with
both the touchpad and a USB cordless/optical/wheel Logitech mouse, both
running perfectly.</p>
<p><a href="https://florat.net/asus-l8400k-overview/XF86Config">Here's my XF86Config.</a></p>
<h1>Kernel compilation</h1>
<p>I recompiled the 2.4.4 kernel without problem.</p>
<p><a href="https://florat.net/asus-l8400k-overview/kernel-201001.conf">Here's my compilation config</a>.</p>