Tuesday 11 February 2020

How to familiarize yourself with a new IT project

For many newcomers, the biggest headache after joining a company is getting up to speed on its business and project structure. Even without time pressure, it is hard to organize your thoughts in the middle of a huge business. Of course, if you meet an especially enthusiastic senior colleague who explains everything in detail and is always at your side to answer questions, you may be fine. Unfortunately, I did not meet such a person. After I joined my new company, my mentor spent almost no time talking me through the projects and did not arrange for me to learn the project requirements. In just over a month, I gradually familiarized myself with about ten projects on my own and worked out some methods along the way. I am recording them here to share with everyone.

I should emphasize that my strategy is not about quickly understanding the specific business of one project. That differs from project to project and cannot be generalized. My strategy is to get a rough understanding of every project in the business line: roughly what each project does and how the projects relate to one another. That way, whichever project you end up responsible for, you will not be left without a direction. Learning the business details takes time, but it is much easier than being confused about the whole.

1. Necessary conditions

The first question to ask is: what are the necessary conditions under which, given enough time, you are guaranteed to understand the whole project eventually? I do not mean things like "who the project's customers are" or "which framework the project uses". I mean the truly necessary conditions, in the way that a whole mathematical system can be derived from a few axioms. I have found only two:

Source code location (GitLab or SVN), and deployment environments (dev/test/production)

A so-called project is really just a pile of code running on a pile of machines, so these two are enough. Of course, to save time it also helps to get the wiki, Jenkins, the page access paths, and the database addresses. I single out the two necessary conditions to make the point that a project is essentially this simple, so do not overcomplicate it. The business can be arbitrarily complex, but its essence cannot escape these two things. When you do not know where to start or do not know anything yet, first pin down the source code and the environments; everything else is an add-on.

2. The line from the page to the database

With the necessary conditions in place, we can start to understand the projects. Because there is more than one project, do not dive into specific code yet. You will only get more and more frustrated until you give up, with little to show for it. Understanding a specific project has to rest on an understanding of the whole. So at this point we draw a line for each project and label each node, like this:

page access path - front-end project - back-end service - database address

A front-end project may correspond to multiple back-end services, so the final diagram is a tree rather than a single line: each front-end project fans out to several back-end services, each with its own database.

The point of this exercise is to work out which projects exist, which of them are front-end projects visible to users, and which are back-end services, and roughly which back-end services each front-end project calls. From the names of the back-end services and databases we can get a basic idea of the functions this business line provides, and from the front-end projects and page paths we can see what is shown to the user. Note that at this stage we only know names. Even if you click through the pages or connect to a database to look around, do not spend much time on it. The goal of this stage is only to know the overall contents of the business line.
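
The line-per-project map described above can also be kept as plain data instead of a drawing. The following is a minimal sketch in Python, not the author's actual tooling; every project name, page path, and database address in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BackendService:
    name: str
    database: str  # database address, e.g. host:port/schema


@dataclass
class FrontendProject:
    name: str
    page_path: str
    services: List[BackendService] = field(default_factory=list)


def render_lines(projects):
    """Return one 'page - front end - service - database' line per service."""
    lines = []
    for project in projects:
        for service in project.services:
            lines.append(
                f"{project.page_path} - {project.name} - "
                f"{service.name} - {service.database}"
            )
    return lines


# Hypothetical business line: all names and addresses below are made up.
shop = FrontendProject(
    name="shop-web",
    page_path="/shop/index",
    services=[
        BackendService("order-service", "db-host-1:3306/orders"),
        BackendService("user-service", "db-host-2:3306/users"),
    ],
)

for line in render_lines([shop]):
    print(line)
```

Keeping the map as data makes it easy to extend later with extra fields such as deployment machines, without redrawing anything.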

On this basis the diagram can be refined continuously. For example, the machines each project is deployed on can be noted next to the project or saved in Xshell. In addition, record every piece of non-business information you come across. This makes finding things later far more convenient. Skipping it may look like it saves time now, but the time you later spend hunting for these things will be astronomical.

Here is a small anecdote about finding the machines a project is deployed on. Nobody will walk you through this information item by item, and even if someone does, they will not cover everything. So I worked it out with the help of Jenkins. Projects are deployed through Jenkins, so by reading the commands configured in each Jenkins job you can work out the deployment environments one by one. I consider this the most complete and up-to-date source.


Do not tell me to check the wiki. If the company wiki were that well written, this article would not exist. At that time my Jenkins permissions were very limited: I could see only some of the projects, and I could only run jobs, not view their configuration. My mentor was also stingy with permissions. Each time I asked, he granted me execute permission on the one project I needed and nothing more. Eventually I stopped asking. Everyone could log in to the Jenkins machine as root, so I opened the Jenkins configuration file config.xml, gave myself admin permission, and restarted Jenkins. After that every project showed up, and I could view and modify them all freely. In this way I worked through the Jenkins configurations one by one to find the deployment machines, and also read the startup logic.
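
Reading deployment targets out of job configurations can be partly automated. The sketch below is an illustration in Python under stated assumptions: it assumes a freestyle Jenkins job whose shell build step is stored in a `command` element of the job's config.xml, and that deployments use `ssh` or `scp` with a `user@host` target. The sample XML, user name, and IP addresses are invented.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical job configuration; real files contain many more elements.
SAMPLE_CONFIG = """<project>
  <builders>
    <hudson.tasks.Shell>
      <command>mvn clean package
scp target/app.jar deploy@10.0.0.11:/opt/app/
ssh deploy@10.0.0.11 "sh /opt/app/restart.sh"
ssh deploy@10.0.0.12 "sh /opt/app/restart.sh"</command>
    </hudson.tasks.Shell>
  </builders>
</project>"""

# Matches user@host as used by ssh and scp; host may be a name or an IP.
REMOTE_TARGET = re.compile(r"\b[\w.-]+@([\w.-]+)")


def deployment_hosts(config_xml):
    """Collect the distinct hosts mentioned in a job's shell commands."""
    root = ET.fromstring(config_xml)
    hosts = []
    for command in root.iter("command"):
        for host in REMOTE_TARGET.findall(command.text or ""):
            if host not in hosts:
                hosts.append(host)
    return hosts


print(deployment_hosts(SAMPLE_CONFIG))
```

Jobs that deploy through plugins or pipeline scripts store their targets differently, so this only covers the simple shell-step case.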

3. Understand the relationships between projects

If a senior colleague is willing to explain this part to you, take the opportunity. If not, it does not matter: skip this section for now and come back to it later.

4. Organize the database tables

So far we have laid out the general framework of the projects without touching any project details, and this section still does not go into them. Looked at as a whole, the business is nothing more than a pile of code running on a pile of machines. Looked at one project at a time, a project is nothing more than create, read, update, and delete (CRUD) operations on a database. From the user's point of view, a project takes some parameters and returns some results. So next we do two things: organize the database tables, and organize all the interfaces in the Controller layer.

First pick a core project to look at. Among the many projects there is always one that is the core, so start there.

If the database has only a few tables, export the table structures with a tool and read them one by one; that is not hard. If there are many tables, first export all the table names and filter out the core tables. For exporting table names, filtering tables, and later analyzing table fields, consider writing a tool of your own. Whenever I run into something tedious, or something I expect to reuse, I write a small tool and put it in a program of mine that I named javamate. As these small tools accumulate, they turn out to be unexpectedly handy. Now, how do you decide which tables are core? Start by ruling out the ones that are not. The system I analyzed at my company had more than 150 tables. Many were easy to classify by suffix: tables ending in copy are backups, flow are flow records, rel are intermediate relation tables, statistics are statistics tables, log are log tables, config are configuration tables, and so on. After excluding these tables, which do not affect an understanding of the core business, only about 20 remained. Their names showed that many belong to the same category (for example, there are several kinds of order tables), so it was easy to divide them into four or five categories and analyze each in turn. For a larger architecture, you would need to keep breaking it down further.
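
The suffix-based filtering described above is simple to script. Here is a minimal Python sketch of the idea, not the author's javamate tool. It assumes the suffixes are joined with an underscore (for example `order_copy`) and that a table's category is the first word of its name; both are assumptions, and the table names are invented.

```python
# Suffixes treated as noise, mapped to the kind of table they mark.
# The leading underscore is an assumption about the naming convention.
NOISE_SUFFIXES = {
    "_copy": "backup",
    "_flow": "flow record",
    "_rel": "relation",
    "_statistics": "statistics",
    "_log": "log",
    "_config": "configuration",
}


def split_core_tables(table_names):
    """Separate candidate core tables from tables excluded by suffix."""
    core, excluded = [], {}
    for name in table_names:
        lowered = name.lower()
        kind = next(
            (k for suffix, k in NOISE_SUFFIXES.items() if lowered.endswith(suffix)),
            None,
        )
        if kind is None:
            core.append(name)
        else:
            excluded[name] = kind
    return core, excluded


def group_by_prefix(table_names):
    """Group tables by the first word of their name, e.g. order_item -> order."""
    groups = {}
    for name in table_names:
        groups.setdefault(name.split("_")[0], []).append(name)
    return groups


# Hypothetical table names standing in for an exported list.
tables = [
    "order", "order_item", "order_copy", "order_flow", "user",
    "user_order_rel", "pay_log", "sys_config", "sales_statistics",
]
core, excluded = split_core_tables(tables)
print(group_by_prefix(core))
```

The excluded dictionary is kept rather than discarded so that a wrongly filtered table can be spotted and moved back by hand.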

Before analyzing the fields of these core tables in detail, there is one more thing to do: find the relationships between the tables. If table b has a field named a_id, then a and b are in a one-to-many relationship. If two tables share a rel intermediate table, they are in a many-to-many relationship, at least logically. I also wrote a small tool that makes these judgments programmatically.
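
The two naming rules above are enough for a first automatic pass over a schema. The following Python sketch shows one possible way to apply them. It assumes foreign-key columns are named `<table>_id` and relation tables end in `_rel`; the schema itself is hypothetical.

```python
def infer_relationships(schema):
    """Guess table relationships from column names.

    schema maps table name -> list of column names.
    Returns (parent, kind, other) tuples.
    """
    tables = set(schema)
    relations = []
    for table, columns in schema.items():
        # Columns such as user_id that point at another known table.
        refs = [
            column[:-3]
            for column in columns
            if column.endswith("_id") and column[:-3] in tables and column[:-3] != table
        ]
        if table.endswith("_rel") and len(refs) == 2:
            relations.append((refs[0], "many-to-many", refs[1]))
        else:
            for parent in refs:
                relations.append((parent, "one-to-many", table))
    return relations


# Hypothetical schema; in practice this would be exported from the database.
schema = {
    "user": ["id", "name"],
    "order": ["id", "user_id", "status"],
    "role": ["id", "name"],
    "user_role_rel": ["id", "user_id", "role_id"],
}

for relation in infer_relationships(schema):
    print(relation)
```

The result is only a guess based on names, which matches the "at least logically" caveat: real constraints may differ and need checking against the code.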

At this point you have a picture of the overall database structure, and the table names tell you roughly what each table holds. What remains is to go through specific tables, reading their fields and the comments left by your predecessors. There is no trick to this step. It takes patience and time.

5. Drill down into the code layer

Once you have done the database work above, you basically know what services the system can provide. Whatever the code looks like, the database is what it is, so the services that can be offered are largely determined by it. An experienced developer can guess 80 to 90 percent of the business logic from it.

I think the code of a business project falls into only three parts:

1. CRUD operations on its own database, triggered interactively
2. CRUD operations on its own database, triggered by scheduled tasks or server scripts
3. Calls or notifications that ask other services to do something

For a standalone project, the code is nothing more than operating on its own database through various entry points, so the first two parts are enough. For a microservices deployment, add the third. With the code divided into these three parts, understanding a project quickly is not a problem. Even if you suddenly have to fix a bug in a project you have never seen, you can follow the same approach to locate the problem quickly.

Interactive CRUD on its own database: this is the simplest part. Even when it is complex, that only means long code and many tables. "Interactive" here means either an interface that a Controller exposes to front-end users or an interface exposed to other microservices through an RPC port. In short, it is triggered by a third party. Here too I wrote myself a small tool that scans all the interfaces a service exposes and shows each one's method name, path, parameter list, and return value.
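
A rough version of such an interface scanner can be built with regular expressions over the source files. The sketch below is written in Python and is not the author's tool. It assumes the project uses Spring MVC style annotations (`@RequestMapping`, `@GetMapping`, and so on), which the article does not state, and the controller source is a made-up sample. It does not handle parameter annotations that contain parentheses.

```python
import re

# Hypothetical controller source; a real tool would read .java files from disk.
SAMPLE_SOURCE = '''
@RestController
@RequestMapping("/order")
public class OrderController {

    @GetMapping("/detail")
    public OrderVO detail(@RequestParam Long orderId) { return null; }

    @PostMapping("/create")
    public Result create(@RequestBody CreateOrderRequest request) { return null; }
}
'''

# Class-level path prefix, then method-level mappings with their signatures.
CLASS_PREFIX = re.compile(r'@RequestMapping\("([^"]*)"\)\s*public\s+class')
ENDPOINT = re.compile(
    r'@(Get|Post|Put|Delete)Mapping\("([^"]*)"\)\s*'
    r'public\s+([\w<>\[\], ]+?)\s+(\w+)\s*\(([^)]*)\)'
)


def list_endpoints(source):
    """Return one dict per exposed method: verb, path, name, params, return type."""
    prefix_match = CLASS_PREFIX.search(source)
    prefix = prefix_match.group(1) if prefix_match else ""
    endpoints = []
    for verb, path, return_type, method, params in ENDPOINT.findall(source):
        endpoints.append({
            "http": verb.upper(),
            "path": prefix + path,
            "method": method,
            "params": " ".join(params.split()),
            "returns": return_type.strip(),
        })
    return endpoints


for endpoint in list_endpoints(SAMPLE_SOURCE):
    print(endpoint)
```

A regex pass like this is quick to write but fragile; reflection or a proper Java parser would be more reliable for a tool meant to last.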


As with the database, if there are only a few interfaces, read them one by one. If there are many, first pick out the core methods to study. I use Postman for this: I save the requests for the interfaces I am studying and add Examples of both successful and failed calls.


I recommend using Postman during development too, in as much detail as you can. Postman can do more than send single requests to your interface: it can run batch tests and generate API documentation for working with the front end. That way you test your own interface and save the time of writing documentation. Postman can also mock your interface as a service. Even if your interface is down, or not written yet, front-end developers can call the mock first, so front-end testing is not blocked while you develop. That is real front-end and back-end separation.


Once all the interfaces are sorted out, most will turn out to be simple enough to understand at a glance. Click down layer by layer to the SQL statements at the database level, and the essence of the interface is revealed. If an interface is complicated, debug it step by step and take the time to analyze it. If it is more complicated still, draw a flowchart (I recommend ProcessOn). For several interfaces that revolve around one function, you can even draw a state flow diagram. For example, the order business at our company had fairly complicated logic, so I drew a state flow diagram like the following from a programmer's perspective (this is only an example diagram):

State flow diagram: the horizontal axis shows the values of the order_status field, and the vertical axis shows the operations that change the field when order_status has that value.
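
The same state flow information can be written down as a transition table while reading the code. This Python sketch is only an illustration: the status values and operation names are hypothetical, and the real ones have to come from the project's code and database.

```python
# Hypothetical order states and operations, keyed by (current status, operation).
TRANSITIONS = {
    ("CREATED", "pay"): "PAID",
    ("CREATED", "cancel"): "CANCELLED",
    ("PAID", "ship"): "SHIPPED",
    ("PAID", "refund"): "REFUNDED",
    ("SHIPPED", "confirm"): "COMPLETED",
}


def operations_by_state(transitions):
    """For each order_status value, list the operations that change it."""
    table = {}
    for (state, operation), target in transitions.items():
        table.setdefault(state, []).append(f"{operation} -> {target}")
    return table


def next_status(state, operation):
    """Return the new order_status, or fail if the operation is not allowed."""
    key = (state, operation)
    if key not in TRANSITIONS:
        raise ValueError(f"{operation!r} is not allowed when order_status is {state}")
    return TRANSITIONS[key]


for state, operations in operations_by_state(TRANSITIONS).items():
    print(state, operations)
```

Writing the table out this way makes missing or contradictory transitions visible, which is often where order bugs hide.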

Interface-to-table impact diagram: list all the tables involved and their key fields, then call each interface separately and look at its effect on each table field, marking the changes in red.
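
The impact of one interface call can be found by comparing snapshots of the relevant rows taken before and after the call. Below is a minimal Python sketch of that comparison. How the snapshots are read from the database is left out, and the tables, fields, and values are invented for illustration.

```python
def changed_fields(before, after):
    """Return {field: (old, new)} for every field whose value differs."""
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }


def impact_of_call(snapshot_before, snapshot_after):
    """Snapshots map table name -> row (a dict of field values)."""
    impact = {}
    for table in sorted(set(snapshot_before) | set(snapshot_after)):
        diff = changed_fields(
            snapshot_before.get(table, {}), snapshot_after.get(table, {})
        )
        if diff:
            impact[table] = diff
    return impact


# Hypothetical rows captured around a call to a payment interface.
before = {
    "order": {"id": 1, "order_status": "CREATED", "pay_time": None},
    "stock": {"sku": "A1", "quantity": 10},
}
after = {
    "order": {"id": 1, "order_status": "PAID", "pay_time": "2020-02-11 10:00:00"},
    "stock": {"sku": "A1", "quantity": 10},
}

print(impact_of_call(before, after))
```

The fields that appear in the result are the ones that would be marked in red on the diagram.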

With these two perspectives, I believe even a complex business can be sorted out clearly and the root cause of many bugs can be found. This is how I came to understand, in a short time, a project that was not mine, and fixed many stubborn bugs quickly and accurately. The project was in poor shape and its business logic was very confusing, but that period trained my ability to dig into code and untangle its logic, and gave me a method of my own.

CRUD on its own database through scheduled tasks or server scripts: this is the same as the first type, only with a different entry point. If you find a problem that is not caused by an interactive call, look for other entry points, such as scheduled tasks or threads started at startup. Finding these entry points is admittedly not easy and can be a headache, but only because they are relatively well hidden. Find them, write them down, and then analyze them with the same methods as above.
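
A text search for a few telltale patterns is one way to start hunting for these hidden entry points. The Python sketch below assumes a Java code base that uses the `@Scheduled` and `@PostConstruct` annotations or starts threads with `new Thread(...)`; those patterns are assumptions, not something the article specifies, and the sample source is made up.

```python
import re

# Patterns that often mark non-interactive entry points in Java code.
ENTRY_PATTERNS = {
    "scheduled task": re.compile(r"@Scheduled\b"),
    "startup hook": re.compile(r"@PostConstruct\b"),
    "manual thread": re.compile(r"new\s+Thread\s*\("),
}

# Hypothetical source file contents.
SAMPLE_SOURCE = '''public class OrderJobs {
    @Scheduled(cron = "0 0 2 * * ?")
    public void closeExpiredOrders() { }

    @PostConstruct
    public void startConsumer() {
        new Thread(this::consume).start();
    }
}
'''


def find_hidden_entries(source):
    """Return (line number, kind, line text) for each suspicious line."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for kind, pattern in ENTRY_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_no, kind, line.strip()))
    return hits


for hit in find_hidden_entries(SAMPLE_SOURCE):
    print(hit)
```

Cron entries and shell scripts on the servers live outside the code base, so they still have to be checked on the machines themselves.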

Calling or notifying other services: once you have read most of the code and basically understand how the project works on its own, a small part remains, namely its interaction with other services. The project will send messages to other services through MQ, call other services' interfaces directly, or call something like a cloud push interface that delivers messages for it much as MQ would. Whatever the form, it amounts to a call to another RPC service. This code may be better hidden, but there is little of it and its logic is simple, so all you need to do is find it. This part also lays the groundwork for understanding the relationships between projects.

Once these three types of code have been studied, that is basically enough for a business project. Basic services and middleware still require technical depth that builds up slowly. The process of understanding them can still be systematic, but the code has to be classified at a lower level, for example into resource loading, pattern matching, and so on. Since this article is about getting to know business projects quickly, I will not go into that here.

6. Re-clarify the relationships between projects

By now you have a general understanding of each project. At the very least you are clear on what calling each interface does, what services the database can support, and even the essential logic of some key parts. Now it is time to go back over the relationships between the projects.

Using the interface names collected earlier, learn more about the call relationships between projects. Ask senior colleagues about the parts that are unclear. Now that you are asking with some understanding of your own, they can give you more information.

Look at the middleware each project uses, mainly MQ services, and note who is the producer and who is the consumer to understand the relationships.

By now you should have sat through several weekly meetings and be able to follow some of what is discussed in the next ones. From everyone's descriptions and the latest requirements, gradually work out the problems the project currently faces, which project is the core, which projects are auxiliary, and which projects prioritize stability and security.
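
The producer and consumer notes gathered from the MQ configuration can be turned into a list of project-to-project links. This is a small Python sketch of that bookkeeping; the topic and service names are hypothetical.

```python
def message_links(topics):
    """Expand topic records into (producer, topic, consumer) links.

    topics maps topic name -> {"producers": [...], "consumers": [...]}.
    """
    links = []
    for topic, roles in topics.items():
        for producer in roles["producers"]:
            for consumer in roles["consumers"]:
                links.append((producer, topic, consumer))
    return links


# Hypothetical notes collected while reading each project's MQ usage.
topics = {
    "order.paid": {
        "producers": ["order-service"],
        "consumers": ["stock-service", "notify-service"],
    },
    "stock.low": {
        "producers": ["stock-service"],
        "consumers": ["notify-service"],
    },
}

for producer, topic, consumer in message_links(topics):
    print(f"{producer} --{topic}--> {consumer}")
```

Each printed link is one arrow to add to the project relationship diagram, alongside the direct interface calls.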

At this point you have a general understanding of the entire business line. Next, look at the specific business code for the work you are responsible for and the direction your lead has assigned. Dive in and learn everything in detail. Thanks to the earlier effort, you can now look at each project from a certain height. You still do not know the details, but that is a completely different position to be in. While studying the specific business code, keep stepping back to look at the framework of the whole business line and correct whatever you misunderstood about the architecture before you knew the business. In the long run you will stand out on the project and people will notice your global view, which is also a way out of the rut of writing nothing but CRUD code. Gradually people will realize that you always understand the project from the overall picture, and they will naturally think of you for work that spans projects. Over time you will be exposed to more core work and can become an architect, or move into product or management.

That is the process I have worked out for understanding projects. I hope more experienced readers will leave comments to point out problems and ask questions, so we can improve together.
